COMP 6721 Applied Artificial Intelligence (Fall 2023)
Lab Exercise #6: Artificial Neural Networks
Solutions
Question 1
Given the training instances below, use scikit-learn to implement a Perceptron classifier [1] that classifies students into two categories, predicting who will get an ‘A’ this year, based on an input feature vector x. Here’s the training data again:
Feature (x): the four middle columns. Output f(x): the last column.

Student       ’A’ last year?   Black hair?   Works hard?   Drinks?   ’A’ this year?
X1: Richard   Yes              Yes           No            Yes       No
X2: Alan      Yes              Yes           Yes           No        Yes
X3: Alison    No               No            Yes           No        No
X4: Jeff      No               Yes           No            Yes       No
X5: Gail      Yes              No            Yes           Yes       Yes
X6: Simon     No               Yes           Yes           Yes       No
Use the following Python imports for the perceptron:
import numpy as np
from sklearn.linear_model import Perceptron
All features must be numerical for training the classifier, so you have to transform the ‘Yes’ and ‘No’ feature values to their binary representation:
# Dataset with binary representation of the features
dataset = np.array([[1,1,0,1,0],
[1,1,1,0,1],
[0,0,1,0,0],
[0,1,0,1,0],
[1,0,1,1,1],
[0,1,1,1,0],])
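
The Yes/No-to-binary transformation can also be done programmatically rather than by hand. The following is a minimal sketch using only the standard library; the names `rows`, `encode`, and `encoded` are hypothetical and not part of the lab code, and the rows are copied from the table in the question.

```python
# Rows copied from the training table: name, four features, and the label.
rows = [
    ("Richard", "Yes", "Yes", "No", "Yes", "No"),
    ("Alan", "Yes", "Yes", "Yes", "No", "Yes"),
    ("Alison", "No", "No", "Yes", "No", "No"),
    ("Jeff", "No", "Yes", "No", "Yes", "No"),
    ("Gail", "Yes", "No", "Yes", "Yes", "Yes"),
    ("Simon", "No", "Yes", "Yes", "Yes", "No"),
]


def encode(value):
    """Map 'Yes' to 1 and anything else ('No') to 0."""
    return 1 if value == "Yes" else 0


# Drop the name column and encode the remaining five columns.
encoded = [[encode(value) for value in row[1:]] for row in rows]

for row, bits in zip(rows, encoded):
    print(row[0], bits)
```

The resulting list of lists has the same content as the `dataset` array above and could be passed to `np.array` in its place.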
For our feature vectors, we need the first four columns:
X = dataset[:, 0:4]
and for the training labels, we use the last column from the dataset:
y = dataset[:, 4]

[1] https://scikit-learn.org/stable/modules/linear_model.html#perceptron
(a) Now, create a Perceptron classifier (same approach as in the previous labs)
and train it.
Most of the solution is provided above. Here is the additional code required
to create a Perceptron classifier and train it using the provided dataset:
perceptron_classifier = Perceptron(max_iter=40, eta0=0.1, random_state=1)
perceptron_classifier.fit(X, y)
The parameters we’re using here are:

max_iter: The maximum number of passes over the training data (aka epochs). It’s set to 40, so the training data is passed to the Perceptron at most 40 times during training.
eta0: The learning rate, which determines the step size of each weight update. The value 0.1 is a moderate learning rate.
random_state: The seed used when shuffling the training data. Fixing it makes the results reproducible: the classifier produces the same output for the same input data every time it’s run, which aids debugging and comparison.

Try experimenting with these values, for example, by changing the number of iterations or the learning rate. Make sure you understand the significance of setting random_state.
(b) Let’s examine our trained Perceptron in more detail. You can look at the weights it learned with:
print("Weights: ", perceptron_classifier.coef_)
And the bias, here called the intercept term, with:
print("Bias: ", perceptron_classifier.intercept_)
The activation function is not directly exposed, but scikit-learn is using the step activation function. Now check how your Perceptron would classify a training sample by computing the net activation (input vector × weights + bias) and applying the step function.
You can use the following code to compute the net activation on all training data samples and compare this with your results:
net_activation = np.dot(X, perceptron_classifier.coef_.T) + perceptron_classifier.intercept_
print(net_activation)
Remember that the step activation function classifies a sample as 1 if the net activation is non-negative and as 0 otherwise. One edge case to keep in mind: for a net activation of exactly 0, scikit-learn’s predict returns 0, because it assigns class 1 only to strictly positive values.
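
To make the net activation and step function concrete, here is a minimal sketch of the textbook perceptron learning rule in plain Python. It is not scikit-learn's implementation (which shuffles the data and uses its own stopping criterion), so the learned weights will generally differ from `coef_`. Assumptions of this sketch: weights start at zero, a net activation of exactly 0 maps to class 0, and the learning rate is 1 instead of 0.1 so that all arithmetic stays in exact integers (with zero initial weights the learning rate only rescales the weights).

```python
# Training data from the table: four binary features per student, one label.
samples = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
labels = [0, 1, 0, 0, 1, 0]


def net_activation(weights, bias, x):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def step(net):
    """Step activation; assumption: a net activation of exactly 0 gives 0."""
    return 1 if net > 0 else 0


weights, bias = [0, 0, 0, 0], 0
eta = 1  # integer learning rate (assumption) keeps the arithmetic exact

for epoch in range(40):  # at most 40 passes, mirroring max_iter=40
    errors = 0
    for x, target in zip(samples, labels):
        error = target - step(net_activation(weights, bias, x))
        if error != 0:
            # Move the weights toward (or away from) the misclassified sample.
            weights = [w + eta * error * xi for w, xi in zip(weights, x)]
            bias += eta * error
            errors += 1
    if errors == 0:
        break  # every training sample is classified correctly

predictions = [step(net_activation(weights, bias, x)) for x in samples]
print("Weights:", weights, "Bias:", bias)
print("Predictions:", predictions)
```

Because the training data is linearly separable, the loop stops after a few epochs with every training sample classified correctly.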
(c) Apply the trained model to all training samples and print out the prediction.
This works just like for the other classifiers we used before:
y_pred = perceptron_classifier.predict(X)
print(y_pred)
This will print the classification results like:
[0 1 0 0 1 0]
Compare the predicted labels with the actual labels from the dataset. How
many predictions match the actual labels? What does this say about the
performance of our classifier on the training data?
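
The comparison can be done in a few lines. This sketch hard-codes the predictions printed above and the labels from the last dataset column; the variable names are illustrative only.

```python
y_true = [0, 1, 0, 0, 1, 0]  # last column of the dataset
y_pred = [0, 1, 0, 0, 1, 0]  # predictions printed above

# Count the positions where prediction and label agree.
matches = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = matches / len(y_true)
print(f"{matches}/{len(y_true)} correct, accuracy = {accuracy:.2f}")
```

A perfect score here only shows that the training data was fitted; it says nothing yet about performance on unseen students.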
Question 2
Consider the neural network shown below. It consists of 2 input nodes, 1 hidden node, and 2 output nodes, with an additional bias at the input layer (attached to the hidden node) and a bias at the hidden layer (attached to the output nodes). All nodes in the hidden and output layers use the sigmoid activation function (σ).
(a) Calculate the output of y1 and y2 if the network is fed $x = (1, 0)$ as input.

$$h_{in} = b_h + w_{x_1\text{-}h}\, x_1 + w_{x_2\text{-}h}\, x_2 = (0.1) + (0.3 \times 1) + (0.5 \times 0) = 0.4$$

$$h = \sigma(h_{in}) = \sigma(0.4) = \frac{1}{1 + e^{-0.4}} = 0.599$$

$$y_{1,in} = b_{y_1} + w_{h\text{-}y_1}\, h = 0.6 + (0.2 \times 0.599) = 0.72$$

$$y_1 = \sigma(0.72) = \frac{1}{1 + e^{-0.72}} = 0.673$$

$$y_{2,in} = b_{y_2} + w_{h\text{-}y_2}\, h = 0.9 + (0.2 \times 0.599) = 1.02$$

$$y_2 = \sigma(1.02) = \frac{1}{1 + e^{-1.02}} = 0.735$$

As a result, the output is calculated as $y = (y_1, y_2) = (0.673, 0.735)$.
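
The forward pass can be checked numerically. The sketch below uses the weights and biases that appear in the calculation above; the variable names are made up for this illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


# Weights and biases as used in the worked solution.
b_h, w_x1_h, w_x2_h = 0.1, 0.3, 0.5
b_y1, w_h_y1 = 0.6, 0.2
b_y2, w_h_y2 = 0.9, 0.2

x1, x2 = 1, 0  # input vector x = (1, 0)

h = sigmoid(b_h + w_x1_h * x1 + w_x2_h * x2)  # hidden node
y1 = sigmoid(b_y1 + w_h_y1 * h)               # first output node
y2 = sigmoid(b_y2 + w_h_y2 * h)               # second output node

print(f"h = {h:.3f}, y1 = {y1:.3f}, y2 = {y2:.3f}")
```

The code carries full floating-point precision through every step, so later digits can differ slightly from a hand calculation that rounds intermediate values.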
(b) Assume that the expected output for the input $x = (1, 0)$ is supposed to be $t = (0, 1)$. Calculate the updated weights after the backpropagation of the error for this sample. Assume that the learning rate $\eta = 0.1$.
Error terms:

$$\delta_{y_1} = y_1 (1 - y_1)(y_1 - t_1) = 0.673\,(1 - 0.673)(0.673 - 0) = 0.148$$

$$\delta_{y_2} = y_2 (1 - y_2)(y_2 - t_2) = 0.735\,(1 - 0.735)(0.735 - 1) = -0.052$$

$$\delta_h = h (1 - h) \sum_{i=1,2} w_{h\text{-}y_i}\, \delta_{y_i} = 0.599\,(1 - 0.599)\,[0.2 \times 0.148 + 0.2 \times (-0.052)] = 0.005$$

Weight and bias changes:

$$\begin{aligned}
\Delta w_{x_1\text{-}h} &= -\eta\, \delta_h\, x_1 = -0.1 \times 0.005 \times 1 = -0.0005 \\
\Delta w_{x_2\text{-}h} &= -\eta\, \delta_h\, x_2 = -0.1 \times 0.005 \times 0 = 0 \\
\Delta b_h &= -\eta\, \delta_h = -0.1 \times 0.005 = -0.0005 \\
\Delta w_{h\text{-}y_1} &= -\eta\, \delta_{y_1}\, h = -0.1 \times 0.148 \times 0.599 = -0.0088652 \\
\Delta b_{y_1} &= -\eta\, \delta_{y_1} = -0.1 \times 0.148 = -0.0148 \\
\Delta w_{h\text{-}y_2} &= -\eta\, \delta_{y_2}\, h = -0.1 \times (-0.052) \times 0.599 = 0.0031148 \\
\Delta b_{y_2} &= -\eta\, \delta_{y_2} = -0.1 \times (-0.052) = 0.0052
\end{aligned}$$

Updated weights and biases:

$$\begin{aligned}
w_{x_1\text{-}h,new} &= w_{x_1\text{-}h} + \Delta w_{x_1\text{-}h} = 0.3 + (-0.0005) = 0.2995 \\
w_{x_2\text{-}h,new} &= w_{x_2\text{-}h} + \Delta w_{x_2\text{-}h} = 0.5 + 0 = 0.5 \\
b_{h,new} &= b_h + \Delta b_h = 0.1 + (-0.0005) = 0.0995 \\
w_{h\text{-}y_1,new} &= w_{h\text{-}y_1} + \Delta w_{h\text{-}y_1} = 0.2 + (-0.0088652) = 0.1911348 \\
b_{y_1,new} &= b_{y_1} + \Delta b_{y_1} = 0.6 + (-0.0148) = 0.5852 \\
w_{h\text{-}y_2,new} &= w_{h\text{-}y_2} + \Delta w_{h\text{-}y_2} = 0.2 + 0.0031148 = 0.2031148 \\
b_{y_2,new} &= b_{y_2} + \Delta b_{y_2} = 0.9 + 0.0052 = 0.9052
\end{aligned}$$
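
The backpropagation step can be reproduced in code as a cross-check. This is a sketch under the same definitions as the hand calculation (delta terms of the form y(1 - y)(y - t), updates of the form -η·δ·input); the variable and dictionary names are invented for the illustration. Since the code does not round intermediate results, the hidden-layer values come out slightly different from the hand-rounded ones (for example, δ_h is about 0.0046 rather than 0.005).

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


# Initial weights, biases, input, target, and learning rate from the question.
w_x1_h, w_x2_h, b_h = 0.3, 0.5, 0.1
w_h_y1, b_y1 = 0.2, 0.6
w_h_y2, b_y2 = 0.2, 0.9
x1, x2 = 1, 0
t1, t2 = 0, 1
eta = 0.1

# Forward pass.
h = sigmoid(b_h + w_x1_h * x1 + w_x2_h * x2)
y1 = sigmoid(b_y1 + w_h_y1 * h)
y2 = sigmoid(b_y2 + w_h_y2 * h)

# Error terms for the output nodes and the hidden node.
delta_y1 = y1 * (1 - y1) * (y1 - t1)
delta_y2 = y2 * (1 - y2) * (y2 - t2)
delta_h = h * (1 - h) * (w_h_y1 * delta_y1 + w_h_y2 * delta_y2)

# Gradient-descent updates: new = old - eta * delta * input.
updated = {
    "w_x1_h": w_x1_h - eta * delta_h * x1,
    "w_x2_h": w_x2_h - eta * delta_h * x2,
    "b_h": b_h - eta * delta_h,
    "w_h_y1": w_h_y1 - eta * delta_y1 * h,
    "b_y1": b_y1 - eta * delta_y1,
    "w_h_y2": w_h_y2 - eta * delta_y2 * h,
    "b_y2": b_y2 - eta * delta_y2,
}

for name, value in updated.items():
    print(f"{name}: {value:.4f}")
```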
Question 3
Let’s see how we can build multi-layer neural networks using scikit-learn. [2]
(a) Implement the architecture from the previous question using scikit-learn and use it to learn the XOR function, which is not linearly separable.
Use the following Python imports:
import numpy as np
from sklearn.neural_network import MLPClassifier
Here is the training data for the XOR function:
dataset = np.array([[1,1,0],
[0,1,1],
[1,0,1],
[0,0,0]])
For our feature vectors, we need the first two columns:
X = dataset[:, 0:2]
and for the training labels, we use the last column from the dataset:
y = dataset[:, 2]
Now you can create a multi-layer Perceptron using scikit-learn’s MLP (multi-layer perceptron) classifier. [3] There are many parameters you can define and customize; here you need to define hidden_layer_sizes. For this parameter, you pass in a tuple consisting of the number of neurons you want at each hidden layer, where the nth entry in the tuple represents the number of neurons in the nth hidden layer of the MLP model. You also need to set the activation to ‘logistic’, which is the logistic sigmoid function. The weights and biases are created and initialized implicitly by the classifier.
Using the code blocks provided above, you can create the network and train it on the XOR dataset with:
mlp = MLPClassifier(hidden_layer_sizes=(1,), activation='logistic')
mlp.fit(X, y)
(b) Now apply the trained model to all training samples and print out its prediction.
y_pred = mlp.predict(X)
print(y_pred)
As you see, our single hidden layer with a single neuron doesn’t perform
well on learning XOR. It’s always a good idea to experiment with different
network configurations. Try to change the number of hidden neurons to
find a solution!
[2] https://scikit-learn.org/stable/modules/neural_networks_supervised.html
[3] https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
With a single hidden neuron, the network cannot represent XOR, whatever the initial weights and other hyperparameters: its output is a monotonic function of one linear combination of the inputs, so the decision boundary is still a single straight line.
With two neurons in the hidden layer, it’s possible but not guaranteed to find a solution. The success of training depends on the weight initialization and the optimization algorithm’s ability to find a suitable combination of weights. Often, it may get stuck in local minima.
Using three neurons in the hidden layer increases the representational capacity of the network, making it more likely to converge to a solution for the XOR problem. Try:
mlp = MLPClassifier(hidden_layer_sizes=(3,), activation='logistic',
                    solver='lbfgs', max_iter=100000, random_state=42)
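
That a small hidden layer is enough in principle can be shown without any training. The sketch below wires up a network with two sigmoid hidden neurons by hand; the weights are hypothetical values chosen for this illustration (they are not what MLPClassifier would learn), with one hidden neuron acting roughly as OR, the other as NAND, and the output neuron as AND.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def xor_net(x1, x2):
    """Hand-wired 2-2-1 network; all weights are illustrative assumptions."""
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # approximately OR
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # approximately NAND
    return sigmoid(20 * h1 + 20 * h2 - 30)  # approximately AND of h1 and h2


# Same input order as the XOR dataset in this question.
inputs = [(1, 1), (0, 1), (1, 0), (0, 0)]
outputs = [round(xor_net(x1, x2)) for x1, x2 in inputs]
print(outputs)
```

Whether gradient-based training actually finds such a configuration is a separate matter and depends on initialization and the solver, which is why extra hidden neurons can help in practice.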
Question 4
Create a multi-layer Perceptron and use it to classify the MNIST digits dataset, containing scanned images of hand-written numerals. [4]
(a) Load the digits dataset from scikit-learn’s built-in datasets [5]; note that load_digits provides a small set of 8×8 digit images rather than the full 28×28 MNIST database. Like before, use the train_test_split [6] helper function to split the digits dataset into a training and testing subset. Create a multi-layer Perceptron, like in the previous question, and train the model. Pay attention to the required size of the input and output layers and experiment with different hidden layer configurations.
import numpy as np
from sklearn import datasets
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt
The digits dataset is another built-in dataset in scikit-learn. First load the dataset. Since it contains two-dimensional image data, you need to flatten it, so it can be presented to our neural network as input:
digits = datasets.load_digits()  # 2D images in feature matrix
n_samples = len(digits.images)  # number of samples
data = digits.images.reshape((n_samples, -1))  # flatten 2D images into 1D
The third line above “flattens” the 2D image arrays, so that the resulting data contains a 1D vector for each image. Thus, data now contains one row for each image in the dataset, with one column for each pixel in those images; each value is the gray-scale intensity of that pixel.
[4] https://en.wikipedia.org/wiki/MNIST_database
[5] https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
[6] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Create training and test splits (reserving 30% of the data for testing and using the rest of it for training):
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.3, shuffle=False)
Finally, train a neural network that can actually make predictions with:
# verbose expects a boolean; the string 'true' is rejected by recent scikit-learn versions
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, alpha=1e-4,
                    solver='sgd', verbose=True, random_state=1,
                    learning_rate_init=0.001)
mlp.fit(X_train, y_train)
(b) Now run an evaluation to compute the performance of your model using scikit-learn’s accuracy score. [7]
You can evaluate the model with:
y_pred = mlp.predict(X_test)
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
Bonus visualization: If you want to print out some example images from the test set with their predicted label, you can use the code below:
# Randomly select 10 images and print them with their predicted labels
n, m = 2, 5
random_indices = np.random.choice(X_test.shape[0], n * m, replace=False)
selected_images = X_test[random_indices]
selected_predictions = y_pred[random_indices]
# Plot the selected images with their predictions in a 2x5 matrix
plt.figure(figsize=(10, 4))
for i in range(n):
    for j in range(m):
        idx = i * m + j
        plt.subplot(n, m, idx + 1)
        plt.imshow(selected_images[idx].reshape((8, 8)), cmap='gray')
        plt.title(f'Predicted: {selected_predictions[idx]}')
        plt.axis('off')
plt.tight_layout()
plt.show()
(c) In any classification task, whether binary or multi-class, it’s crucial to assess how well the model is doing. Precision and recall are commonly used metrics for this purpose. For binary classification, their computation is straightforward. However, when we move to multi-class problems, the landscape becomes more complex. This is where micro and macro averaging come in, and they provide two different perspectives:

Micro-Averaging: This method gives a global view. It pools together the individual true positives, false negatives, and false positives across all classes, effectively treating the multi-class problem as a single binary classification. It provides an overall sense of how the model is performing, without differentiating between classes.
Macro-Averaging: This method breaks down the performance by class. It calculates precision and recall for each class separately and then averages them. This means every class, regardless of its size, has an equal say in the final score. It’s useful for understanding the model’s performance on individual classes, especially when there are imbalances in class sizes.

[7] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
Both of these methods are standard in the field of machine learning and not specific to any particular library, including scikit-learn. They offer complementary perspectives: while micro-averaging might show how well the model performs overall, macro-averaging can highlight if it’s struggling with any particular class.
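
The difference between the two averaging schemes is easiest to see on a tiny example. The labels below are made-up toy data (three classes, deliberately imbalanced), not results from the digits model, and the sketch assumes every class is predicted at least once so that no division by zero occurs.

```python
# Toy labels (hypothetical): class 0 is four times as frequent as class 2.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]
classes = sorted(set(y_true) | set(y_pred))

# Per-class counts of true positives, false positives, and false negatives.
tp = {c: 0 for c in classes}
fp = {c: 0 for c in classes}
fn = {c: 0 for c in classes}
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1  # predicted class p, but it was wrong
        fn[t] += 1  # true class t was missed

# Macro: compute the metric per class, then average with equal class weight.
macro_precision = sum(tp[c] / (tp[c] + fp[c]) for c in classes) / len(classes)
macro_recall = sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)

# Micro: pool the counts over all classes, then compute the metric once.
total_tp = sum(tp.values())
micro_precision = total_tp / (total_tp + sum(fp.values()))
micro_recall = total_tp / (total_tp + sum(fn.values()))

print(f"macro precision = {macro_precision:.3f}, macro recall = {macro_recall:.3f}")
print(f"micro precision = {micro_precision:.3f}, micro recall = {micro_recall:.3f}")
```

In single-label multi-class problems every false positive for one class is a false negative for another, so micro precision and micro recall coincide (and equal the accuracy), while the macro values can differ.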
Run an evaluation on your results and compute the precision and recall score with micro and macro averaging, using scikit-learn’s precision_score [8] and recall_score. [9] Make sure you compute these on your test set!
pre_macro = precision_score(y_test, y_pred, average='macro')
pre_micro = precision_score(y_test, y_pred, average='micro')
recall_macro = recall_score(y_test, y_pred, average='macro')
recall_micro = recall_score(y_test, y_pred, average='micro')
Here, the micro and macro averages are very similar, as the classes in this dataset are mostly balanced. If one class has significantly fewer samples, macro-averaging will give you a sense of how well the model performs on that specific class compared to the others.
(d) Use the confusion matrix implementation from the scikit-learn package to visualize your classification performance.
The confusion matrix provides a more detailed breakdown of a classifier’s performance, allowing you to see not just where it got things right, but where mistakes are being made. Each row in the matrix represents a true class, while each column represents a predicted class. It’s a powerful tool to understand misclassifications, especially in multi-class problems.
cm = confusion_matrix(y_test, y_pred)
# display_labels must be passed as a keyword argument
ConfusionMatrixDisplay(cm, display_labels=digits.target_names).plot()
plt.show()
You should get an output similar to the following:
[8] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html
[9] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
By examining the heatmap, you can quickly identify which classes the model is confusing with others. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier.
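
How the matrix entries come about can be sketched in plain Python. The labels are made-up toy data rather than output from the digits model; rows index the true class and columns the predicted class, matching the convention described above.

```python
# Toy labels (hypothetical) for a three-class problem.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]
n_classes = 3

# matrix[t][p] counts samples of true class t that were predicted as class p.
matrix = [[0] * n_classes for _ in range(n_classes)]
for t, p in zip(y_true, y_pred):
    matrix[t][p] += 1

for row in matrix:
    print(row)

# The diagonal holds the correctly classified samples.
correct = sum(matrix[i][i] for i in range(n_classes))
print("correct:", correct, "of", len(y_true))
```

Reading along a row shows where the samples of one true class ended up; here one sample of class 0 was mistaken for class 1 and one sample of class 1 for class 2.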
(e) K-fold cross-validation is a way to improve the training process: The data set is divided into k subsets, and the method is repeated k times. Each time, one of the k subsets is used as the test set and the other k - 1 subsets are put together to form a training set. Then the average error across all k trials is computed. The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k - 1 times. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to complete an evaluation. [10]
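
The splitting scheme itself is simple enough to sketch without any library. The generator below is a hypothetical helper (not scikit-learn's KFold class) that cuts consecutive index ranges into k folds without shuffling; when the sample count is not divisible by k, the first folds receive one extra sample each.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k consecutive folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each index lands in exactly one test fold and in the training set of the remaining k - 1 folds, which is the property described above.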
For this task, don’t use the train_test_split created earlier; instead use the KFold [11] class from the scikit-learn package to divide your dataset into k folds. For each fold, train your MLP model on the training set and evaluate its performance on the test set. Calculate performance metrics like accuracy, precision, and recall for each fold. After all folds have been processed, compute the average performance across all folds.
Compare the average performance from cross-validation to the performance you achieved with a single train/test split.

[10] https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
[11] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

One option would be to code a loop over the folds, perform training and testing in each iteration, and then average the results. But scikit-learn has a helper function that can do this automatically for you, cross_val_score [12], here using accuracy:
from sklearn import datasets
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import KFold, cross_val_score
digits = datasets.load_digits()
# features matrix
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
# verbose expects a boolean, not the string 'true'
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, alpha=1e-4,
                    solver='sgd', verbose=True, random_state=1,
                    learning_rate_init=0.001)
# Perform 10-fold cross validation and compute accuracy scores
scores = cross_val_score(mlp, X, y, cv=10, scoring='accuracy')
print("Accuracy for each fold:")
print(scores)
print(f"Average Accuracy: {scores.mean() * 100:.2f}%")
You can also compute multiple metrics using the cross_validate function:
from sklearn.model_selection import cross_validate
scoring = ['precision_macro', 'recall_macro', 'f1_macro', 'accuracy']
scores = cross_validate(mlp, X, y, cv=5, scoring=scoring, return_train_score=False)
# Print the results from each fold
for metric, values in scores.items():
    if 'test_' in metric:
        print(f"{metric.replace('test_', '')}: {values}")
# Print the cross-fold results (mean and two standard deviations per entry)
for key, values in scores.items():
    print(f"{key}: {values.mean():.4f} (+/- {values.std() * 2:.4f})")
When examining the cross-validation results, ensure you check for consistent performance across folds. Significant variability could hint at underlying dataset issues or model sensitivities. Also, while the average score offers a broad overview, individual fold results can shed light on model robustness, possibly highlighting susceptibility to certain data splits, either overfitting or underfitting.
[12] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html