COMP 6721 Applied Artificial Intelligence (Fall 2023)
Lab Exercise #6: Artificial Neural Networks - Solutions

Question 1

Given the training instances below, use scikit-learn to implement a Perceptron classifier (https://scikit-learn.org/stable/modules/linear_model.html#perceptron) that classifies students into two categories, predicting who will get an 'A' this year, based on an input feature vector x. Here is the training data again; the four Yes/No questions are the features x, and the last column is the output f(x):

    Student        'A' last year?  Black hair?  Works hard?  Drinks?  'A' this year?
    X1: Richard    Yes             Yes          No           Yes      No
    X2: Alan       Yes             Yes          Yes          No       Yes
    X3: Alison     No              No           Yes          No       No
    X4: Jeff       No              Yes          No           Yes      No
    X5: Gail       Yes             No           Yes          Yes      Yes
    X6: Simon      No              Yes          Yes          Yes      No

Use the following Python imports for the perceptron:

    import numpy as np
    from sklearn.linear_model import Perceptron

All features must be numerical for training the classifier, so you have to transform the 'Yes' and 'No' feature values into a binary representation:

    # Dataset with binary representation of the features
    dataset = np.array([[1,1,0,1,0],
                        [1,1,1,0,1],
                        [0,0,1,0,0],
                        [0,1,0,1,0],
                        [1,0,1,1,1],
                        [0,1,1,1,0]])

For our feature vectors, we need the first four columns:

    X = dataset[:, 0:4]

and for the training labels, we use the last column from the dataset:
    y = dataset[:, 4]

(a) Now, create a Perceptron classifier (same approach as in the previous labs) and train it.

Most of the solution is provided above. Here is the additional code required to create a Perceptron classifier and train it on the provided dataset:

    perceptron_classifier = Perceptron(max_iter=40, eta0=0.1, random_state=1)
    perceptron_classifier.fit(X, y)

The parameters we are using here are:

    max_iter      The maximum number of passes over the training data (also
                  called epochs). It is set to 40, so the dataset is passed to
                  the Perceptron up to 40 times during training.
    eta0          The learning rate, which determines the step size of the
                  weight updates in each iteration. A value of 0.1 is a
                  moderate learning rate.
    random_state  Ensures reproducibility of results: the classifier produces
                  the same output for the same input data every time it is
                  run, which helps with debugging and comparison.

Try experimenting with these values, for example by changing the number of iterations or the learning rate. Make sure you understand the significance of setting random_state.

(b) Let's examine our trained Perceptron in more detail. You can look at the weights it learned with:

    print("Weights:", perceptron_classifier.coef_)

And the bias, here called the intercept term, with:

    print("Bias:", perceptron_classifier.intercept_)

The activation function is not directly exposed, but scikit-learn uses the step activation function. Now check how your Perceptron would classify a training sample by computing the net activation (input vector × weights + bias) and applying the step function. You can use the following code to compute the net activation for all training samples and compare it with your results:

    net_activation = np.dot(X, perceptron_classifier.coef_.T) + perceptron_classifier.intercept_
    print(net_activation)
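Applying the step function to these net activations yields the predicted classes. Here is a minimal sketch (the variable name y_step is illustrative, not part of the scikit-learn API):

    # Step function as used in this lab: 1 if net activation >= 0, else 0
    y_step = (net_activation >= 0).astype(int).ravel()
    print(y_step)  # should match perceptron_classifier.predict(X)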
Remember that the step activation function classifies a sample as 1 if the net activation is non-negative, and as 0 otherwise.

(c) Apply the trained model to all training samples and print out the predictions. This works just like for the other classifiers we used before:

    y_pred = perceptron_classifier.predict(X)
    print(y_pred)

This will print the classification results like:

    [0 1 0 0 1 0]

Compare the predicted labels with the actual labels from the dataset. How many predictions match the actual labels? What does this say about the performance of our classifier on the training data?
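You can answer this by comparing the two arrays directly. With the settings above, the printed predictions [0 1 0 0 1 0] are identical to the labels y, so all six samples are classified correctly, i.e. 100% training accuracy:

    matches = np.sum(y_pred == y)
    print(f"{matches} of {len(y)} predictions match the labels")
    print("Training accuracy:", np.mean(y_pred == y))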
Question 2

Consider the neural network shown below. It consists of 2 input nodes, 1 hidden node, and 2 output nodes, with an additional bias at the input layer (attached to the hidden node) and a bias at the hidden layer (attached to the output nodes). All nodes in the hidden and output layers use the sigmoid activation function (σ).

[Figure: the network with weights b_h = 0.1, w_{x1-h} = 0.3, w_{x2-h} = 0.5, w_{h-y1} = 0.2, w_{h-y2} = 0.2, b_{y1} = 0.6, b_{y2} = 0.9, as used in the calculations below.]

(a) Calculate the output of y1 and y2 if the network is fed x = (1, 0) as input.

    h_in = b_h + w_{x1-h} x_1 + w_{x2-h} x_2 = 0.1 + (0.3 × 1) + (0.5 × 0) = 0.4
    h = σ(h_in) = σ(0.4) = 1 / (1 + e^{-0.4}) = 0.599

    y_{1,in} = b_{y1} + w_{h-y1} h = 0.6 + (0.2 × 0.599) = 0.72
    y_1 = σ(y_{1,in}) = σ(0.72) = 1 / (1 + e^{-0.72}) = 0.673

    y_{2,in} = b_{y2} + w_{h-y2} h = 0.9 + (0.2 × 0.599) = 1.02
    y_2 = σ(y_{2,in}) = σ(1.02) = 1 / (1 + e^{-1.02}) = 0.735

As a result, the output is y = (y_1, y_2) = (0.673, 0.735).

(b) Assume that the expected output for the input x = (1, 0) is supposed to be t = (0, 1). Calculate the updated weights after the backpropagation of the error for this sample. Assume that the learning rate η = 0.1.

    δ_{y1} = y_1 (1 - y_1)(y_1 - t_1) = 0.673 (1 - 0.673)(0.673 - 0) = 0.148
    δ_{y2} = y_2 (1 - y_2)(y_2 - t_2) = 0.735 (1 - 0.735)(0.735 - 1) = -0.052
    δ_h = h (1 - h) Σ_{i=1,2} w_{h-yi} δ_{yi}
        = 0.599 (1 - 0.599) [0.2 × 0.148 + 0.2 × (-0.052)] = 0.005

    Δw_{x1-h} = -η δ_h x_1 = -0.1 × 0.005 × 1 = -0.0005
    Δw_{x2-h} = -η δ_h x_2 = -0.1 × 0.005 × 0 = 0
    Δb_h = -η δ_h = -0.1 × 0.005 = -0.0005
    Δw_{h-y1} = -η δ_{y1} h = -0.1 × 0.148 × 0.599 = -0.0088652
    Δb_{y1} = -η δ_{y1} = -0.1 × 0.148 = -0.0148
    Δw_{h-y2} = -η δ_{y2} h = -0.1 × (-0.052) × 0.599 = 0.0031148
    Δb_{y2} = -η δ_{y2} = -0.1 × (-0.052) = 0.0052

    w_{x1-h,new} = w_{x1-h} + Δw_{x1-h} = 0.3 + (-0.0005) = 0.2995
    w_{x2-h,new} = w_{x2-h} + Δw_{x2-h} = 0.5 + 0 = 0.5
    b_{h,new} = b_h + Δb_h = 0.1 + (-0.0005) = 0.0995
    w_{h-y1,new} = w_{h-y1} + Δw_{h-y1} = 0.2 + (-0.0088652) = 0.1911348
    b_{y1,new} = b_{y1} + Δb_{y1} = 0.6 + (-0.0148) = 0.5852
    w_{h-y2,new} = w_{h-y2} + Δw_{h-y2} = 0.2 + 0.0031148 = 0.2031148
    b_{y2,new} = b_{y2} + Δb_{y2} = 0.9 + 0.0052 = 0.9052
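If you want to double-check the arithmetic, here is a small NumPy sketch (not part of the original solution; variable names are illustrative) that reproduces the forward pass and the weight updates using the network's parameters:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Network parameters from the figure
    b_h, w_x1_h, w_x2_h = 0.1, 0.3, 0.5
    b_y1, w_h_y1 = 0.6, 0.2
    b_y2, w_h_y2 = 0.9, 0.2
    x1, x2 = 1, 0    # input x = (1, 0)
    t1, t2 = 0, 1    # target t = (0, 1)
    eta = 0.1        # learning rate

    # Forward pass
    h = sigmoid(b_h + w_x1_h * x1 + w_x2_h * x2)   # ~0.599
    y1 = sigmoid(b_y1 + w_h_y1 * h)                # ~0.673
    y2 = sigmoid(b_y2 + w_h_y2 * h)                # ~0.735

    # Backward pass: error terms (deltas) as defined above
    d_y1 = y1 * (1 - y1) * (y1 - t1)               # ~0.148
    d_y2 = y2 * (1 - y2) * (y2 - t2)               # ~-0.052
    d_h = h * (1 - h) * (w_h_y1 * d_y1 + w_h_y2 * d_y2)  # ~0.005

    # Updated weights and biases
    print("w_x1_h:", w_x1_h - eta * d_h * x1)      # ~0.2995
    print("b_h:   ", b_h - eta * d_h)              # ~0.0995
    print("w_h_y1:", w_h_y1 - eta * d_y1 * h)      # ~0.1911
    print("b_y1:  ", b_y1 - eta * d_y1)            # ~0.5852
    print("w_h_y2:", w_h_y2 - eta * d_y2 * h)      # ~0.2031
    print("b_y2:  ", b_y2 - eta * d_y2)            # ~0.9052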
Question 3

Let's see how we can build multi-layer neural networks using scikit-learn (https://scikit-learn.org/stable/modules/neural_networks_supervised.html).

(a) Implement the architecture from the previous question using scikit-learn and use it to learn the XOR function, which is not linearly separable. Use the following Python imports:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

Here is the training data for the XOR function:

    dataset = np.array([[1,1,0],
                        [0,1,1],
                        [1,0,1],
                        [0,0,0]])

For our feature vectors, we need the first two columns:

    X = dataset[:, 0:2]

and for the training labels, we use the last column from the dataset:

    y = dataset[:, 2]

Now you can create a multi-layer Perceptron using scikit-learn's MLPClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html). There are a lot of parameters you can define and customize; here, you need to set hidden_layer_sizes. For this parameter, you pass in a tuple whose n-th entry is the number of neurons in the n-th hidden layer of the MLP model. You also need to set the activation to 'logistic', which is the logistic sigmoid function. The weights and biases are created and managed internally by the classifier.

Using the code blocks provided above, you can create the network and train it on the XOR dataset with:

    mlp = MLPClassifier(hidden_layer_sizes=(1,), activation='logistic')
    mlp.fit(X, y)

(b) Now apply the trained model to all training samples and print out its predictions:

    y_pred = mlp.predict(X)
    print(y_pred)

As you can see, a single hidden layer with a single neuron doesn't perform well on learning XOR. It's always a good idea to experiment with different network configurations. Try changing the number of hidden neurons to find a solution!
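One way to experiment systematically is to loop over a few hidden-layer sizes and compare the predictions against the XOR labels. In this sketch, max_iter and random_state are illustrative choices added for reproducibility; the exact outputs depend on the random weight initialization:

    # Try increasingly large hidden layers on the XOR data
    for n_hidden in (1, 2, 3, 4):
        mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation='logistic',
                            max_iter=10000, random_state=1)
        mlp.fit(X, y)
        print(n_hidden, "hidden neuron(s):", mlp.predict(X))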
With a single hidden neuron, the network can converge in theory, but it is difficult in practice and depends strongly on the initial weights and other hyperparameters. With two neurons in the hidden layer, it is possible but not guaranteed to find a solution: success depends on the weight initialization and on the optimizer's ability to find a suitable combination of weights, and training often gets stuck in local minima. Using three neurons in the hidden layer increases the representational capacity of the network, making it much more likely to converge to a solution for the XOR problem. Try:

    mlp = MLPClassifier(hidden_layer_sizes=(3,), activation='logistic',
                        solver='lbfgs', max_iter=100000, random_state=42)
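Fitting this configuration and predicting on the training data should recover the XOR labels. Treat the expected output as illustrative: with a different seed or solver, the network may still fail to converge.

    mlp.fit(X, y)
    print(mlp.predict(X))  # ideally [0 1 1 0], i.e. the XOR labels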
Question 4

Create a multi-layer Perceptron and use it to classify the MNIST digits dataset, containing scanned images of hand-written numerals (https://en.wikipedia.org/wiki/MNIST_database):

(a) Load the digits dataset from scikit-learn's built-in datasets (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html). Like before, use the train_test_split helper function (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split the digits dataset into a training and a testing subset. Create a multi-layer Perceptron, like in the previous question, and train the model. Pay attention to the required size of the input and output layers, and experiment with different hidden layer configurations.

    import numpy as np
    from sklearn import datasets
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
    from sklearn.metrics import precision_score, recall_score
    import matplotlib.pyplot as plt

The MNIST digits dataset is another built-in dataset in scikit-learn. First, load the dataset. Since it contains two-dimensional image data, you need to flatten it so that it can be presented to our neural network as input:

    digits = datasets.load_digits()                # 2D images in feature matrix
    n_samples = len(digits.images)                 # number of samples
    data = digits.images.reshape((n_samples, -1))  # flatten 2D images into 1D

The third line above "flattens" the 2D image arrays, so that the resulting data contains a 1D vector for each image. Thus, data now contains one row for each image in the dataset, with one column for each pixel and its value representing that pixel's gray-scale intensity.

Create training and test splits (reserving 30% of the data for testing and using the rest for training):
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.3, shuffle=False)

Finally, train a neural network that can actually make predictions with:

    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, alpha=1e-4,
                        solver='sgd', verbose=True, random_state=1,
                        learning_rate_init=0.001)
    mlp.fit(X_train, y_train)

(b) Now run an evaluation to compute the performance of your model using scikit-learn's accuracy score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html). You can evaluate the model with:

    y_pred = mlp.predict(X_test)
    print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

Bonus visualization: if you want to print out some example images from the test set with their predicted labels, you can use the code below:

    # Randomly select 10 images and print them with their predicted labels
    n, m = 2, 5
    random_indices = np.random.choice(X_test.shape[0], n * m, replace=False)
    selected_images = X_test[random_indices]
    selected_predictions = y_pred[random_indices]

    # Plot the selected images with their predictions in a 2x5 grid
    plt.figure(figsize=(10, 4))
    for i in range(n):
        for j in range(m):
            idx = i * m + j
            plt.subplot(n, m, idx + 1)
            plt.imshow(selected_images[idx].reshape((8, 8)), cmap='gray')
            plt.title(f'Predicted: {selected_predictions[idx]}')
            plt.axis('off')
    plt.tight_layout()
    plt.show()

(c) In any classification task, whether binary or multi-class, it is crucial to assess how well the model is doing. Precision and recall are commonly used metrics for this purpose. For binary classification, their computation is straightforward. However, when we move to multi-class problems, the landscape becomes more complex. This is where micro and macro averaging come in, and they provide two different perspectives:
Micro-averaging: This method gives a global view. It pools the individual true positives, false negatives, and false positives across all classes, effectively treating the multi-class problem as a single binary classification. It provides an overall sense of how the model is performing, without differentiating between classes.

Macro-averaging: This method breaks down the performance by class. It calculates precision and recall for each class separately and then averages them. This means every class, regardless of its size, has an equal say in the final score. It is useful for understanding the model's performance on individual classes, especially when there are imbalances in class sizes.

Both of these methods are standard in the field of machine learning and not specific to any particular library, including scikit-learn. They offer complementary perspectives: while micro-averaging shows how well the model performs overall, macro-averaging can highlight whether it is struggling with any particular class.

Run an evaluation on your results and compute the precision and recall scores with micro and macro averaging, using scikit-learn's precision_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) and recall_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html). Make sure you compute these on your test set!

    pre_macro = precision_score(y_test, y_pred, average='macro')
    pre_micro = precision_score(y_test, y_pred, average='micro')
    recall_macro = recall_score(y_test, y_pred, average='macro')
    recall_micro = recall_score(y_test, y_pred, average='micro')

Here, the micro and macro averages are very similar, as the classes in this dataset are mostly balanced. If one class has significantly fewer samples, macro-averaging will give you a sense of how well the model performs on that specific class compared to the others.
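To make the difference between the two averages concrete, here is a small toy example (the labels are made up for illustration) with a dominant class, where the per-class precisions are 3/3, 0/1, and 1/2:

    import numpy as np
    from sklearn.metrics import precision_score

    y_true = np.array([0, 0, 0, 0, 1, 2])
    y_pred_toy = np.array([0, 0, 0, 1, 2, 2])

    # Macro: average of the per-class precisions, (1.0 + 0.0 + 0.5) / 3 = 0.5
    print(precision_score(y_true, y_pred_toy, average='macro'))
    # Micro: pooled over all predictions, 4 correct out of 6 = 0.667
    print(precision_score(y_true, y_pred_toy, average='micro'))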
(d) Use the confusion matrix implementation from the scikit-learn package to visualize your classification performance. The confusion matrix provides a more detailed breakdown of a classifier's performance, allowing you to see not just where it got things right, but where mistakes are being made. Each row of the matrix represents a true class, while each column represents a predicted class. It is a powerful tool for understanding misclassifications, especially in multi-class problems.

    cm = confusion_matrix(y_test, y_pred)
    ConfusionMatrixDisplay(cm, display_labels=digits.target_names).plot()
    plt.show()

You should get an output similar to the following:

[Figure: confusion matrix heatmap for the ten digit classes, with the counts concentrated on the diagonal.]

By examining the heatmap, you can quickly identify which classes the model is confusing with others. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier.

(e) K-fold cross-validation is a way to make the evaluation more reliable (https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation): the dataset is divided into k subsets, and the method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. Then the average error across all k trials is computed. The advantage of this method is that it matters less how the data gets divided: every data point is in a test set exactly once and in a training set k-1 times. The disadvantage is that the training algorithm has to be rerun from scratch k times, so a full evaluation takes k times as much computation.

For this task, don't use the train_test_split created earlier; instead, use the KFold class (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) from the scikit-learn package to divide your dataset into k folds. For each fold, train your MLP model on the training set and evaluate its performance on the test set. Calculate performance metrics like accuracy, precision, and recall for each fold. After all folds have been processed, compute the average performance across all folds. Compare the average performance from cross-validation to the performance you achieved with a single train/test split.

One option would be to code a loop over the folds, perform training and testing in each iteration, and then average the results (a sketch of such a loop is shown at the end of this question). But scikit-learn has a
helper function that can do this automatically for you, cross_val_score (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html), here using accuracy:

    from sklearn import datasets
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import KFold, cross_val_score, cross_validate

    digits = datasets.load_digits()               # feature matrix
    n_samples = len(digits.images)
    X = digits.images.reshape((n_samples, -1))
    y = digits.target

    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, alpha=1e-4,
                        solver='sgd', verbose=True, random_state=1,
                        learning_rate_init=0.001)

    # Perform 10-fold cross-validation and compute accuracy scores
    scores = cross_val_score(mlp, X, y, cv=10, scoring='accuracy')

    print("Accuracy for each fold:")
    print(scores)
    print(f"Average Accuracy: {scores.mean() * 100:.2f}%")

You can also compute multiple metrics using the cross_validate function:

    scoring = ['precision_macro', 'recall_macro', 'f1_macro', 'accuracy']
    scores = cross_validate(mlp, X, y, cv=5, scoring=scoring,
                            return_train_score=False)

    # Print the results from each fold
    for metric, values in scores.items():
        if 'test_' in metric:
            print(f"{metric.replace('test_', '')}: {values}")

    # Print the averaged cross-fold results
    for key, values in scores.items():
        print(f"{key}: {values.mean():.4f} (+/- {values.std() * 2:.4f})")

When examining the cross-validation results, check for consistent performance across the folds. Significant variability can hint at underlying dataset issues or model sensitivities. Also, while the average score offers a broad overview, the individual fold results can shed light on model robustness, possibly highlighting susceptibility to certain data splits, either overfitting or underfitting.
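As promised above, here is a minimal sketch of the manual fold loop (it reuses X, y, and the imports from the block above; n_splits=5 and the seed are illustrative choices). This version is useful when you need per-fold control, for example to inspect individual splits:

    import numpy as np
    from sklearn.metrics import accuracy_score

    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    fold_accuracies = []
    for train_idx, test_idx in kf.split(X):
        # Train a fresh model on this fold's training split
        fold_mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                                 alpha=1e-4, solver='sgd', random_state=1,
                                 learning_rate_init=0.001)
        fold_mlp.fit(X[train_idx], y[train_idx])
        # Evaluate on the held-out fold
        fold_accuracies.append(accuracy_score(y[test_idx],
                                              fold_mlp.predict(X[test_idx])))

    print("Per-fold accuracy:", fold_accuracies)
    print("Average accuracy:", np.mean(fold_accuracies))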