COMP6321 Machine Learning, Major Assignment #2
Concordia University (Computer Science, COMP 6321)
PDF, 9 pages, dated Jan 9, 2024

COMP6321 Machine Learning (Fall 2023)
Major Assignment #2
Due: 11:59 PM, November 30th, 2023

Note: You will be submitting two separate files for this assignment:
(a) One (1) .pdf file containing answers to the questions as well as the results from the code you develop. Include snapshots of the pieces of code you developed in the appendix. Failure to do so will incur a -5% penalty.
(b) One (1) .zip folder containing all developed Python code, including a README.txt file explaining how to run your code.
Theoretical Questions

Question 1
Consider the following architecture of a simple Recurrent Neural Network (RNN). In this network, all inputs, hidden states, and weights are assumed to be 1D. The network is to be used for a many-to-one regression task: it produces an output y at the last time step L, given as y = f(w_hy * h_L). For the following questions, assume that no nonlinearities are used (i.e., no activation functions).

(a) Write a general expression for h_t in terms of w_in, w_hh, x_t, and h_{t-1}, where t is the time step.
(b) Given the sequential input x = [0.9, 0.8, 0.7], the initial hidden state h_0 = 0, and all weights initialized to 0.5, compute h_1, h_2, h_3, and y.
(c) You want to perform Backpropagation Through Time (BPTT), which is used to update the weights of RNNs. Given a target output y_r, assume that you are using the function l = (1/2)(y - y_r)^2 to calculate the loss. The goal is to compute ∂l/∂w_in, ∂l/∂w_hh, and ∂l/∂w_hy, and use them in the typical update rule w_i ← w_i - η ∂l/∂w_i. Find the expressions for ∂l/∂w_in, ∂l/∂w_hh, and ∂l/∂w_hy. Your expressions can include only the following: h_0, w_in, w_hh, w_hy, and x_i for i ∈ [1, 2, 3]. Show your work. (Hint: for ∂l/∂w_in and ∂l/∂w_hh, you need to consider the sum across all time steps.)
(d) Assume that the target for the previously given data sequence is y_r = 0.8, with a learning rate of η = 0.1. Calculate the updated value of each weight.

Question 2
Consider a Support Vector Machine (SVM) problem in a 1D setting (with one weight and one bias). For the following questions, assume we have the dataset D = {(x_1, t_1), (x_2, t_2), (x_3, t_3)} = {(3, +1), (5, +1), (2, -1)}.

(a) Define the optimization problem (objective function and constraints) of the SVM based on margin maximization.
(b) Solve the optimization problem using the graphical method. Explain your work.
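The forward pass in Question 1(b) can be checked numerically. The sketch below assumes the standard linear recurrence h_t = w_in * x_t + w_hh * h_{t-1}, with f taken as the identity since no nonlinearities are used:

```python
# Forward pass of the 1D linear RNN from Question 1(b).
# Assumes h_t = w_in * x_t + w_hh * h_{t-1} and y = w_hy * h_L (identity f).
w_in = w_hh = w_hy = 0.5
x = [0.9, 0.8, 0.7]

h = 0.0  # initial hidden state h_0
for x_t in x:
    h = w_in * x_t + w_hh * h  # h_1 = 0.45, h_2 = 0.625, h_3 = 0.6625
    print(f"h = {h}")

y = w_hy * h
print(f"y = {y}")  # 0.33125
```

The same loop structure extends to part (c): unrolling it by hand gives the per-time-step terms that the BPTT sums run over.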
Question 3
Consider the following dataset, used to train a decision tree that predicts whether the weather is good for playing outside. Use the dataset to answer the following questions. Show your work.
Outlook   Temperature  Windy  Play / Don't Play
sunny     85           false  Play
sunny     80           true   Don't Play
overcast  83           false  Play
rain      70           false  Play
rain      68           false  Don't Play
rain      65           true   Don't Play
overcast  64           true   Play
sunny     72           false  Don't Play
sunny     69           false  Play
rain      75           false  Play
sunny     75           true   Don't Play
overcast  72           true   Play
overcast  81           false  Play
rain      71           true   Don't Play

(a) Using Gini impurity, determine the best splitting threshold for Temperature, out of the following values: [65, 70, 75, 80].
(b) Analyze the impurity of the three features and determine the best feature for splitting.
(c) Finalize your decision tree. At each node, repeat the two previous steps to determine the best splitting feature and, if applicable, its threshold. Use a maximum depth of 4 (the first split occurs at depth = 1).

Question 4
Consider the following dataset of unlabeled points:

Point  X-coordinate  Y-coordinate
P1     9             1
P2     1             1
P3     9             2
P4     8             1
P5     9             20
P6     2             2
P7     8             2
P8     1             2
P9     2             1

Your goal is to use K-means and K-medoids to cluster these data points into two clusters. Use the Euclidean distance as the distance measure.

(a) Using P1 and P2 as the starting centroids, perform three iterations of K-means to cluster all the data points.
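The hand-computed K-means iterations in part (a) can be cross-checked with scikit-learn by forcing the initial centroids to P1 and P2 (a sketch only; the question still expects the iterations to be shown by hand):

```python
import numpy as np
from sklearn.cluster import KMeans

# The nine points from Question 4, in the order P1..P9.
points = np.array([
    [9, 1], [1, 1], [9, 2], [8, 1], [9, 20],
    [2, 2], [8, 2], [1, 2], [2, 1],
], dtype=float)

# Start from P1 and P2 as the initial centroids, as the question requires.
init = np.array([[9.0, 1.0], [1.0, 1.0]])
km = KMeans(n_clusters=2, init=init, n_init=1, max_iter=3).fit(points)

print("labels:   ", km.labels_)          # cluster index per point, P1..P9
print("centroids:", km.cluster_centers_)
```

Note how the outlier P5 = (9, 20) is absorbed into the right-hand cluster and drags its centroid upward, which is exactly the behavior part (c) asks you to contrast with K-medoids.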
(b) K-medoids is a variation of K-means in which the cluster heads (medoids) are always chosen from the available data points. In each iteration, a cluster's medoid is the point with the lowest sum of distances to all other points in the cluster (this is one of several methods to compute the medoids). Using P1 and P2 as the starting medoids, perform three iterations of K-medoids to cluster all the data points.
(c) Compare the performance of K-means and K-medoids. Reflect on the distribution of the data points and its effect on the clustering and the cluster heads in K-means and K-medoids.

Question 5
Consider the following dataset for predicting a customer's likelihood to purchase a certain product based on historical activity.

Online Activity  Product Views  Past Purchases  Purchase Likelihood
Low              Low            None            Unlikely
Medium           Moderate       Few             Moderate
High             High           Many            Likely
Low              Low            Many            Moderate
High             Moderate       None            Likely
Medium           Low            None            Unlikely
Low              High           Few             Moderate
High             Moderate       Few             Likely
Medium           Moderate       Many            Likely
Low              Low            Few             Unlikely
High             High           None            Moderate
Medium           Low            Many            Moderate
Medium           High           Few             Moderate
High             Low            Many            Likely
Low              Moderate       None            Unlikely
Medium           High           None            ?????
High             High           Few             ?????
Low              Moderate       Many            ?????

Your goal is to build a Naive Bayes classifier from the labeled data and use it to classify the new unlabeled data points.

(a) For each feature, build the frequency and likelihood tables.
(b) Compute the prior probabilities for each class.
(c) Classify each of the three new data points into the appropriate class. Show your work and the steps used to identify the suitable class.

Question 6
Consider the following dataset of 5 data points. Each data point has two features (a, b) and a class label in {-1, 1}.
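The tallies behind Question 5(a) and (b) can be sketched in plain Python; the 15 labeled rows below are transcribed from the table above:

```python
from collections import Counter

# The 15 labeled rows: (Online Activity, Product Views, Past Purchases, Likelihood).
rows = [
    ("Low", "Low", "None", "Unlikely"), ("Medium", "Moderate", "Few", "Moderate"),
    ("High", "High", "Many", "Likely"), ("Low", "Low", "Many", "Moderate"),
    ("High", "Moderate", "None", "Likely"), ("Medium", "Low", "None", "Unlikely"),
    ("Low", "High", "Few", "Moderate"), ("High", "Moderate", "Few", "Likely"),
    ("Medium", "Moderate", "Many", "Likely"), ("Low", "Low", "Few", "Unlikely"),
    ("High", "High", "None", "Moderate"), ("Medium", "Low", "Many", "Moderate"),
    ("Medium", "High", "Few", "Moderate"), ("High", "Low", "Many", "Likely"),
    ("Low", "Moderate", "None", "Unlikely"),
]
features = ["Online Activity", "Product Views", "Past Purchases"]

# Prior probabilities P(class) from the class frequencies.
class_counts = Counter(r[-1] for r in rows)
priors = {c: n / len(rows) for c, n in class_counts.items()}
print(priors)

# Likelihood tables P(feature value | class), one per feature.
for j, name in enumerate(features):
    freq = Counter((r[j], r[-1]) for r in rows)
    print(name, {k: v / class_counts[k[1]] for k, v in freq.items()})
```

Classifying the unlabeled rows in part (c) then amounts to multiplying, per class, the prior by the matching likelihoods and picking the largest product.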
Data Point  a     b      Label
P1          1     1      -1
P2          -1    -1     -1
P3          1     -1     1
P4          -1    1      1
P5          0.5   -0.25  1

Consider the following, relatively simple, models that classify a data point to class [-1] if the condition is met, and to class [1] otherwise:
(1) a ≤ 0
(2) b ≤ 0
(3) a + b ≤ 0.5
(4) a + b ≤ -0.5
(5) a + b ≥ 0.5

Your goal is to use AdaBoost to find an ultimate model that combines some of these models. During the process, if two models have similar performance, choose the model that comes earlier in the list above.

(a) Perform AdaBoost for 5 steps. In each step, show all the work done (table formats are recommended).
(b) What is the final prediction model, which combines the outcomes of the 5 steps?
(c) Assess the performance of the final prediction model on the given dataset in terms of classification accuracy.

Implementation Questions

Question 1
In this question, you will explore the concept of Transfer Learning (TL) using PyTorch. TL is a technique in which previously trained models are used to help train new models. There are two common ways to implement TL, namely fine-tuning and feature extraction. You can read more about these methods in the PyTorch documentation. For the following questions, you are to use the Brain Tumor classification dataset available on Kaggle. Sample images are shown below for the 4 different classes.
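The bookkeeping of a single AdaBoost round in Question 6(a) can be sketched as follows. This assumes model (1) reads "a ≤ 0" and uses the standard model weight α = ½ ln((1 − ε)/ε); the same loop body repeats for each of the 5 steps with the model of lowest weighted error:

```python
import math

# Dataset from Question 6: (a, b, label).
data = [(1, 1, -1), (-1, -1, -1), (1, -1, 1), (-1, 1, 1), (0.5, -0.25, 1)]
w = [1 / len(data)] * len(data)  # initial uniform sample weights

# Weak model (1): predict class -1 when a <= 0, else class +1.
preds = [-1 if a <= 0 else 1 for a, b, _ in data]

# Weighted error epsilon and model weight alpha.
eps = sum(wi for wi, (_, _, t), p in zip(w, data, preds) if p != t)
alpha = 0.5 * math.log((1 - eps) / eps)

# Reweight: increase the weight of misclassified samples, then normalize.
w = [wi * math.exp(-alpha * t * p) for wi, (_, _, t), p in zip(w, data, preds)]
total = sum(w)
w = [wi / total for wi in w]

print(f"eps = {eps}, alpha = {alpha:.4f}")
print("new weights:", [round(wi, 4) for wi in w])
```

After normalization the misclassified samples each carry weight 1/(2ε) times their old weight and the correct ones 1/(2(1 − ε)) times theirs, which is a handy consistency check for the hand-computed tables.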
Figure: sample images for the four classes: (a) Normal, (b) Glioma Tumor, (c) Meningioma Tumor, (d) Pituitary Tumor.

(a) Analyze and visualize the statistics of the dataset. Pre-process the data and prepare it for the training phase. Ensure that the images are resized to 224x224x3 and normalized. Split the data randomly into train and test sets, with a ratio of 7:3.
(b) Train a ResNet-18 model from scratch using the provided dataset for the classification task. You are free to choose the hyperparameters (batch size, learning rate, optimizer, loss function, etc.).
(c) Train another ResNet-18 model using the fine-tuning TL method, starting from the IMAGENET1K weights. You should use the same hyperparameters and the same data used to train the previous ResNet-18 model.
(d) Report and compare the performance of both models in terms of training accuracy/loss and the classification report. What are your conclusions?

Question 2
Build an LSTM-based model for time-series forecasting using PyTorch. Given a series of data points, the model should be able to predict the next data point. You should use the Amazon Stock dataset for this task, where the aim is to use the previous data points to predict the next stock value.

(a) We will only use the "close" column to train our model, so you should remove the remaining columns from the dataset. Additionally, since our prediction is based on the historical trend, each data point (each row) in the preprocessed dataset should take the form sequence -> prediction. Assume we want to use a sequence window of size 10 in this problem. Preprocess the dataset such that each sequence of 10 values is used to predict the 11th value. You can refer to this article for examples of how to preprocess the data into sequences. (Hint: if your original dataset has 100 data points, then the preprocessed dataset has 100 - 10 = 90 data points/sequences.)
(b) Preprocess the data using the MinMaxScaler from scikit-learn. There is no need to split or shuffle the data for this question.
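The windowing described in part (a) can be sketched as below, assuming the "close" column has already been loaded into a 1-D NumPy array:

```python
import numpy as np

def make_sequences(series, window=10):
    """Turn a 1-D series into (sequence, next-value) training pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array([series[i + window] for i in range(len(series) - window)])
    return X, y

# With 100 points and a window of 10 we get 90 sequences, as the hint says.
close = np.arange(100, dtype=float)  # stand-in for the real "close" column
X, y = make_sequences(close, window=10)
print(X.shape, y.shape)  # (90, 10) (90,)
```

The MinMaxScaler from part (b) should be fit on the raw series before windowing, so that every sequence and target live on the same [0, 1] scale.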
(c) Train a vanilla LSTM to forecast the price of Amazon stock based on the last 10 values. You can refer to this article for guidance. You are free to choose and optimize the hyperparameters (optimizer, batch size, etc.), but you should use the MSE loss. Aim for the best performance you can achieve.
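A minimal shape of such a model is sketched below; the hidden size, learning rate, and batch size are illustrative placeholders, not prescribed values, and the dummy batch stands in for the real scaled sequences:

```python
import torch
import torch.nn as nn

class StockLSTM(nn.Module):
    """Vanilla LSTM: a window of 10 scaled prices in, one predicted price out."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):             # x: (batch, 10, 1)
        out, _ = self.lstm(x)         # out: (batch, 10, hidden)
        return self.head(out[:, -1])  # predict from the last time step

model = StockLSTM()
criterion = nn.MSELoss()  # MSE loss, as the question requires
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on a dummy batch.
x = torch.randn(32, 10, 1)
target = torch.randn(32, 1)
loss = criterion(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(model(x).shape)  # torch.Size([32, 1])
```

The real training loop iterates this step over batches of the windowed, MinMax-scaled sequences from parts (a) and (b).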
(d) Analyze the performance of your model by plotting the training loss. Using the available data, compare the original plot of the data points with the predicted plot (each data point on the predicted plot is obtained by feeding the previous 10 data points from the original data).

Question 3
In this question, you are tasked with implementing color clustering for an image using k-means from scikit-learn. Consider the following image of a bird (you have to use the original "bird.png" image attached with the assignment files):

(a) Normalize the image through division by the maximum possible value.
(b) Plot the original normalized image as well as the color space of the image. You can refer to the Appendix for a sample function to plot color spaces.
(c) Apply k-means clustering with K=5 and K=10 to the image colors. For each case, plot the color space of the centroids and their corresponding colors (each plot should have K colored points).
(d) For each K, recolor the image according to the centroids, where each pixel takes the color of its cluster centroid. For each K, plot the color space showing the centroids and the distribution of recolored pixels.
(e) Plot the recolored image, for each K, showing the bird in the new clustered colors.
(f) Compute the MSE (using scikit-learn) between the original normalized image and the recolored image, for each K. Comment on the results.

Question 4
Consider the lung cancer prediction dataset available on Kaggle, used for a classification task to determine whether a patient has cancer. You are required to use scikit-learn for the ML models in the following questions.

(a) Preprocess the data by converting textual features into numerical values. Split the dataset into train and test sets with a ratio of 7:3.
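The preprocessing in part (a) can be sketched with scikit-learn. The small DataFrame here is a stand-in for the Kaggle CSV, and the column names (GENDER, AGE, SMOKING, LUNG_CANCER) are illustrative and should be checked against the actual file:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Stand-in for the Kaggle CSV; take the real column names from the file itself.
df = pd.DataFrame({
    "GENDER": ["M", "F", "F", "M", "M", "F"],
    "AGE": [65, 48, 59, 70, 52, 61],
    "SMOKING": [2, 1, 2, 2, 1, 1],
    "LUNG_CANCER": ["YES", "NO", "YES", "YES", "NO", "NO"],
})

# Convert every textual column to numerical codes.
for col in df.select_dtypes(include="object"):
    df[col] = LabelEncoder().fit_transform(df[col])

# 7:3 train/test split.
X, y = df.drop(columns="LUNG_CANCER"), df["LUNG_CANCER"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible across the five models trained in parts (b) through (f).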
(b) Train a K-nearest neighbors (KNN) model for the classification task with K=10. Study the performance on the training and testing sets using the classification report.
(c) Train a Support Vector Machine (SVM) model for the classification task. Study the performance on the training and testing sets using the classification report.
(d) Train a Gaussian Naive Bayes (GaussianNB) model for the classification task. Study the performance on the training and testing sets using the classification report.
(e) Train a Decision Tree model for the classification task. Study the performance on the training and testing sets using the classification report.
(f) Train an AdaBoost model for the classification task. Study the performance on the training and testing sets using the classification report. In scikit-learn, when using the default AdaBoost model, what is the base estimator?
(g) Discuss and compare the performance of the trained models.
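Since the routine in parts (b) through (f) is the same for every model, a loop keeps the comparison in (g) consistent. The synthetic data below is a stand-in for the preprocessed lung cancer split; K=10 is the value the question fixes, and the other models use scikit-learn defaults (for reference, the default AdaBoost base estimator is a depth-1 DecisionTreeClassifier, i.e. a decision stump):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with the preprocessed lung cancer train/test split.
X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "KNN (K=10)": KNeighborsClassifier(n_neighbors=10),
    "SVM": SVC(),
    "GaussianNB": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```

Printing the report for the training set as well (swap in X_train, y_train) exposes the overfitting gaps that the discussion in (g) should address.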
Appendix
You can use the following function to plot the color space of an image.
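A sketch of such a function is shown below: it scatters an image's pixels in 3D RGB space, each point drawn in its own color. The helper name plot_color_space and the subsampling are illustrative choices; the function assumes a normalized H x W x 3 image with values in [0, 1]:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to show plots interactively
import matplotlib.pyplot as plt
import numpy as np

def plot_color_space(image, sample=5000, title="Color space"):
    """Scatter an image's pixels in RGB space, each point colored by itself.

    `image` is a normalized H x W x 3 array with values in [0, 1].
    """
    pixels = image.reshape(-1, 3)
    # Subsample for speed on large images.
    if len(pixels) > sample:
        idx = np.random.default_rng(0).choice(len(pixels), sample, replace=False)
        pixels = pixels[idx]
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pixels[:, 0], pixels[:, 1], pixels[:, 2], c=pixels, s=2)
    ax.set_xlabel("R")
    ax.set_ylabel("G")
    ax.set_zlabel("B")
    ax.set_title(title)
    return ax

# Example with random pixels standing in for the normalized bird.png:
ax = plot_color_space(np.random.rand(64, 64, 3))
```

For Question 3(c) and (d), the same function can be pointed at the K centroid colors, or at the recolored pixels, by reshaping them into an N x 1 x 3 array.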