EC503 Introduction to Learning from Data
Boston University, Francesco Orabona, Spring 2023

Homework 6 Solutions

Important: Submit the code on Blackboard and anything else on Gradescope.

Boosting

Problem 6.1

Here, you will implement a boosted decision tree algorithm.

(a) (5 points) Implement the AdaBoost algorithm, following the pseudo-code given in class. As the weak learner, use a decision tree. Use the MATLAB command fitctree to train decision trees. Read its documentation to learn how to limit the maximum number of splits of the decision tree to 2. The prototype must be

    [alpha, DTCell, tr_err] = train_boosted_dt(Xtr, ytr, T)

where Xtr is the input matrix, ytr is the vector of labels, T is the number of boosting rounds, alpha is the vector of the coefficients $\alpha_t$ found by AdaBoost, DTCell is a cell array containing the decision trees (study how to use cell arrays in MATLAB), and tr_err is the vector of training errors over the T boosting rounds of the classifier $\mathrm{sign}\left(\sum_{i=1}^{t} \alpha_i f_i\right)$. (A sketch of both functions is given after this problem statement.)

(b) (5 points) Read the documentation of predict and use it to implement the corresponding testing function:

    [ypred] = test_boosted_dt(Xte, alpha, DTCell)

where ypred is the vector of the predicted classes, Xte is the matrix of the test samples, alpha is the vector of the coefficients $\alpha_t$ found by AdaBoost, and DTCell is a cell array containing the decision trees.

(c) (3 points) Complete the code in traintest_bdt_a9a.m to try your implementations on the binary dataset Adult, attached to this assignment as a9a.mat (this time we use the entire dataset instead of a subset). (A possible skeleton is also sketched after this problem statement.)

(d) (2 points) Plot how the test error and training error change during the boosting procedure, for T = 1, ..., 500. Comment on the results: Should the training error always go to zero in boosting? How did you expect the test error to behave with respect to the training error and the number of rounds of boosting? Does the theory predict the results? (Note that this will be slow, around 5-10 minutes.)
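The solution to parts (a) and (b) refers to submitted code that is not reproduced in this document. What follows is a minimal sketch of what the two functions could look like, assuming labels in {-1, +1}; it is not the reference implementation.

    function [alpha, DTCell, tr_err] = train_boosted_dt(Xtr, ytr, T)
        m = size(Xtr, 1);
        w = ones(m, 1) / m;           % AdaBoost distribution over training points
        alpha = zeros(T, 1);
        DTCell = cell(T, 1);
        tr_err = zeros(T, 1);
        F = zeros(m, 1);              % running scores sum_i alpha_i f_i(x)
        for t = 1:T
            % Weak learner: a tree with at most 2 splits, trained on weights w
            DTCell{t} = fitctree(Xtr, ytr, 'MaxNumSplits', 2, 'Weights', w);
            pred = predict(DTCell{t}, Xtr);
            eps_t = sum(w .* (pred ~= ytr));      % weighted training error
            % (assumes 0 < eps_t < 1/2; a robust version would guard eps_t = 0)
            alpha(t) = 0.5 * log((1 - eps_t) / eps_t);
            % Re-weight: misclassified points gain weight, then normalize
            w = w .* exp(-alpha(t) * ytr .* pred);
            w = w / sum(w);
            % Training error of the boosted classifier sign(sum_i alpha_i f_i)
            F = F + alpha(t) * pred;
            tr_err(t) = mean(sign(F) ~= ytr);
        end
    end

    function [ypred] = test_boosted_dt(Xte, alpha, DTCell)
        F = zeros(size(Xte, 1), 1);
        for t = 1:numel(DTCell)
            F = F + alpha(t) * predict(DTCell{t}, Xte);
        end
        ypred = sign(F);
    end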
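For part (c), a possible skeleton of traintest_bdt_a9a.m is sketched below; the variable names inside a9a.mat (Xtr, ytr, Xte, yte) are assumptions and may differ from the provided file.

    load('a9a.mat');                  % Adult dataset, full version (assumed variables)

    T = 500;
    [alpha, DTCell, tr_err] = train_boosted_dt(Xtr, ytr, T);

    % Test error after each boosting round: accumulate the weighted votes
    F = zeros(size(Xte, 1), 1);
    te_err = zeros(T, 1);
    for t = 1:T
        F = F + alpha(t) * predict(DTCell{t}, Xte);
        te_err(t) = mean(sign(F) ~= yte);
    end

    plot(1:T, tr_err, 1:T, te_err);
    xlabel('Boosting rounds'); ylabel('Error');
    legend('Training error', 'Test error');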
Solution:

(a) See code.

(b) See code.

(c) See code.

(d) See figure. The training error is guaranteed to reach zero with enough rounds of boosting only if the weak learners keep succeeding, that is, keep achieving weighted error strictly below 1/2. In this case, after a certain number of boosting rounds the weak learners essentially fail, so the training error does not reach zero. Hence, the results do not contradict the theory.

The k-Means algorithm

Problem 6.2

We saw that the k-means algorithm consists of the following steps:

1. Select k starting centers that are points from your data set.
2. Assign each data point to the cluster associated with the nearest of the k center points.
3. Re-calculate the centers as the mean vector of each cluster from (2).
4. Repeat steps (2) and (3) until convergence or until the limit on the number of iterations is met. Here, we define convergence as no change in label assignment from one step to the next.

(a) (7 points) Implement the k-means algorithm as described above. The prototype should be

    [M, obj] = kmeans(X, k, T)

where X is the input matrix, k the number of clusters, T the maximum number of iterations to use, M the matrix containing the centers, and obj the vector of the values of the objective function over the iterations. The initialization must be done through the k-means++ strategy we saw in class. Please count your iterations as follows: after 100 iterations, you should have assigned the points to the clusters 100 times. Hint: the MATLAB command repmat might help you write a faster algorithm. (A sketch of one possible implementation follows this problem statement.)

(b) (1 point) Run the k-means algorithm on the training data of MNIST from Homework 5 with k = 10 and T = 100. Plot the value of the objective function over the iterations. (This plot is mainly for you to debug your algorithm: the objective function has to decrease!)

(c) (2 points) Run the k-means algorithm on the training data of MNIST from Homework 5 with T = 100. Record the final objective value of k-means for k = 2, 4, 6, 8, 10, 12, and 14, repeat this for 10 random runs (because the initialization is random), and plot the average and standard deviation of the final value of the objective function for the different values of k. Comment on the results: is it possible for the objective function to increase with k? Why?
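As referenced in part (a), here is a minimal sketch of k-means with k-means++ initialization, assuming the rows of X are the data points; the submitted code may differ. (Note that a file named kmeans.m shadows the Statistics Toolbox function of the same name.)

    function [M, obj] = kmeans(X, k, T)
        [m, d] = size(X);
        % --- k-means++ initialization ---
        M = zeros(k, d);
        M(1, :) = X(randi(m), :);     % first center uniformly at random
        D = sum((X - repmat(M(1, :), m, 1)).^2, 2);
        for j = 2:k
            % Next center sampled with probability proportional to D
            idx = find(rand <= cumsum(D / sum(D)), 1);
            M(j, :) = X(idx, :);
            D = min(D, sum((X - repmat(M(j, :), m, 1)).^2, 2));
        end
        % --- Lloyd iterations ---
        obj = zeros(T, 1);
        labels = zeros(m, 1);
        for it = 1:T
            % Assignment step: nearest center for each point
            dist = zeros(m, k);
            for j = 1:k
                dist(:, j) = sum((X - repmat(M(j, :), m, 1)).^2, 2);
            end
            [mindist, newlabels] = min(dist, [], 2);
            obj(it) = sum(mindist);   % k-means objective
            if all(newlabels == labels)   % convergence: assignments unchanged
                obj = obj(1:it);
                break;
            end
            labels = newlabels;
            % Update step: each center becomes the mean of its cluster
            % (a robust version would handle empty clusters here)
            for j = 1:k
                M(j, :) = mean(X(labels == j, :), 1);
            end
        end
    end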
(d) (3 points) Let's try to understand how good k-means is at recovering the clusters corresponding to the classes. We will use the cluster purity as a measure of success. For m samples grouped into k clusters, with n classes, the cluster purity is defined as

    $$\text{Purity} = \frac{1}{m} \sum_{i=1}^{k} \max_{t=1,\dots,n} \left| \{ j : j \in C_i \text{ and } y_j = t \} \right|$$

where $y_j$ is the label of sample j. Note that the curly brackets denote a set and the bars around it denote the cardinality of the set. In words, we associate with each cluster the label of the class holding the majority of the points in that cluster. Having associated a label with each cluster, we can then count how many points are correctly classified. Note that the purity can only be calculated in retrospect, using the labels associated with each sample, which were unknown during the clustering process. (I have to admit that cluster purity is not a very good measure, but it is an easy one to compute...) Record how the purity changes with k = 2, 4, 6, 8, 10, 12, and 14, repeat this for 10 random initializations, and plot the average and standard deviation of the purities for the different values of k.
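A short sketch of how the purity could be computed from the cluster assignments (the function name and signature are illustrative, not part of the assignment):

    % labels: m-by-1 cluster assignments in 1..k, y: m-by-1 true class labels
    function purity = cluster_purity(labels, y, k)
        m = numel(y);
        correct = 0;
        for i = 1:k
            yi = y(labels == i);      % true labels of the points in cluster i
            if ~isempty(yi)
                % The majority class in the cluster contributes its count
                correct = correct + sum(yi == mode(yi));
            end
        end
        purity = correct / m;
    end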
Solution:

(a) See code.

(b) See figure.

(c) See figure. Regarding the objective function: in principle it is possible for the k-means objective to go up with increasing k, because the algorithm is not guaranteed to find the global optimum. So, given that the initialization is random, we might end up in a bad local minimum with k + 1 clusters that has a worse objective value than with k.

(d) See figure.

Code submission via Blackboard: Create four dot-m files: train_boosted_dt.m, test_boosted_dt.m, traintest_bdt_a9a.m, and kmeans.m. Place them in a single directory, which should be zipped and uploaded to Blackboard. Your directory must be named as follows: <firstname> <lastname> hwX, where X is the homework number. For example, if your name is John Doe, then for homework number 6 you would submit a single directory named John Doe HW6.zip, which contains all the MATLAB code (and only the code). Reach out to the TAs via Piazza and during their office/discussion hours for questions related to coding.