CS 4780/5780 Homework 1
Due: Tuesday 02/14/23 11:59pm on Gradescope

Note: For homework, you can work in teams of up to three people. Please include your teammates' NetIDs and names on the front page and form a group on Gradescope. (Please answer all questions within each paragraph.) Please show all the relevant steps in your solutions to get full credit.

Problem 1: Machine Learning Basics

1. Explain why, in each of the following scenarios, the trained model may not perform well in testing or deployment.

(a) Using a model trained on temperature data from California to predict tomorrow's temperature in Ithaca.

(b) You have the medical records of patients between 21 and 75 years of age, sorted in ascending order by age. You decide to use the first 80% of the data for training, the next 10% for validation, and the last 10% as the test set.

(c) A model is trained on email data to classify emails as spam or not-spam. The data was shuffled uniformly at random and split into 80% for training, 10% for validation, and 10% for testing.

2. Recall from class that loss functions should always be non-negative and ideally also continuous and differentiable. For each of the following loss functions, identify whether it is (a) always non-negative, (b) always continuous, (c) always differentiable. Briefly justify your answer.

(a) $L(h) = \frac{1}{n}\sum_{i=1}^{n} (h(x_i) - y_i)^2$

(b) $L(h) = \frac{1}{n}\sum_{i=1}^{n} |h(x_i) - y_i|$

(c) $L(h) = \frac{1}{n}\sum_{i=1}^{n} \cos(h(x_i) - y_i)$
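Before arguing each property formally, a quick numerical probe can be helpful. The following minimal Python sketch (the predictions and labels are made up purely for illustration) evaluates all three candidate losses on the same residuals:

```python
import numpy as np

# Toy predictions and labels, made up purely for illustration.
preds = np.array([1.5 + np.pi, 2.0 + np.pi, 3.0])
labels = np.array([1.5, 2.0, 3.0])

residuals = preds - labels  # h(x_i) - y_i

squared_loss = np.mean(residuals ** 2)      # loss (a)
absolute_loss = np.mean(np.abs(residuals))  # loss (b)
cosine_loss = np.mean(np.cos(residuals))    # loss (c)

print(squared_loss, absolute_loss, cosine_loss)
# With residuals of pi, the averaged cosine comes out to -1/3 here,
# which bears directly on the non-negativity requirement.
```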
3. Consider a training data set $D = \{(x_i, y_i) : 1 \leq i \leq n\}$. Recall the hypothesis class we discussed in class, in which each individual hypothesis returns a constant value:

$h_z \in \mathcal{H}$ s.t. $h_z(x) = z$, where $z \in \mathbb{Q}_{\geq 0}$.

That is, z can be any rational number greater than or equal to 0.

(a) Consider the squared loss:

$L(h) = \frac{1}{n}\sum_{i=1}^{n} (h(x_i) - y_i)^2$

Prove that the function $h_m(x)$ minimizes the squared loss when

$m = \frac{1}{n}\sum_{i=1}^{n} y_i.$

(b) Suppose we have the following training data set:

x (feature)    y (label)
     5             1
    -6           500
     2             3

Using the absolute loss, show that $h_3$ gives the minimum loss on D among all functions $h_z \in \mathcal{H}$.
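As a numerical sanity check (not a substitute for the requested proofs), one can sweep candidate constants z and evaluate both losses on the labels from part (b); the integer grid below is an arbitrary choice:

```python
import numpy as np

# Labels from the dataset in part (b); the x values do not matter
# here because every hypothesis h_z ignores its input.
y = np.array([1.0, 500.0, 3.0])

zs = np.arange(0, 501)  # coarse integer grid of candidates z >= 0

squared = [np.mean((z - y) ** 2) for z in zs]
absolute = [np.mean(np.abs(z - y)) for z in zs]

print("squared loss minimized at z =", zs[np.argmin(squared)])   # 168 = mean(y)
print("absolute loss minimized at z =", zs[np.argmin(absolute)]) # 3 = median(y)
```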
Problem 2: K-Nearest Neighbors / Curse of Dimensionality

1. Robin has been given a dataset and will soon receive many more data points to classify. Unfortunately, Robin has never taken CS 4780 and does not know how to classify the points. However, you are taking CS 4780, and Robin decides to enlist your help. The initial dataset is as follows:

Positive examples (label +1): {(3, 1), (4, 4), (5, 2)}
Negative examples (label -1): {(3, 2), (2, 3)}

Suppose the data lies in the grid [0, 5] × [0, 5]. In the figure below, the positive points are shown in blue and the negative points in red.

[Figure: the five points plotted on the [0, 5] × [0, 5] grid.]

Draw the decision boundary for a 1-NN classifier with the Euclidean distance.

2. Robin's boss would like to find out how accurate the model is before deploying it. What proportion of the points in the original dataset would be classified correctly by a k-NN algorithm with k = 3?

3. Briefly describe the curse of dimensionality. Can k-NN still function well on some high-dimensional data sets (such as images of faces or handwritten digits)? Explain why or why not.

4. Suppose there are n training points $x_1, \ldots, x_n \in \mathbb{R}^d$. What is the time complexity of 1-NN (i.e., k = 1) with the Euclidean distance at testing time, for a single test point, in terms of n and d? Explain.
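For intuition on parts 1 and 4, here is a minimal brute-force 1-NN sketch over the dataset above (the test point is an arbitrary choice); note how the distance computation touches every training point and every coordinate:

```python
import numpy as np

# Training points and labels from Problem 2, part 1.
X = np.array([[3, 1], [4, 4], [5, 2], [3, 2], [2, 3]], dtype=float)
y = np.array([+1, +1, +1, -1, -1])

def one_nn_predict(x_test):
    """Brute-force 1-NN: one Euclidean distance per training point."""
    dists = np.linalg.norm(X - x_test, axis=1)  # n distances, O(d) work each
    return y[np.argmin(dists)]                  # label of the nearest neighbor

print(one_nn_predict(np.array([4.0, 1.0])))  # classify one new grid point
```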
Problem 3: Perceptron

1. At which marvelous university was the Perceptron invented? State the assumptions that the Perceptron makes about the data.

2. Suppose you have a fully converged Perceptron with weight vector w, trained on a training set $D_{TR}$, with a separate test set $D_{TE}$. Here, $x \in \mathbb{R}$ and $y \in \{-5, +5\}$. Your lovely boss, Kilian, wants to know whether the test error is greater than the training error. Can you answer this without having to look at all of $D_{TR}$? How about $D_{TE}$?

3. Your friend Ambrose, a brilliant machine learning engineer, trained a Perceptron on a massive dataset (over 10 trillion training examples). Unfortunately, he forgot to save the final weight vector. The earth cannot afford to train another such model, but Ambrose desperately needs this model to help adopt thousands of rescue puppies. Ambrose (and the puppies, AND THE EARTH) is dependent on your expertise in the Perceptron. Thankfully, while training, Ambrose managed to save the training examples that were used to make the update at each update step. Surprisingly, only five examples were ever misclassified; they are listed below, along with the number of times each was misclassified.

Training Example     Label    Number of Times Misclassified
(4, 0, 0, 0, 0)       +1                  2
(0, 0, 6, 2, 0)       +1                  1
(0, 0, 0, 0, 3)       -1                  1
(0, 6, 3, 4, 0)       -1                  1
(5, 2, 0, 1, 0)       -1                  1

Based on the debug output above, and assuming that the initial weight vector $w_0$ was instantiated as the zero vector (0, 0, 0, 0, 0), what is the final weight vector of the Perceptron (i.e., the weight vector that would have been saved if the code had not malfunctioned)?
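As a refresher on the mechanics behind part 3: starting from the zero vector, the standard Perceptron adds $y_i x_i$ to the weights each time example $(x_i, y_i)$ is misclassified. A minimal sketch on made-up separable data (the function name and data are my own):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Standard Perceptron: start at zero, add y_i * x_i on every mistake."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # misclassified (or on the boundary)
                w += y_i * x_i             # the Perceptron update
                mistakes += 1
        if mistakes == 0:                  # converged: no mistakes in a full pass
            break
    return w

# Toy, linearly separable data, made up for illustration.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([+1, +1, -1, -1])
print(perceptron(X, y))
```

Note that every update on a given example adds the same vector $y_i x_i$, a property worth keeping in mind for part 3.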