CS 4780/5780 Homework 1
Due: Tuesday 02/14/23 11:59pm on Gradescope

Note: For homework, you can work in teams of up to three people. Please include your teammates' NetIDs and names on the front page and form a group on Gradescope. (Please answer all questions within each paragraph.) Please show all the relevant steps in your solutions to get full credit.

Problem 1: Machine Learning Basics

1. Explain why, in each of the following scenarios, the trained model may not perform well in testing or deployment.

(a) Using a model trained on temperature data from California to predict tomorrow's temperature in Ithaca.

(b) You have the medical records of patients between 21 and 75 years of age, sorted in ascending order by age. You decide to use the first 80% of the data for training, the next 10% for validation, and the last 10% as the test set.

(c) A model is trained on email data to classify emails as spam or not-spam. The data was shuffled uniformly at random and split into 80% for training, 10% for validation, and 10% for testing.

2. Recall from class that loss functions should always be non-negative and ideally also continuous and differentiable. For each of the following loss functions, identify whether it is (a) always non-negative, (b) always continuous, (c) always differentiable. Briefly justify your answer.

(a) $L(h) = \frac{1}{n}\sum_{i=1}^{n} (h(x_i) - y_i)^2$

(b) $L(h) = \frac{1}{n}\sum_{i=1}^{n} |h(x_i) - y_i|$

(c) $L(h) = \frac{1}{n}\sum_{i=1}^{n} \cos(h(x_i) - y_i)$
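Before arguing each property formally, a quick numerical probe can be helpful. The following minimal Python sketch (the predictions and labels are made up purely for illustration) evaluates all three candidate losses on the same residuals:

```python
import numpy as np

# Toy predictions and labels, made up purely for illustration.
preds = np.array([1.5 + np.pi, 2.0 + np.pi, 3.0])
labels = np.array([1.5, 2.0, 3.0])

residuals = preds - labels  # h(x_i) - y_i

squared_loss = np.mean(residuals ** 2)      # loss (a)
absolute_loss = np.mean(np.abs(residuals))  # loss (b)
cosine_loss = np.mean(np.cos(residuals))    # loss (c)

print(squared_loss, absolute_loss, cosine_loss)
# With residuals of pi, the averaged cosine comes out to -1/3 here,
# which bears directly on the non-negativity requirement.
```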
3. Consider a training data set $D = \{(x_i, y_i) : 1 \leq i \leq n\}$. Recall the hypothesis class we discussed in class, in which each individual hypothesis returns a constant value:

$h_z \in \mathcal{H}$ s.t. $h_z(x) = z$, where $z \in \mathbb{Q}_{\geq 0}$.

That is, z can be any rational number greater than or equal to 0.

(a) Consider the squared loss:

$L(h) = \frac{1}{n}\sum_{i=1}^{n} (h(x_i) - y_i)^2$

Prove that the function $h_m(x)$ minimizes the squared loss when

$m = \frac{1}{n}\sum_{i=1}^{n} y_i.$

(b) Suppose we have the following training data set:

x (feature)    y (label)
     5             1
    -6           500
     2             3

Using the absolute loss, show that $h_3$ gives the minimum loss on D among all functions $h_z \in \mathcal{H}$.
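As a numerical sanity check (not a substitute for the requested proofs), one can sweep candidate constants z and evaluate both losses on the labels from part (b); the integer grid below is an arbitrary choice:

```python
import numpy as np

# Labels from the dataset in part (b); the x values do not matter
# here because every hypothesis h_z ignores its input.
y = np.array([1.0, 500.0, 3.0])

zs = np.arange(0, 501)  # coarse integer grid of candidates z >= 0

squared = [np.mean((z - y) ** 2) for z in zs]
absolute = [np.mean(np.abs(z - y)) for z in zs]

print("squared loss minimized at z =", zs[np.argmin(squared)])   # 168 = mean(y)
print("absolute loss minimized at z =", zs[np.argmin(absolute)]) # 3 = median(y)
```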
Problem 2: K-Nearest Neighbors / Curse of Dimensionality

1. Robin has been given a dataset and will soon receive many more data points to classify. Unfortunately, Robin has never taken CS 4780 and does not know how to classify the points. However, you are taking CS 4780, and Robin decides to enlist your help. The initial dataset is as follows:

Positive examples (label +1): {(3, 1), (4, 4), (5, 2)}
Negative examples (label -1): {(3, 2), (2, 3)}

Suppose the data lies in the grid [0, 5] × [0, 5]. In the figure below, the positive points are shown in blue and the negative points in red.

[Figure: the five points plotted on the [0, 5] × [0, 5] grid.]

Draw the decision boundary for a 1-NN classifier with the Euclidean distance.

2. Robin's boss would like to find out how accurate the model is before deploying it. What proportion of the points in the original dataset would be classified correctly by a k-NN algorithm with k = 3?

3. Briefly describe the curse of dimensionality. Can k-NN still function well on some high-dimensional data sets (such as images of faces or handwritten digits)? Explain why or why not.

4. Suppose there are n training points $x_1, \ldots, x_n \in \mathbb{R}^d$. What is the time complexity of 1-NN (i.e., k = 1) with the Euclidean distance at testing time, for a single test point, in terms of n and d? Explain.
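For intuition on parts 1 and 4, here is a minimal brute-force 1-NN sketch over the dataset above (the test point is an arbitrary choice); note how the distance computation touches every training point and every coordinate:

```python
import numpy as np

# Training points and labels from Problem 2, part 1.
X = np.array([[3, 1], [4, 4], [5, 2], [3, 2], [2, 3]], dtype=float)
y = np.array([+1, +1, +1, -1, -1])

def one_nn_predict(x_test):
    """Brute-force 1-NN: one Euclidean distance per training point."""
    dists = np.linalg.norm(X - x_test, axis=1)  # n distances, O(d) work each
    return y[np.argmin(dists)]                  # label of the nearest neighbor

print(one_nn_predict(np.array([4.0, 1.0])))  # classify one new grid point
```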
Problem 3: Perceptron

1. At which marvelous university was the Perceptron invented? State the assumptions that the Perceptron makes about the data.

2. Suppose you have a fully converged Perceptron with weight vector w, trained on a training set $D_{TR}$, with a separate test set $D_{TE}$. Here, $x \in \mathbb{R}$ and $y \in \{-5, +5\}$. Your lovely boss, Kilian, wants to know whether the test error is greater than the training error. Can you answer this without having to look at all of $D_{TR}$? How about $D_{TE}$?

3. Your friend Ambrose, a brilliant machine learning engineer, trained a Perceptron on a massive dataset (over 10 trillion training examples). Unfortunately, he forgot to save the final weight vector. The earth cannot afford to train another such model, but Ambrose desperately needs this model to help adopt thousands of rescue puppies. Ambrose (and the puppies, AND THE EARTH) is dependent on your expertise in the Perceptron. Thankfully, while training, Ambrose managed to save the training examples that were used to make the update at each update step. Surprisingly, only five examples were ever misclassified; they are listed below, along with the number of times each was misclassified.

Training Example     Label    Number of Times Misclassified
(4, 0, 0, 0, 0)       +1                  2
(0, 0, 6, 2, 0)       +1                  1
(0, 0, 0, 0, 3)       -1                  1
(0, 6, 3, 4, 0)       -1                  1
(5, 2, 0, 1, 0)       -1                  1

Based on the debug output above, and assuming that the initial weight vector $w_0$ was instantiated as the zero vector (0, 0, 0, 0, 0), what is the final weight vector of the Perceptron (i.e., the weight vector that would have been saved if the code had not malfunctioned)?
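As a refresher on the mechanics behind part 3: starting from the zero vector, the standard Perceptron adds $y_i x_i$ to the weights each time example $(x_i, y_i)$ is misclassified. A minimal sketch on made-up separable data (the function name and data are my own):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Standard Perceptron: start at zero, add y_i * x_i on every mistake."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # misclassified (or on the boundary)
                w += y_i * x_i             # the Perceptron update
                mistakes += 1
        if mistakes == 0:                  # converged: no mistakes in a full pass
            break
    return w

# Toy, linearly separable data, made up for illustration.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([+1, +1, -1, -1])
print(perceptron(X, y))
```

Note that every update on a given example adds the same vector $y_i x_i$, a property worth keeping in mind for part 3.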