CS M146 Midterm Exam Solutions
University of California, Los Angeles
2. [8 points total] k-Nearest Neighbors

In the following questions, you will consider k-nearest neighbor classifiers with two different distance metrics on a binary classification task. Below is the figure for the dataset, which is divided into positive and negative classes. Note that the annotated data points (1 through 5) belong to the test set, and the rest belong to the training set. All points are positioned on grid intersections.

Figure 1: Dataset for the kNN binary classification task.

(a) [3 points] Given that the kNN classifier uses Euclidean distance as the distance metric, calculate the test error for k = 1 for the data points marked 1, 2, 3, 4, 5. The test error is defined as the fraction of test points that are misclassified. What class would each point in the test set be assigned? Fill in the table below. Note that a test point cannot be its own nearest neighbor, as neighbors must belong to the training set.

Test Point   1-NN   Truth
    1                 +
    2                 -
    3                 -
    4                 +
    5                 +
Answer: Test error = 3/5 = 60%.

Test Point   1-NN   Truth
    1          +      +
    2          +      -
    3          -      -
    4          -      +
    5          -      +

(b) [5 points] Previously we used Euclidean distance as our distance metric. Now, we will switch to using the Manhattan distance, also known as the "city block" distance. This metric conceptualizes walking between two points in a city laid out in a grid: you can only walk along the streets, not diagonally through buildings, and you add up all the blocks you'd have to walk. Formally, the Manhattan distance between points $x^{(1)} = (x^{(1)}_1, x^{(1)}_2)$ and $x^{(2)} = (x^{(2)}_1, x^{(2)}_2)$ is defined as:

$$d(x^{(1)}, x^{(2)}) = |x^{(1)}_1 - x^{(2)}_1| + |x^{(1)}_2 - x^{(2)}_2|$$

Calculate the test error using k = 3 neighbors. Identify and clearly indicate which test points are misclassified. Fill in the table below.

Test Point   Coordinates   Nearest Neighbors (Coordinates)   Manhattan Distances (to Nearest Neighbors)   Predicted Class   Actual Class
    1          (2, 6)                                                                                                             +
    2          (4, 1)                                                                                                             -
    3          (4, 7)                                                                                                             -
    4          (8, 2)                                                                                                             +
    5          (9, 7)                                                                                                             +

Table 1: Classification results using kNN with Manhattan distance

Answer: Test error = 2/5 = 40%; test points 2 and 4 are misclassified.

Test Point   Coordinates   Nearest Neighbors (Coordinates)   Manhattan Distances   Predicted Class   Actual Class
    1          (2, 6)      (1, 5), (2, 3), (4, 8)            2, 3, 4                      +                +
    2          (4, 1)      (4, 2), (6, 1), (5, 3)            1, 2, 3                      +                -
    3          (4, 7)      (4, 8), (7, 8), (6, 9)            1, 4, 4                      -                -
    4          (8, 2)      (8, 3), (7, 4), (6, 1)            1, 3, 3                      -                +
    5          (9, 7)      (9, 6), (7, 8), (7, 5)            1, 3, 4                      +                +

Table 2: Classification results using kNN with Manhattan distance
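As a quick sanity check, the Manhattan distances in Table 2 can be reproduced in a few lines of Python. The per-neighbor class labels are not listed individually in the table, so the `majority_vote` helper below is a generic sketch of the final voting step rather than data taken from the exam:

```python
from collections import Counter

def manhattan(p, q):
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Each test point and its 3 nearest training neighbors, from Table 2.
nearest = {
    (2, 6): [(1, 5), (2, 3), (4, 8)],
    (4, 1): [(4, 2), (6, 1), (5, 3)],
    (4, 7): [(4, 8), (7, 8), (6, 9)],
    (8, 2): [(8, 3), (7, 4), (6, 1)],
    (9, 7): [(9, 6), (7, 8), (7, 5)],
}

for test_pt, neighbors in nearest.items():
    dists = [manhattan(test_pt, nb) for nb in neighbors]
    print(test_pt, sorted(dists))  # matches the distance column of Table 2

def majority_vote(labels):
    """Predict the most common label ('+' or '-') among the k neighbors."""
    return Counter(labels).most_common(1)[0][0]
```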
3. [10 points total] Decision Trees

Suppose you have the following data set of 10 instances to determine whether Alice rates a dish as a favorite.

 #   Size   Taste   Temperature   Favorite
 1    S      Sa         H           Yes
 2    S      Sa         C           Yes
 3    S      Sa         C           Yes
 4    L      Sw         C           Yes
 5    L      Sw         C           No
 6    L      So         H           No
 7    S      So         H           No
 8    S      So         C           No
 9    L      Sw         H           No
10    L      Sw         H           No

Table 3: Abbreviations: S: Small, L: Large, Sa: Salty, Sw: Sweet, So: Sour, H: Hot, C: Cold.

The data set consists of 3 features (Size, Taste, Temperature) and a corresponding label (Favorite). For all calculations in this problem, use base 2 for the logarithms.

(a) [6 points] For the above table, compute the information gains: I(Favorite, Size), I(Favorite, Taste), I(Favorite, Temperature).
Answer: We compute the information gains as follows:

$$H(\text{Favorite}) = -0.4 \log 0.4 - 0.6 \log 0.6 \approx 0.9710$$

$$H(\text{Favorite} \mid \text{Size}) = 0.5(-0.4 \log 0.4 - 0.6 \log 0.6) + 0.5(-0.2 \log 0.2 - 0.8 \log 0.8) \approx 0.8464$$

$$H(\text{Favorite} \mid \text{Taste}) = 0.4(-0.25 \log 0.25 - 0.75 \log 0.75) \approx 0.3245$$

(The Salty and Sour subsets are pure, so only the Sweet subset, with weight 0.4, contributes to the conditional entropy.)

$$H(\text{Favorite} \mid \text{Temperature}) = 0.5(-0.2 \log 0.2 - 0.8 \log 0.8) + 0.5(-0.4 \log 0.4 - 0.6 \log 0.6) \approx 0.8464$$

$$I(\text{Favorite}, \text{Size}) = 0.9710 - 0.8464 = 0.1245$$
$$I(\text{Favorite}, \text{Taste}) = 0.9710 - 0.3245 = 0.6464$$
$$I(\text{Favorite}, \text{Temperature}) = 0.9710 - 0.8464 = 0.1245$$

(b) [3 points] What attribute should be selected as the root node for the decision tree? Justify your answer. Using the ID3 algorithm, construct a decision tree using the selected root node. What is the training accuracy?

Answer: Comparing the information gains of the various attributes, Taste has the largest gain and should be selected as the root node. Running the ID3 algorithm with Taste as the root, we obtain a decision tree with 3 decision nodes:

Taste
  Sa -> Yes
  So -> No
  Sw -> Temperature
          H -> No
          C -> Size (Yes or No; tie)

The tree obtains a training accuracy of 0.9, since one point is misclassified: instances 4 and 5 have identical attributes (L, Sw, C) but different labels, so the Sweet/Cold leaf cannot be split further and exactly one of the two is misclassified.

(c) [1 point] Suppose you are allowed only a tree of depth 1 (i.e., a tree with only the root node). Redraw the tree from part (b), pruning nodes at greater depths and replacing them with leaf nodes that maximize overall training accuracy.

Answer:

Taste
  Sa -> Yes
  So -> No
  Sw -> No
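The information gains above are easy to verify programmatically. Here is a minimal Python sketch using only the data in Table 3:

```python
import math
from collections import Counter

# (Size, Taste, Temperature, Favorite) for the 10 instances in Table 3.
data = [
    ("S", "Sa", "H", "Yes"), ("S", "Sa", "C", "Yes"), ("S", "Sa", "C", "Yes"),
    ("L", "Sw", "C", "Yes"), ("L", "Sw", "C", "No"),  ("L", "So", "H", "No"),
    ("S", "So", "H", "No"),  ("S", "So", "C", "No"),  ("L", "Sw", "H", "No"),
    ("L", "Sw", "H", "No"),
]

def entropy(labels):
    """Base-2 entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_idx):
    """I(Favorite, attribute) = H(Favorite) - H(Favorite | attribute)."""
    labels = [row[-1] for row in data]
    cond = 0.0
    for value in set(row[attr_idx] for row in data):
        subset = [row[-1] for row in data if row[attr_idx] == value]
        cond += len(subset) / len(data) * entropy(subset)
    return entropy(labels) - cond

for name, idx in [("Size", 0), ("Taste", 1), ("Temperature", 2)]:
    print(f"I(Favorite, {name}) = {info_gain(idx):.4f}")
# Prints ~0.1245, 0.6464, 0.1245, confirming Taste as the root.
```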
4. [12 points total] Linear Models

Consider the setting of regression with $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \mathbb{R}_+$. Note we are assuming labels can only take on positive scalars. We have access to a training dataset of $n$ examples, $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})\}$. To learn a regression model, we will consider a linear hypothesis class consisting of the following hypothesis:

$$h_\theta(x) = |\theta^T x| \tag{1}$$

Here, we use the augmented representation $x \in \mathbb{R}^{d+1}$ and $\theta \in \mathbb{R}^{d+1}$. We set $x_0 = 1$ such that $\theta_0$ is the bias parameter and $\theta_{1:d}$ are the weight parameters. In the lectures, you have learned the MSE loss for a training sample $(x^{(i)}, y^{(i)})$:

$$\ell_{MSE}(x^{(i)}, y^{(i)}, \theta) = \frac{1}{2}\left[y^{(i)} - h_\theta(x^{(i)})\right]^2. \tag{2}$$

In this problem, we will consider the Mean Squared Logarithmic Error (MSLE) loss. Given a sample $(x^{(i)}, y^{(i)})$, the MSLE loss is given as:

$$\ell_{MSLE}(x^{(i)}, y^{(i)}, \theta) = \frac{1}{2}\left[\log(1 + y^{(i)}) - \log(1 + h_\theta(x^{(i)}))\right]^2. \tag{3}$$

(a) [6 points] The MSLE loss is preferred over MSE when the target outcomes span several orders of magnitude and potentially have outliers. In these cases, we only want to penalize a model for relative errors. Examples include stock market prices, web traffic at different times of the day, etc. For the dataset below, complete the table by computing the MSE and MSLE loss. [You do not need to show your work.]

Table 4: Linear regression dataset

i                                  1       2       3
h_theta(x^(i))                     3      30     300
y^(i)                              2      20     200
l_MSE(x^(i), y^(i), theta)
l_MSLE(x^(i), y^(i), theta)

Answer:

Table 5: Linear regression dataset

i                                  1         2         3
h_theta(x^(i))                     3        30       300
y^(i)                              2        20       200
l_MSE(x^(i), y^(i), theta)         0.5      50      5000
l_MSLE(x^(i), y^(i), theta)        0.04138  0.07584  0.08153

Note that the MSE grows by a factor of 100 from column to column, while the MSLE stays nearly constant: every prediction overshoots its target by the same 50% relative error.
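The table entries can be reproduced directly from Equations (2) and (3). One detail worth noting: the MSLE values in the answer match only when $\log$ is taken as the natural logarithm, which the short Python sketch below assumes:

```python
import math

preds = [3, 30, 300]    # h_theta(x^(i)) from Table 4
targets = [2, 20, 200]  # y^(i) from Table 4

for h, y in zip(preds, targets):
    mse = 0.5 * (y - h) ** 2
    # Natural log reproduces the answer key's values (0.04138, 0.07584, 0.08153).
    msle = 0.5 * (math.log(1 + y) - math.log(1 + h)) ** 2
    print(f"h={h:3d}  y={y:3d}  MSE={mse:7.1f}  MSLE={msle:.5f}")
```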
(b) [4 points] Derive the partial derivative $\frac{\partial \ell_{MSLE}(x^{(i)}, y^{(i)}, \theta)}{\partial \theta_j}$. [Hint: You can consider using the sign function, or derive separately for the two cases $\theta^T x \le 0$ and otherwise.]

Answer:

Case 1: $\theta^T x^{(i)} \le 0$. Then $h_\theta(x^{(i)}) = -\theta^T x^{(i)}$ and

$$\frac{\partial \ell_{MSLE}(x^{(i)}, y^{(i)}, \theta)}{\partial \theta_j} = \frac{\log(1 + y^{(i)}) - \log(1 - \theta^T x^{(i)})}{1 - \theta^T x^{(i)}} \, x^{(i)}_j$$

Case 2: $\theta^T x^{(i)} > 0$. Then $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ and

$$\frac{\partial \ell_{MSLE}(x^{(i)}, y^{(i)}, \theta)}{\partial \theta_j} = \frac{\log(1 + \theta^T x^{(i)}) - \log(1 + y^{(i)})}{1 + \theta^T x^{(i)}} \, x^{(i)}_j$$

(c) [2 points] Write the gradient $\nabla_\theta \ell_{MSLE}(x^{(i)}, y^{(i)}, \theta)$ using vectorized notation.

Answer:

$$\nabla_\theta \ell_{MSLE}(x^{(i)}, y^{(i)}, \theta) = \begin{cases} \dfrac{\log(1 + y^{(i)}) - \log(1 - \theta^T x^{(i)})}{1 - \theta^T x^{(i)}} \, x^{(i)} & \theta^T x^{(i)} \le 0, \\[2ex] \dfrac{\log(1 + \theta^T x^{(i)}) - \log(1 + y^{(i)})}{1 + \theta^T x^{(i)}} \, x^{(i)} & \text{otherwise.} \end{cases}$$
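A finite-difference check is a standard way to validate a derivation like this. The sketch below compares the closed-form gradient from part (c) against central differences at a random point; the dimension d = 3 and the label value are arbitrary choices for illustration, not values from the exam:

```python
import numpy as np

rng = np.random.default_rng(0)

def msle_loss(theta, x, y):
    """MSLE loss from Eq. (3) with h(x) = |theta^T x|."""
    h = abs(theta @ x)
    return 0.5 * (np.log(1 + y) - np.log(1 + h)) ** 2

def msle_grad(theta, x, y):
    """Closed-form gradient from part (c)."""
    z = theta @ x
    if z <= 0:
        return (np.log(1 + y) - np.log(1 - z)) / (1 - z) * x
    return (np.log(1 + z) - np.log(1 + y)) / (1 + z) * x

# Arbitrary d = 3 example, augmented to 4 dimensions, with a positive label.
theta = rng.normal(size=4)
x = np.concatenate(([1.0], rng.normal(size=3)))  # x_0 = 1 (bias term)
y = 2.5

eps = 1e-6
numeric = np.array([
    (msle_loss(theta + eps * e, x, y) - msle_loss(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(numeric, msle_grad(theta, x, y), atol=1e-6))  # True
```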