Hw12P8

School: San Francisco State University
Subject: Computer Science
Date: Nov 24, 2024
Type: pdf
Uploaded by MinisterSteelSheep33

In [16]: k = 47
         k
Out[16]: 47

In [17]: grader.check("q1_6")
Out[17]: q1_6 passed!

Question 1.7. Why do we divide our data into a training and test set? What is the point of a test set, and why do we only want to use the test set once? Explain your answer in 3 sentences or less. (10 points)

Hint: Check out this section in the textbook.

We divide the data into a training set and a test set so that the classifier is built from one set of points and then evaluated on points it has never seen, which reveals overfitting that accuracy on the training set alone would hide. We should not use the test set to find the best number of neighbors, because the test set exists to evaluate the finished classifier; choosing k by test-set accuracy would tailor the classifier to that particular test set, so the resulting accuracy would be a biased rather than an objective measure of performance. This is why we use the test set only once: it stands in for out-of-sample data, and each additional use lets information about it leak into our choices and skews the performance estimate.

Question 1.8. Why do we use an odd-numbered k in k-NN? Explain. (10 points)

With two classes, as here, an odd k guarantees that the majority vote among the k nearest neighbors can never be tied. For example, if you chose k = 4 and ended up with two neighbors labeled "Berkeley" and two labeled "Stanford," the vote would be inconclusive and we could not classify the point.

Question 1.9.0. Setup

Thomas has devised a scheme for splitting up the test and training set. For each row from coordinates:
● Rows for Stanford students have a 50% chance of being placed in the training set and a 50% chance of being placed in the test set.
● Rows for Berkeley students have an 80% chance of being placed in the training set and a 20% chance of being placed in the test set.

Hint 1: Remember that there are 77 Berkeley students and 23 Stanford students in coordinates.

Hint 2: Thomas' last name is Bayes. (So Section 18.1 from the textbook may be helpful here!)
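Question 1.7's point about keeping the test set out of the choice of k can be made concrete with a minimal sketch. It uses synthetic, randomly generated points (hypothetical, not the real coordinates table), a hand-rolled k-NN classifier, and hypothetical helpers named split, classify, and accuracy. Choosing k on a validation split carved out of the training set is one common approach and is assumed here; the course may choose k differently.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the shuffles are reproducible

# Synthetic toy data (hypothetical, NOT the coordinates table): each row is
# ((x, y), label), and the two classes are centred at different points.
rows = (
    [((random.gauss(0, 1), random.gauss(0, 1)), "Berkeley") for _ in range(77)]
    + [((random.gauss(2, 1), random.gauss(2, 1)), "Stanford") for _ in range(23)]
)

def split(data, first_fraction):
    """Shuffle a copy of data and cut it into two disjoint lists."""
    shuffled = data[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

def classify(train, point, k):
    """k-NN: majority label among the k training rows closest to point.
    With two classes and an odd k the vote cannot tie."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, evaluation, k):
    """Fraction of evaluation rows whose label k-NN predicts correctly."""
    hits = sum(classify(train, point, k) == label for point, label in evaluation)
    return hits / len(evaluation)

train, test = split(rows, 0.8)        # the test set is set aside and not looked at
fit, validation = split(train, 0.75)  # k is tuned using training rows only

# Pick the odd k with the best validation accuracy; the test set plays no part
best_k = max(range(1, 16, 2), key=lambda k: accuracy(fit, validation, k))
test_accuracy = accuracy(train, test, best_k)  # the single use of the test set
print(best_k, test_accuracy)
```

Fixing best_k before touching test is what keeps test_accuracy an honest estimate; if k were instead picked by maximizing accuracy(train, test, k), that number would be optimistically biased. Cross-validation on the training set is a common alternative to the single validation split assumed here.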
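Question 1.8's tie argument can be checked with a small majority vote. In this sketch, majority_vote is a hypothetical helper (not a function from the course library) that returns None when the two most common labels are tied; the last line confirms that, with two classes, no odd number of neighbors can split evenly.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Hypothetical helper: most common label among the nearest neighbors,
    or None if the top two labels have the same count (an inconclusive vote)."""
    counts = Counter(neighbor_labels).most_common()  # (label, count), largest count first
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# k = 4: two neighbors from each school, so the vote is tied
print(majority_vote(["Berkeley", "Berkeley", "Stanford", "Stanford"]))  # None

# k = 5: the extra neighbor always breaks the tie
print(majority_vote(["Berkeley", "Stanford", "Berkeley", "Stanford", "Stanford"]))  # Stanford

# With two classes a tie needs b == k - b, i.e. k == 2 * b, which no odd k satisfies
print(all(b != k - b for k in range(1, 100, 2) for b in range(k + 1)))  # True
```

The odd-k guarantee only holds for two classes: with three classes, k = 3 can still produce one vote for each.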
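The excerpt stops before the follow-up parts of Question 1.9, so which probability they ask for is not shown here. As one illustration of the Bayes' rule calculation that Hint 2 points to, the sketch below computes the chance that a row in Thomas's test set belongs to a Stanford student, using exact fractions and the counts from Hint 1; the variable names are hypothetical.

```python
from fractions import Fraction

# Priors from Hint 1: 23 Stanford and 77 Berkeley rows out of 100 in coordinates
p_stanford = Fraction(23, 100)
p_berkeley = Fraction(77, 100)

# Thomas's scheme: chance that a row lands in the test set, given its school
p_test_given_stanford = Fraction(1, 2)  # 50%
p_test_given_berkeley = Fraction(1, 5)  # 20%

# Law of total probability: overall chance that a row lands in the test set
p_test = p_stanford * p_test_given_stanford + p_berkeley * p_test_given_berkeley

# Bayes' rule: chance that a test-set row belongs to a Stanford student
p_stanford_given_test = p_stanford * p_test_given_stanford / p_test

print(p_test, p_stanford_given_test)      # 269/1000 115/269
print(round(float(p_stanford_given_test), 2))  # 0.43
```

So although only 23% of all rows are Stanford rows, about 43% of the test set is expected to be Stanford, because a Stanford row is 2.5 times as likely as a Berkeley row to be sent to the test set.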
