CSC522_HW2 (PDF, 3 pages)
School: North Carolina State University
Course: CSC 522, Computer Science
Date: Feb 20, 2024
Homework 2
Automated Learning and Data Analysis
Dr. Pankaj Telang, Spring 2024

Instructions

Due Date: February 21, 2024 at 11:45 PM
Total Points: 30

Submission checklist:
- Clearly list each team member's name and Unity ID at the top of your submission.
- Your submission should be a single PDF file containing your answers.
- Name your file: G(homework group number) HW(homework number), e.g. G1 HW2.
- If a question asks you to explain or justify your answer, give a brief explanation using your own ideas, not a reference to the textbook or an online source.
- Submit your PDF through Gradescope under the HW2 assignment (see instructions on Moodle). Note: make sure to add your group members at the end of the upload process.
- Submit the programming portion of the homework individually through JupyterHub.
ADLA – Spring 2024, Homework 2 (Last Updated: February 15, 2024)

1 Evaluation Measures & Pruning (15 points)

This analysis pertains to the Titanic Survival Prediction dataset, which includes attributes about the survival status of individual passengers on the Titanic (Yes/No). To predict survival, consider the decision tree shown in Figure 1, which involves Ticket Price, Gender, Pclass (passenger class), and Age. Complete the following tasks:

Figure 1: Decision Tree

1a) (4 points) Use the decision tree above to classify the provided dataset hw2q1.csv. Construct a confusion matrix and report the test Accuracy, Error Rate, Precision, Recall, and F1 score. Use "Yes" as the positive class in the confusion matrix.

1b) (4 points) Calculate the optimistic training classification error before and after splitting on Age. Consider only the subtree starting with the Age node. If we want to minimize the optimistic error rate, should the node's children be pruned?

1c) (4 points) Calculate the pessimistic training errors before and after splitting on Age. Consider only the subtree starting with the Age node. When calculating pessimistic error, use a leaf node error penalty of 0.8. If we want to minimize the pessimistic error rate, should the node's children be pruned?

1d) (3 points) Assuming that the "Age" node is pruned, recalculate the test Error Rate using hw2q1.csv. Based on your evaluation using the dataset in hw2q1.csv, was the original tree (with the Age node) over-fitting? Why or why not?
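For 1a–1c, the metric definitions can be sketched in code. This is a hedged illustration only: the confusion-matrix counts fed in at the bottom are made-up placeholders (not values from hw2q1.csv or Figure 1), and the `pessimistic_error` helper assumes one common convention in which a fixed penalty per leaf is added to the training errors before dividing by the number of training records.

```python
# Hedged sketch for 1a-1c: metric formulas only, not the answers for
# hw2q1.csv. All counts used below are made-up placeholders.

def metrics(tp, fp, fn, tn):
    """Binary-classification metrics with "Yes" as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = 1.0 - accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, error_rate, precision, recall, f1

def pessimistic_error(train_errors, n_leaves, n_records, penalty=0.8):
    # Assumed convention: pessimistic estimate adds a fixed penalty per
    # leaf node to the training errors, then divides by the record count.
    return (train_errors + penalty * n_leaves) / n_records

# Placeholder usage (counts are NOT from hw2q1.csv):
acc, err, prec, rec, f1 = metrics(tp=40, fp=10, fn=5, tn=45)
```

Comparing `pessimistic_error(...)` for the unsplit node (one leaf) against the split subtree (one penalty per child leaf, summed errors) gives the pruning decision asked for in 1c.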
2 1-NN & Cross-Validation (15 points)

Consider the following dataset (9 instances) with two continuous attributes (x1 and x2) that have been scaled to the same range, and a class attribute y, shown in Table 1. For this question, we will consider a 1-Nearest-Neighbor (1-NN) classifier that uses Euclidean distance.

Table 1: 1-NN dataset

ID   x1     x2    Class
1    5.56   1.25  +
2    3.61   3.33  -
3    8.06   5.00  -
4    3.89   4.17  +
5    10.00  7.50  -
6    2.78   7.08  -
7    1.94   0.00  +
8    2.22   6.25  +
9    6.11   4.17  -

2a) Calculate the distance matrix for the dataset using Euclidean distance. Tip: you can write a simple program to do this for you; there is an example of how to do this in the programming portion of this homework.

2b) By hand, evaluate the 1-NN classifier, calculating the confusion matrix and test accuracy (show your work by labeling each data object with the predicted class). Tip: you can scan a row or column of the distance matrix to easily find the closest neighbor. Use the following evaluation methods:
i) A holdout test dataset consisting of the last 4 instances.
ii) 3-fold cross-validation, using the folds with IDs [1,2,3], [4,5,6], and [7,8,9], respectively.
iii) Leave-one-out cross-validation (LOOCV).

2c) For a data analysis homework, you are asked to perform an experiment with a binary classification algorithm that uses a "simple majority vote classifier," which always predicts the majority class in the training dataset (if there is no majority, one of the classes is chosen at random). You are given a dataset with 50 instances and a class attribute that can be either Positive or Negative. The dataset includes 25 positive and 25 negative instances. You use three different validation methods: holdout (with a random 30/20 training/validation split), 5-fold cross-validation (with random folds), and LOOCV. You expect the simple majority classifier to achieve approximately 50% validation accuracy, but for one of these evaluation methods you get 0% validation accuracy. Which evaluation method gives this result and why?
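The tips in 2a and the setup in 2c can both be sanity-checked with a short script. This is a hedged sketch, not the official solution: the data tuples are copied from Table 1, the nearest-neighbor loop implements only the LOOCV variant of 2b, and the 2c simulation uses hypothetical 'P'/'N' labels standing in for Positive/Negative.

```python
# Hedged sketch (not the official solution). Part 1 builds the 2a
# distance matrix and 2b-iii LOOCV 1-NN predictions from Table 1;
# part 2 simulates the 2c majority-vote classifier under LOOCV.
import math

# (ID, x1, x2, class) copied from Table 1
data = [
    (1, 5.56, 1.25, '+'), (2, 3.61, 3.33, '-'), (3, 8.06, 5.00, '-'),
    (4, 3.89, 4.17, '+'), (5, 10.00, 7.50, '-'), (6, 2.78, 7.08, '-'),
    (7, 1.94, 0.00, '+'), (8, 2.22, 6.25, '+'), (9, 6.11, 4.17, '-'),
]

def dist(a, b):
    # Euclidean distance on the (x1, x2) attributes
    return math.hypot(a[1] - b[1], a[2] - b[2])

# 2a: full 9x9 distance matrix
D = [[dist(a, b) for b in data] for a in data]

# 2b-iii: LOOCV 1-NN -- classify each point by its nearest neighbor
# among the remaining eight
preds = []
for i in range(len(data)):
    j = min((k for k in range(len(data)) if k != i), key=lambda k: D[i][k])
    preds.append(data[j][3])
loocv_accuracy = sum(p == a[3] for p, a in zip(preds, data)) / len(data)

# 2c: LOOCV with a majority-vote classifier on a balanced 25/25 dataset
# (hypothetical labels 'P'/'N'). Holding out one instance leaves the
# *other* class in the majority, so every held-out point is misclassified.
labels = ['P'] * 25 + ['N'] * 25
correct = 0
for i, y in enumerate(labels):
    rest = labels[:i] + labels[i + 1:]
    pred = 'P' if rest.count('P') > rest.count('N') else 'N'
    correct += (pred == y)
# correct stays at 0, i.e. 0% LOOCV validation accuracy
```

For 2b-i and 2b-ii, the same `D` matrix can be scanned while restricting the candidate neighbors `k` to the relevant training fold instead of "all but `i`".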