Homework 6C

University of Arkansas, Computer Science 4143
Dec 6, 2023
Homework 5 Problem 3

1. In a decision tree, the feature used for the first split at the root node is typically chosen based on:
   (a) Random selection
   (b) Feature with the highest information gain
   (c) Feature with the lowest entropy
   (d) Feature with the highest standard deviation

2. A decision tree with greater depth and more complex splits is likely to:
   (a) Overfit the training data
   (b) Underfit the training data
   (c) Generalize well to unseen data
   (d) Perform better only on categorical data

3. What is the maximum depth of a decision tree with 7 leaf nodes? (The depth of a decision tree is defined as the length of the longest path from the root node to a leaf node in the tree.)
   (a) 6
   (b) 7
   (c) 8
   (d) It depends on the dataset

4. What is the primary advantage of using decision trees for classification and regression?
   (a) Simplicity and interpretability
   (b) Computational efficiency
   (c) Ability to handle missing data
   (d) High resistance to overfitting
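For reference on question 1, the sketch below (an illustration, not part of the original assignment; the entropy and information_gain helper names are my own) shows how information gain is computed for a candidate split; the root split is the one that maximizes this quantity. For question 3, note that a tree with L leaves reaches its maximum depth, L − 1, when it degenerates into a chain, so 7 leaves allow a depth of 6.

    import numpy as np

    def entropy(labels):
        # Shannon entropy (in bits) of a 1-D array of class labels.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(labels, left_mask):
        # Entropy reduction achieved by splitting `labels` by a boolean mask.
        n = len(labels)
        left, right = labels[left_mask], labels[~left_mask]
        child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(labels) - child

    # Toy labels: a perfect split gains 1 bit; an uninformative split gains 0.
    y = np.array([0, 0, 0, 1, 1, 1])
    perfect = np.array([True, True, True, False, False, False])
    useless = np.array([True, True, False, False, True, True])
    print(information_gain(y, perfect))  # 1.0
    print(information_gain(y, useless))  # 0.0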
Problem 4

The titanic train.csv dataset on the Blackboard Learn system contains information about passengers who were on the Titanic. We would like to train a logistic regression classifier to predict whether a passenger survived. Detailed information about this dataset can be found at https://www.kaggle.com/competitions/titanic/data. Complete the following tasks:

1. Download the dataset, load it into Python, and preview it. Then, extract features and labels from the dataset, and describe how you identified the features and what they are:

Features: Passenger Class (Pclass), Sex, Age, Sibling/Spouse (SibSp), Parent/Child (Parch), Fare, and Embarked.
Label: Survived.

2. Perform appropriate preprocessing to make the data suitable for logistic regression. Please describe your data preprocessing steps in your submission. (Hint: To deal with categorical variables, please check out sklearn.preprocessing.LabelEncoder. To remove NaNs, please use df.dropna().)

Data preprocessing:

df.dropna(): Before calling df.dropna(), I first used df.isna().sum() to display how many missing values the data has per feature. I then used df.dropna() to remove the rows with missing values, which could otherwise distort the trained model's performance.

sklearn.preprocessing.LabelEncoder: I used LabelEncoder to transform the categorical columns during preprocessing. The encoder is fitted to the given labels and transforms them into numerical representations via fit_transform(). The output includes the original category labels as well as the encoded numerical values produced by the LabelEncoder.
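A minimal sketch of these preprocessing steps, assuming the Kaggle column names listed above (the file name titanic_train.csv and the choice to encode only Sex and Embarked are assumptions, not details from the original submission):

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Load the Titanic training data and preview it (file name assumed).
    df = pd.read_csv("titanic_train.csv")
    print(df.head())

    # Show how many missing values each column has before dropping them.
    print(df.isna().sum())

    # Keep the chosen features plus the label, then drop rows containing NaNs.
    cols = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "Survived"]
    df = df[cols].dropna()

    # Encode the categorical columns as integers.
    for col in ["Sex", "Embarked"]:
        df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(columns="Survived")
    y = df["Survived"]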
3. Split the entire dataset into a training dataset and a test dataset of reasonable sizes. Train a logistic regression model using the training dataset and evaluate its performance on the test set. In your submission, please attach the generated classification report.

Classification report with precision, recall, f1-score, and support:
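A sketch of how the split, the model, and the report could be produced, continuing from the preprocessing sketch above (the 80/20 split, random_state=42, and max_iter=1000 are assumptions, not values taken from the original submission):

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    # Hold out 20% of the rows for testing; the assignment leaves the ratio open.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # max_iter is raised so the solver converges on the unscaled features.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Prints per-class precision, recall, f1-score, and support.
    print(classification_report(y_test, model.predict(X_test)))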
Problem 5

Answer the following questions.

1. In the figure below, you may see two classes (blue circles and green squares), a separating line (solid line), and two margins (dashed lines). Which of the data points are called support vectors?

Points 5 and 7 are the support vectors; they lie on the margins and are equidistant from the separating hyperplane.

2. Which of the following statements are true? Select all answers that are true.
   (a) For a p-dimensional feature space, there exists a p-dimensional "plane" that cuts the feature space into two halves.
   (b) For a p-dimensional feature space, there exists a (p − 1)-dimensional "plane" that cuts the feature space into two halves.
   (c) If a hyperplane, β₀ + β₁x₁ + β₂x₂ = 0, divides the 2-dimensional feature space into two halves, the upper half is represented by β₀ + β₁x₁ + β₂x₂ < 0.
   (d) If a hyperplane, β₀ + β₁x₁ + β₂x₂ = 0, divides the 2-dimensional feature space into two halves, the lower half is represented by β₀ + β₁x₁ + β₂x₂ < 0.
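To see which half-plane carries which sign in statements (c) and (d), one can simply evaluate β₀ + β₁x₁ + β₂x₂ at sample points. A minimal sketch with made-up coefficients (the line x₂ = x₁, i.e. β₀ = 0, β₁ = −1, β₂ = 1, is purely illustrative):

    import numpy as np

    # Hypothetical hyperplane x2 = x1, i.e. beta0 = 0, beta1 = -1, beta2 = 1.
    beta0, beta1, beta2 = 0.0, -1.0, 1.0

    def side(x1, x2):
        # The sign of beta0 + beta1*x1 + beta2*x2 says which half-plane (x1, x2) is in.
        return np.sign(beta0 + beta1 * x1 + beta2 * x2)

    print(side(0.0, 1.0))  #  1.0: a point above the line (upper half)
    print(side(1.0, 0.0))  # -1.0: a point below the line (lower half)

For this choice of coefficients the upper half is positive and the lower half negative; negating all three βs describes the same line but swaps the signs, so which inequality names which half depends on the sign convention of the coefficients.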
Figure 1: The figure for Question 3 in Problem 5.

3. Which of the following statements are true? Select all answers that are true.
   (a) In Figure 1, the "margin" is given by the line A.
   (b) In Figure 1, the "margin" is given by the line B.
   (c) The maximal margin classifier minimizes the margin of the separating hyperplane.
   (d) The support vectors must have equal distances from the maximal margin hyperplane.

4. In Figure 1, if the Maximal Margin Classifier can be found, which data points are most likely to become the support vectors?

Points 1 and 6 are most likely to become the support vectors because they are the closest to the separating hyperplane.

5. Which of the following statements is true?
   (a) The Maximal Margin Classifier is not robust against adding or deleting data points.
   (b) The Maximal Margin Classifier may not always exist.
   (c) The Support Vector Classifier allows some points to be on the wrong side of the margin, but not on the wrong side of the separating hyperplane.
   (d) The Support Vector Classifier allows some points to be on the wrong side of the margin, or on the wrong side of the separating hyperplane.
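A minimal sklearn sketch on made-up, linearly separable data showing how the support vectors of a (near-)maximal margin classifier can be read off; with a very large C, SVC approximates the hard-margin classifier. The data points here are invented for illustration and are not the points in Figure 1:

    import numpy as np
    from sklearn.svm import SVC

    # Made-up, linearly separable toy data: three points per class.
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class 0
                  [4.0, 4.0], [4.5, 3.0], [5.0, 4.5]])  # class 1
    y = np.array([0, 0, 0, 1, 1, 1])

    # A very large C approximates the hard-margin (maximal margin) classifier.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    print(clf.support_vectors_)        # the training points lying on the margins
    print(clf.coef_, clf.intercept_)   # hyperplane coefficients and intercept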