AAI/CPE/EE 695 Applied Machine Learning: Homework #3
Spring 2024
Due date: 3/4/2024, end of day (11:59 PM)

For questions 1-4, please submit a *.pdf file via Canvas. For question 5, please submit a *.ipynb file via Canvas.

Written Response Question(s):

1. Explain the bias-variance tradeoff. Describe a few techniques to reduce bias and variance, respectively. (A reference decomposition is given after Question 3.)

2. Assume the following is the confusion matrix of a classifier. Calculate the following metrics for this classifier (reference definitions are given after Question 3):
   a. Precision
   b. Recall
   c. F1-score

                             Predicted results
                             Class 1   Class 2
   True values   Class 1        50        30
                 Class 2        40        60

3. Build a decision tree using the following training instances (using the information gain approach):

   DAY  OUTLOOK   TEMPERATURE  HUMIDITY  WIND    PLAYTENNIS
   D1   Sunny     Hot          High      Weak    No
   D2   Sunny     Hot          High      Strong  No
   D3   Overcast  Hot          High      Weak    Yes
   D4   Rain      Mild         High      Weak    Yes
   D5   Rain      Cool         Normal    Weak    Yes
   D6   Rain      Cool         Normal    Strong  No
   D7   Overcast  Cool         Normal    Strong  Yes
   D8   Sunny     Mild         High      Weak    No
   D9   Sunny     Cool         Normal    Weak    Yes
   D10  Rain      Mild         Normal    Weak    Yes
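For Question 1, a standard reference point (background, not part of the question itself) is the bias-variance decomposition of expected squared error for an estimator \hat{f}:

    E[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2,

where \sigma^2 is the irreducible noise.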
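For Question 2, the standard metric definitions, assuming Class 1 is treated as the positive class (the assignment does not fix the positive class):

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
    F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}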
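For Question 3, the information-gain criterion used by ID3-style decision trees: at each node, split on the attribute A with the largest gain,

    \mathrm{Entropy}(S) = -\sum_{c} p_c \log_2 p_c, \qquad
    \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v),

where p_c is the fraction of instances in S with class label c, and S_v is the subset of S for which attribute A takes value v.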
4. The naïve Bayes method is an ensemble combination method, as we learned in Lecture 5. Assume we have three classifiers, whose predicted results for a sample x are given in Table 4.1; the confusion matrix of each classifier is given in Table 4.2. Calculate the final decision using the naïve Bayes method. (A reference statement of the combination rule is given after Step 3 below.)

   Table 4.1. Predictions for sample x
   Classifier 1   Class 1
   Classifier 2   Class 1
   Classifier 3   Class 2

   Table 4.2. Confusion matrices (rows: true class; columns: predicted class)

   a) Classifier 1    Class 1   Class 2
         Class 1         40        10
         Class 2         30        20

   b) Classifier 2    Class 1   Class 2
         Class 1         20        30
         Class 2         20        30

   c) Classifier 3    Class 1   Class 2
         Class 1         50         0
         Class 2         40        10

Programming Problem(s):

5. Use a decision tree and a random forest to train models on the titanic.csv dataset included with this assignment.

Step 1: Read titanic.csv and observe a few samples, noting that there are both categorical and numerical features. If some feature values are missing, fill them in with the average of the same feature over the other samples. Take a random 80% of the samples for training and use the remaining 20% for testing.

Step 2: Fit a decision tree model using the independent variables "pclass + sex + age + sibsp" and the dependent variable "survived." Plot the full tree. Make sure "survived" is a qualitative variable taking 1 (yes) or 0 (no) in your code. You may see a tree similar to the example figure in the assignment, but the actual structure and size may vary.

Step 3: Use the GridSearchCV() function to find the best value of the max_leaf_nodes parameter and use it to prune the tree. Plot the pruned tree, which will be smaller than the tree you obtained in Step 2. (A code sketch of Steps 1-3 is given below.)
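For Question 4 (referenced above), the naïve Bayes combination rule chooses the class c maximizing the posterior, under the assumption that the classifiers' outputs s_1, s_2, s_3 are conditionally independent given the true class:

    \hat{c} = \arg\max_{c} P(c) \prod_{k=1}^{3} P(s_k \mid c),

where each likelihood P(s_k \mid c) is estimated from the row for true class c in classifier k's confusion matrix in Table 4.2. This is the standard formulation; the exact variant covered in Lecture 5 may differ in details.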
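A minimal sketch of Steps 1 through 3 with pandas and scikit-learn. The column names (pclass, sex, age, sibsp, survived) come from the assignment; which column has missing values, the 0/1 encoding of sex, the random_state values, and the max_leaf_nodes search range are assumptions:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Step 1: read the data and look at a few samples.
df = pd.read_csv("titanic.csv")
print(df.head())

# Fill missing values with the feature's average over the other samples
# (age is assumed to be the feature with gaps).
df["age"] = df["age"].fillna(df["age"].mean())

# Encode the categorical feature "sex" as 0/1 (assumed label scheme).
df["sex"] = df["sex"].map({"male": 0, "female": 1})

X = df[["pclass", "sex", "age", "sibsp"]]
y = df["survived"].astype(int)  # qualitative target: 1 = yes, 0 = no

# Random 80%/20% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 2: fit the full decision tree and plot it.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
plt.figure(figsize=(20, 10))
plot_tree(full_tree, feature_names=list(X.columns),
          class_names=["0", "1"], filled=True)
plt.show()

# Step 3: grid-search max_leaf_nodes to prune the tree, then plot it.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={"max_leaf_nodes": list(range(2, 31))},
                    cv=5)
grid.fit(X_train, y_train)
best_leaves = grid.best_params_["max_leaf_nodes"]
pruned_tree = grid.best_estimator_
plt.figure(figsize=(12, 8))
plot_tree(pruned_tree, feature_names=list(X.columns),
          class_names=["0", "1"], filled=True)
plt.show()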
Step 4: For the pruned tree, report its accuracy on the test set for the following:
1. Percent of survivors correctly predicted (on the test set)
2. Percent of fatalities correctly predicted (on the test set)

Step 5: Use the RandomForestClassifier() function to train a random forest using the value of max_leaf_nodes you found in Step 3. You can set n_estimators=50. Report the accuracy of the random forest on the test set for the following:
1. Percent of survivors correctly predicted (on the test set)
2. Percent of fatalities correctly predicted (on the test set)

Check whether the random forest improves on the results of the single tree from Step 4.
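A matching sketch for Steps 4 and 5, reusing pruned_tree, best_leaves, and the train/test split from the sketch above (those names come from that sketch, not from the assignment). Per-class percent correct is computed here as the recall of each class, which is one reasonable reading of the requested metrics:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def per_class_accuracy(model, X, y):
    """Percent of survivors (y == 1) and fatalities (y == 0) predicted correctly."""
    pred = model.predict(X)
    y = np.asarray(y)
    pct_survivors = 100.0 * np.mean(pred[y == 1] == 1)
    pct_fatalities = 100.0 * np.mean(pred[y == 0] == 0)
    return pct_survivors, pct_fatalities

# Step 4: pruned-tree accuracy on the test set.
surv, fatal = per_class_accuracy(pruned_tree, X_test, y_test)
print(f"Pruned tree: {surv:.1f}% survivors, {fatal:.1f}% fatalities correct")

# Step 5: random forest with the max_leaf_nodes found in Step 3.
forest = RandomForestClassifier(n_estimators=50, max_leaf_nodes=best_leaves,
                                random_state=0).fit(X_train, y_train)
surv, fatal = per_class_accuracy(forest, X_test, y_test)
print(f"Random forest: {surv:.1f}% survivors, {fatal:.1f}% fatalities correct")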