Assignment - DT and RF
School
Indiana University, Bloomington
Course
S364
Subject
Mathematics
Date
Feb 20, 2024
Name(s): Adi Sarangee
K353 Assignment - Decision Trees and Random Forests (Total Points: 40)
1. Conceptual Questions (10 points): These questions pertain to key concepts covered during the class. They include a series of multiple choice, fill in the blank, short answer, and matching questions. These questions are tightly linked to the learning objectives of that week and are likely candidates for exams.
2. Hands-on Exercises (15 points): These questions relate to the hands-on activities. The activities are related to the content covered in the chapter and give students hands-on experience, which is highly sought after by employers for entry-level positions in the industry.
3. Custom Code Implementation (15 points): These questions allow students to create their own code to achieve a particular task of their choosing. The problem specification must be clear, the code must be commented, and the code must include concepts that were covered that week. These questions are intended to get students to think creatively about the concepts covered in class and build something of use or of interest to them.
4. Student Feedback (ungraded): These questions allow each student to offer feedback to the instructor if there were particular areas that were difficult or needed additional explanation.
Students may form groups of up to two for assignments. You may also choose to work alone. However, the number of questions and the questions themselves will not change whether you work alone or with someone. If you choose to work with someone, only one of you is required to submit the assignment, with BOTH of your names on it. Both of you will receive the same score for the assignment. You may choose to work individually for certain assignments and in groups for others. However, you are responsible for making these decisions and resolving any potential conflicts (e.g., free-riding); neither I nor the TAs will intervene. No late assignments will be accepted.
In this course, turnitin.com will be used. Turnitin is an automated system that instructors may use to quickly and easily compare each student's assignment with billions of web sites, as well as an enormous database of student papers that grows with each submission. After the assignment is processed, I, as the instructor, receive a report from turnitin.com that states whether and how another author's work was used in the assignment. For a more detailed look at this process, visit http://www.turnitin.com.
Suggestion
The document is tightly styled. After every question, there is space to respond to the question. Questions use the “question” style and the blank space between questions uses the “answer” style. Students should just start typing into the space provided for the answers and their answers will be distinct from the questions to facilitate grading.
Name(s): Please insert your name(s) here
Decision Trees and Random Forests
Conceptual Questions (10 points)
1. Fill in the Blank: In a decision tree, each internal node represents a ____________, and each leaf node represents a ____________.
a) feature, class
b) attribute, decision
c) decision, feature
d) class, feature
2. Multiple Choice: What problem in decision trees does "overfitting" refer to?
a) The tree is too small and lacks predictive power.
b) The tree is too large and fits the training data noise.
c) The tree is perfectly balanced.
d) The tree has too few branches.
3. Pruning in decision trees is a technique used to:
a) Increase the depth of the tree
b) Make the tree larger
c) Trim or reduce the size of the tree to prevent overfitting
d) Randomly select features for splitting
4. What is "bagging" in the context of random forests?
a) A technique to trim decision trees
b) A method for handling missing values in datasets
c) The process of aggregating multiple decision trees to reduce variance
d) A way to balance bias and variance in a single decision tree
5. What trade-off does a random forest aim to address?
a) The trade-off between precision and recall
b) The trade-off between bias and variance
c) The trade-off between feature selection and feature engineering
d) The trade-off between classification and regression
Please refer to these links for additional resources about the questions:
1. Slide Decks
2. Scikit-Learn Decision Trees - https://scikit-learn.org/stable/modules/tree.html
3. Scikit-Learn Ensembles - https://scikit-learn.org/stable/modules/ensemble.html
Hands-on Exercises (15 points)
Exercise 1 - Titanic Survival Prediction with Decision Trees
Please download the Titanic dataset and build a decision tree classifier to predict passenger survival on the Titanic based on various features.
Step 1: Data Preparation
Check each variable for missing values, and handle any missing values by filling them with the mean of that variable.
Encode categorical features (e.g., Sex) using one-hot encoding or label encoding.
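Step 1 can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the small DataFrame is a hypothetical stand-in for the downloaded Titanic data (the values are made up), and label encoding of Sex is only one of the two encodings the exercise allows.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data (made-up rows, not real records).
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, np.nan, 51.86],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Count missing values per column before changing anything.
missing_counts = df.isnull().sum()
print(missing_counts)

# Fill numeric gaps with the column mean, as the exercise asks.
numeric_cols = ["Age", "Fare"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Label-encode Sex as 0/1. For one-hot encoding instead, use
# pd.get_dummies(df, columns=["Sex"]) in place of this line.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
```

Label encoding keeps Sex as a single column, which is enough for a tree because the variable is binary; one-hot encoding matters more when a categorical feature has three or more unordered levels.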
Step 2: Feature Selection
Select the features for building the decision tree classifier. For this exercise, use Pclass, Sex, Age, SibSp, Parch, and Fare as your features.
Step 3: Split the Data
Split the dataset into training and testing sets. Use 70% of the data for training.
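Steps 2 and 3 might look like the sketch below. The randomly generated DataFrame is an assumption that stands in for the cleaned Titanic data, and the random_state value is arbitrary; only the feature names and the 70% training share come from the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned data: 100 random rows with the exercise's columns.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
    "Survived": rng.integers(0, 2, n),
})

# Step 2: keep only the six features named in the exercise; Survived is the target.
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
X = df[features]
y = df["Survived"]

# Step 3: 70% of the rows go to training, the remaining 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)
print(len(X_train), len(X_test))
```

Fixing random_state makes the split reproducible, which helps when the same split is reused to compare two models later.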
Step 4: Build and Train the Decision Tree
Create a decision tree classifier object and fit it to the training data.
Step 5: Make Predictions
Use the trained decision tree to make predictions on the test data.
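A rough sketch of Steps 4 and 5 follows. The toy data and the rule that generates the labels are assumptions made so the block runs on its own; with the real dataset, X and y would come from the earlier steps.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the label rule below is invented for illustration only.
rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),   # 1 = female in this toy encoding
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 100, n),
})
y = ((X["Sex"] == 1) | (X["Pclass"] == 1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

# Step 4: create the classifier object and fit it to the training data.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Step 5: predict a class label (0 or 1) for each row of the test set.
y_pred = clf.predict(X_test)
print(y_pred[:10])
```

Left at its defaults, DecisionTreeClassifier grows until its leaves are pure, so parameters such as max_depth or min_samples_leaf are the usual place to start if the tree overfits.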
Step 6: Evaluation
Calculate and print the accuracy of the decision tree classifier on the test data.
Choose and calculate two other metrics (e.g., precision, recall, F1-score, AUC, etc.) to assess the model's performance.
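The metrics in Step 6 can be computed with scikit-learn's metric functions, as in the sketch below. The two label lists are hypothetical and chosen by hand so the numbers are easy to check; in the exercise they would be y_test and the classifier's predictions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true labels and predictions (1 = survived, 0 = did not survive).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Here: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
accuracy = accuracy_score(y_test, y_pred)    # (TP + TN) / all = 6 / 8
precision = precision_score(y_test, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_test, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_test, y_pred)                # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

AUC differs from these metrics in that it needs predicted probabilities (for example from predict_proba) rather than hard class labels.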
Custom Code Implementations (15 points)
1. Using the same Titanic dataset, please create a Random Forest classifier now.
2. You can choose any features to include for your model training. Please explain why you included the specific features in your model.
3. If needed, you can try any hyperparameter tuning strategies.
4. After making predictions, please use the same evaluations as the ones you chose in the previous exercise.
5. Compare the two models according to the evaluation metrics, and try to explain the potential reasons why one classifier outperforms the other in this context.
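One possible shape for this task is sketched below. It is not the expected solution: the synthetic data from make_classification is an assumption standing in for the prepared Titanic features, and the parameter grid is an arbitrary small example of a tuning strategy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared Titanic features (swap in the real X and y).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

# Baseline: a single decision tree, as in the hands-on exercise.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Random forest tuned with a small, illustrative grid search (3-fold cross-validation).
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)
forest = search.best_estimator_

# Evaluate both models on the same test set with the same metrics.
results = {}
for name, model in [("decision tree", tree), ("random forest", forest)]:
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    print(name, results[name])
```

Evaluating both models on one shared test split keeps the comparison fair; which model scores higher depends on the data and should be read from the printed metrics rather than assumed.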
2. The features chosen for the random forest classifier were Pclass, Age, Sex (encoded as separate female and male columns so that the trees can split on each one), and Fare. These features were chosen because they are the factors most likely to influence a passenger's chance of survival.
5. Based on the evaluation metrics, the random forest does a better job than the single decision tree of predicting survival on this data. Because the random forest uses bagging, it combines many trees trained on different bootstrap samples, which reduces variance and makes the model less prone to overfitting than a single tree. This is why it may be better suited to the Titanic.csv data.
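The bagging idea referred to in this answer can be sketched by hand, as below. This is an illustration under assumptions: the data is synthetic, the number of trees is arbitrary, and the sketch covers only the bootstrap-and-vote part. An actual random forest additionally considers a random subset of features at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the Titanic dataset).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a majority vote over two classes cannot tie
all_votes = []
for _ in range(n_trees):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_votes.append(tree.predict(X_test))

# Aggregate: each test row gets the class predicted by most of the trees.
votes = np.array(all_votes)                      # shape (n_trees, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (bagged_pred == y_test).mean()
print(f"Bagged accuracy over {n_trees} trees: {bagged_accuracy:.3f}")
```

Each tree sees a slightly different sample and so makes different errors; averaging their votes cancels part of that noise, which is the variance reduction the answer describes.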
Student Feedback (No Points; Ungraded)
On a scale of 1 to 10 (1 being very easy and 10 being extremely difficult), how difficult was this assignment for you?
8
How long did this assignment take you to complete?
45 minutes
Please list any additional feedback you have about this assignment.
N/A