Homework #3
Your name:
Your student ID:
Please submit this Word (.docx) file and your code in .ipynb format. .py files are not accepted.
Make sure you run your code; points will be deducted if the code has not been run.
(Hint: use the class code as a starting reference.)
Instructions:
1] Please read the paper titled “Prediction for the Risk of Multiple Chronic Conditions Among Working Population in the United States With Machine Learning Models” to understand the problem of predicting which patients are at high risk for hypertension.
2] Please download the data called hypertension_dataset.csv and Homework_3_2023.ipynb.
3] The prediction variable (y variable) is “Flag”.
Questions:
1) How many samples are in the positive class (denoted as 1 in the Flag column) and the negative class (denoted as 0 in the Flag column)? [5 points]
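As a minimal sketch of how class counts can be read off a label column, the example below uses a small made-up DataFrame rather than hypertension_dataset.csv. Only the column name Flag comes from the assignment; the values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the real data; only the "Flag" column name is from the assignment.
df = pd.DataFrame({"Flag": [1, 0, 0, 1, 0, 0, 0, 1]})

# Comparing the column to a label gives a boolean Series; summing it counts the matches.
n_positive = int((df["Flag"] == 1).sum())
n_negative = int((df["Flag"] == 0).sum())

print(f"positive (1): {n_positive}, negative (0): {n_negative}")
```

On the real file, the same two lines would be applied after loading it with pd.read_csv.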
2) In line 5, split the data into 65% for training and 35% for testing, and set the random_state variable to the last two digits of your student ID. [5 points]
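A 65/35 split with a fixed seed might look like the sketch below. It assumes scikit-learn's train_test_split and uses synthetic arrays; STUDENT_ID_LAST_TWO is a hypothetical placeholder, not a value from the assignment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and labels standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Hypothetical placeholder: replace with the last two digits of your own student ID.
STUDENT_ID_LAST_TWO = 42

# test_size=0.35 leaves the remaining 65% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=STUDENT_ID_LAST_TWO
)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, so rerunning the notebook yields the same training and testing rows.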
3) In lines 8 and 9, the model selects the 15 best features. What are they? [5 points]
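The selector used in the notebook is not shown in this handout. Assuming it is a scikit-learn selector such as SelectKBest (an assumption, not confirmed by the assignment), the names of the chosen columns can be recovered with get_support, as in this sketch on synthetic data with made-up column names.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data with 20 made-up feature names; the real columns will differ.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 20)),
    columns=[f"feature_{i}" for i in range(20)],
)
y = rng.integers(0, 2, size=200)

# Assumed selector: keep the 15 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=15)
selector.fit(X, y)

# get_support() returns a boolean mask over the input columns.
selected_features = X.columns[selector.get_support()].tolist()
print(selected_features)
```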
4) In line 12, you will use the MinMaxScaler to scale the features. Write your code to finish the task. [5 points]
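For orientation, here is a minimal sketch of MinMaxScaler on tiny hand-made arrays (the values are illustrative, not from the dataset). It follows the common practice of fitting the scaler on the training data only and reusing those minimum and maximum values for the test data; whether the notebook expects exactly this pattern is an assumption.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Tiny illustrative arrays; each column is one feature.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
X_test = np.array([[2.5, 30.0]])

scaler = MinMaxScaler()

# fit_transform learns each column's min and max from the training data
# and maps the training values into the range [0, 1].
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the training min and max, so test values are scaled consistently.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Fitting on the training split alone keeps information about the test rows out of the preprocessing step.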
5) In line 15, use the provided function “ConfusionMatrix_Report” to report the prediction accuracy. [5 points]
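The provided ConfusionMatrix_Report function is not reproduced in this handout, so its signature is unknown here. The sketch below is a hypothetical stand-in (the name report_metrics is made up) that shows the kind of quantities such a report typically contains, using scikit-learn metrics on hand-made labels.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def report_metrics(y_true, y_pred):
    """Hypothetical stand-in, not the course's ConfusionMatrix_Report."""
    # Rows are true classes, columns are predicted classes, ordered [0, 1].
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)  # F1 for the positive class (label 1)
    return cm, acc, f1

# Hand-made labels: 4 of the 6 predictions are correct.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

cm, acc, f1 = report_metrics(y_true, y_pred)
print(cm)
print(f"ACC = {acc:.3f}, F1 = {f1:.3f}")
```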
6) In line 16, write code that uses two for loops to generate the prediction accuracy report for every combination of the hyperparameters kernels = ['linear', 'rbf', 'poly'] and C_values = [0.01, 0.1, 0.5, 1, 5, 10] (there will be 3x6=18 ACC, F1, and confusion matrix reports). [30 points] [Hint: leverage the class code in Lecture 8.]
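The nested-loop structure can be sketched as follows. This assumes the classifier is scikit-learn's SVC (suggested by the kernel and C hyperparameters, but not stated in the handout), uses synthetic data from make_classification, and computes the metrics directly instead of calling the provided ConfusionMatrix_Report.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=0
)

# Scale using statistics learned from the training split only.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

kernels = ["linear", "rbf", "poly"]
C_values = [0.01, 0.1, 0.5, 1, 5, 10]

results = []
for kernel in kernels:          # outer loop: kernel type
    for C in C_values:          # inner loop: regularization strength
        model = SVC(kernel=kernel, C=C)  # assumed classifier
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results.append({
            "kernel": kernel,
            "C": C,
            "accuracy": accuracy_score(y_test, y_pred),
            # zero_division=0 avoids a warning if no positives are predicted.
            "f1": f1_score(y_test, y_pred, zero_division=0),
            "confusion_matrix": confusion_matrix(y_test, y_pred),
        })
        print(f"kernel={kernel}, C={C}: "
              f"ACC={results[-1]['accuracy']:.3f}, F1={results[-1]['f1']:.3f}")

# One way to pick a combination: the highest test accuracy.
best = max(results, key=lambda r: r["accuracy"])
print("best:", best["kernel"], best["C"])
```

Collecting each report in a list makes it easy to compare all 18 combinations afterwards. Ranking by F1 instead of accuracy is an equally reasonable choice when the classes are imbalanced.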
7) What is the best parameter combination for this problem? [5 points]
8) What is the hyperparameter tuning discussed in the paper “Prediction for the Risk of Multiple Chronic Conditions Among Working Population in the United States With Machine Learning Models”? [10 points]
9) Please write down any suggestions for this course. [3 points]