Homework 4

School: George Mason University
Course: 688
Subject: Statistics
Date: Feb 20, 2024

Homework #4
Your name:
Your student ID:

Please submit this Word (.docx) file and your code in .ipynb format; .py files are not accepted. Make sure you run your code: points will be deducted if the notebook has not been run. (Hint: use the class code as your starting reference.)

Part I:

Instructions:
1] Please read the paper titled “Prediction for the Risk of Multiple Chronic Conditions Among Working Population in the United States With Machine Learning Models” to understand the problem of predicting high-risk patients for hypertension.
2] Please download the data file hypertension_dataset.csv and the notebook Homework_4_2023.ipynb.
3] The prediction variable (y variable) is “Flag”.

Questions:
1.1 How many samples are in the positive class (denoted as 1 in the Flag column) and in the negative class (denoted as 0 in the Flag column)? [5 points]
1.2 In line 5, split the data into 65% for training and 35% for testing, and set the random_state variable to the last two digits of your student ID. [5 points]
1.3 In lines 8 and 9, the model selects the 15 best features. What are they? [5 points]
1.4 In line 12, use MinMaxScaler to scale the features. Write the code to finish the task. [5 points]
1.5 In line 15, use the provided function “ConfusionMatrix_Report” to report the prediction accuracy. [5 points]
1.6 In line 16, write code that generates the prediction accuracy report for every combination of the hyperparameters kernels = ['linear', 'rbf', 'poly'] and C_values = [0.01, 0.1, 0.5, 1, 5, 10] (there will be 3 x 6 = 18 reports, each with accuracy, F1, and a confusion matrix). [20 points]
1.7 What is the best parameter combination for this problem? [5 points]
1.8 What hyperparameter tuning is discussed in the paper “Prediction for the Risk of Multiple Chronic Conditions Among Working Population in the United States With Machine Learning Models”? [10 points]

Part II:

2.1 What is SVM classification? What are the hyperparameters of an SVM? What is the soft margin in an SVM? (5 points)
2.2 What is the difference between a regression task and a classification task? What is the Variance Inflation Factor in multiple regression? (5 points)
2.3 What is the difference between LASSO and ridge regression? (5 points)
2.4 What is the k-means algorithm? Search scholar.google.com and find one application of the k-means method. (5 points)
2.5 What is the mRMR feature selection method? Search scholar.google.com and find one application of the mRMR method. Can you name a few other feature selection methods? (5 points)
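As background for question 2.2, the Variance Inflation Factor of predictor j is commonly defined as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on all the other predictors (with an intercept). The following is a minimal sketch of that definition using NumPy on made-up data; the function name variance_inflation_factors and the toy variables x1, x2, x3 are hypothetical and do not come from the assignment.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Toy data (assumption): x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this toy setup the nearly collinear pair (x1, x3) should show large VIF values while the independent predictor x2 stays close to 1, which is the pattern the VIF is meant to flag.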

Part III:

3. Please write down any suggestions for this course. (5 points)