Hw10LR-Clustering

TO 301 Homework 10: Logistic Regression and Clustering

This problem set is for practice only. It will not be graded.

1. The Hw10Data.csv file (in Canvas) contains data on 100 consumers: their beer preferences and four other variables (gender, marital status, annual income, and age). Make sure to convert the variables to the correct types when you upload the file in Radiant/R.

a. Run a logistic regression and obtain a final model in which all variables are significant at the 5% level. Interpret the coefficients.

b. Predict the likelihood that a 47-year-old, married, male consumer with an annual income of $42,000 prefers Regular beer.

c. Predict, and find confidence intervals for, the likelihoods that the individuals in the data file prefer Regular beer. (You are NOT responsible for predicting with Radiant in this part or in the parts below. If you are interested, choose Predict, from Data file, and specify the Hw10DataNew file to predict from.)

d. Under the Predict tab, using the command: IncomeUSDx1000 = seq(40, 80, 20), Age = seq(30, 60, 15) find the predictions and confidence intervals for the likelihoods of Regular beer preference for all combinations of (average) individuals with incomes of $40,000, $60,000, and $80,000 and ages 30, 45, and 60.

e. Under the Predict tab, using the command: IncomeUSDx1000 = seq(39, 41, 1), Age = seq(44, 46, 1) find the predictions and confidence intervals for the likelihoods of Regular beer preference for all combinations of (average) individuals with incomes of $39,000, $40,000, and $41,000 and ages 44, 45, and 46.

2. Using the same data:

a. Implement the K-means clustering model on Income and Age, obtain three clusters, and interpret the heterogeneity between clusters.

b. Store the Cluster ID of each data point by creating a new variable in the data. Using only the new Cluster variable as a categorical variable, with Cluster 1 as the reference category, run a logistic regression to find the effects of the clusters on the likelihood of Regular beer preference. Interpret the results.
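For parts 1d and 1e, it may help to see what a command such as IncomeUSDx1000 = seq(40, 80, 20), Age = seq(30, 60, 15) asks for: every combination of the listed income and age values, each turned into a probability through the inverse-logit function. The Python sketch below illustrates that idea only. The coefficients are HYPOTHETICAL placeholders rather than estimates from Hw10Data.csv, the helper names r_seq and predict_probability are made up for this example, and the model is assumed to contain only income and age. It does not reproduce Radiant's handling of other predictors or its confidence intervals.

```python
import math
from itertools import product


def r_seq(start, stop, step):
    """Mimic R's seq(from, to, by) for positive integer steps (endpoint included when reached)."""
    return list(range(start, stop + 1, step))


def predict_probability(coefs, income_k, age):
    """Inverse-logit of the linear predictor b0 + b1 * income + b2 * age."""
    z = coefs["intercept"] + coefs["income"] * income_k + coefs["age"] * age
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only; replace with your fitted model's estimates.
coefs = {"intercept": -1.0, "income": 0.05, "age": -0.04}

incomes = r_seq(40, 80, 20)  # income in thousands of USD: 40, 60, 80
ages = r_seq(30, 60, 15)     # ages: 30, 45, 60

# One row per (income, age) combination, as in a prediction grid.
grid = [
    (income, age, predict_probability(coefs, income, age))
    for income, age in product(incomes, ages)
]

for income, age, prob in grid:
    print(f"IncomeUSDx1000={income:>3}  Age={age:>2}  P(Regular)={prob:.3f}")
```

Swapping in r_seq(39, 41, 1) and r_seq(44, 46, 1) gives the finer grid of part 1e, which shows how little the predicted probability moves for a one-unit change in either variable.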
3. (You are NOT responsible for creating the Confusion Matrix. If you are interested, how to create it is demonstrated below. Please use the given Confusion Matrix to answer the questions.) Using the final logistic regression model found in question 1a, create a new variable that is “Regular” if the predicted probability is greater than or equal to 0.5, and “Light” otherwise, as your prediction of whether each individual prefers Regular or Light beer. Then create a confusion matrix using the predicted and actual preferences, and for the model find:

Probability of False Positive (assuming “Regular” corresponds to Positive)
Probability of False Negative (assuming “Light” corresponds to Negative)
Sensitivity (probability of correctly predicting “Regular”)
Specificity (probability of correctly predicting “Light”)
Accuracy
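The quantities listed above can be computed from the four cells of a confusion matrix. The sketch below is one possible reading, not the course's official solution. It treats “Probability of False Positive” as the conditional rate FP / (FP + TN) and “Probability of False Negative” as FN / (FN + TP), which is an assumption, since some courses divide by the total sample size instead. The probabilities and actual labels are HYPOTHETICAL and are not taken from Hw10Data.csv or the given Confusion Matrix; the function names are illustrative.

```python
def classify(probabilities, threshold=0.5):
    """Label a case 'Regular' when its predicted probability is >= threshold, else 'Light'."""
    return ["Regular" if p >= threshold else "Light" for p in probabilities]


def confusion_counts(actual, predicted, positive="Regular"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Rates conditional on the actual class (assumed definition), plus overall accuracy."""
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# HYPOTHETICAL predicted probabilities and actual preferences, for illustration only.
probs = [0.9, 0.7, 0.5, 0.4, 0.2, 0.6, 0.3, 0.8]
actual = ["Regular", "Regular", "Light", "Regular", "Light", "Light", "Light", "Regular"]

predicted = classify(probs)
tp, fp, tn, fn = confusion_counts(actual, predicted)
result = metrics(tp, fp, tn, fn)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that under this definition sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate, which is a quick way to check answers read off the given matrix.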