The dating web site Oollama.com requires its users to create profiles based on a survey in which they rate their interest (on a scale from 0 to 3) in five categories: physical fitness, music, spirituality, education, and alcohol consumption. A new Oollama customer, Erin O'Shaughnessy, has reviewed the profiles of 40 prospective dates and classified whether she is interested in learning more about them.
Based on Erin's classification of these 40 profiles, Oollama has applied a logistic regression to predict Erin's interest in other profiles that she has not yet viewed. The resulting logistic regression model is as follows:
For the 40 profiles (observations) on which Erin classified her interest, this logistic regression model generates the following probability of Interested:
Observation | Interested | Probability of Interested | Observation | Interested | Probability of Interested
35 | 1 | 1.000 | 13 | 1 | 0.412
21 | 1 | 0.999 | 2 | 0 | 0.285
29 | 1 | 0.999 | 3 | 0 | 0.219
25 | 1 | 0.999 | 7 | 0 | 0.168
39 | 1 | 0.999 | 9 | 0 | 0.168
26 | 1 | 0.990 | 12 | 0 | 0.168
23 | 1 | 0.981 | 18 | 0 | 0.168
33 | 1 | 0.974 | 22 | 1 | 0.168
1 | 0 | 0.882 | 31 | 1 | 0.168
24 | 1 | 0.882 | 6 | 0 | 0.128
28 | 1 | 0.882 | 20 | 0 | 0.128
36 | 1 | 0.882 | 15 | 0 | 0.029
16 | 0 | 0.791 | 5 | 0 | 0.020
27 | 1 | 0.791 | 14 | 0 | 0.015
30 | 1 | 0.791 | 19 | 0 | 0.011
32 | 1 | 0.791 | 8 | 0 | 0.008
34 | 1 | 0.791 | 10 | 0 | 0.001
37 | 1 | 0.791 | 17 | 0 | 0.001
40 | 1 | 0.791 | 4 | 0 | 0.001
38 | 0 | 0.732 | 11 | 0 | 0.000
(a) Using a cutoff value of 0.5 to classify a profile observation as Interested or not, construct the confusion matrix for this 40-observation training set.
Actual \ Predicted | 0 | 1
0 | ?? | ??
1 | ?? | ??
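As a minimal sketch, the confusion matrix can be tallied directly from the table above: each profile is predicted Interested when its probability is at least the 0.5 cutoff. Treating a probability exactly equal to the cutoff as Interested is an assumption, but no profile in the table sits at 0.5, so it does not affect the result. The names `TRAINING` and `confusion_matrix` are illustrative only.

```python
# Training data transcribed from the table: (observation, actual Interested, predicted probability)
TRAINING = [
    (35, 1, 1.000), (21, 1, 0.999), (29, 1, 0.999), (25, 1, 0.999),
    (39, 1, 0.999), (26, 1, 0.990), (23, 1, 0.981), (33, 1, 0.974),
    (1, 0, 0.882), (24, 1, 0.882), (28, 1, 0.882), (36, 1, 0.882),
    (16, 0, 0.791), (27, 1, 0.791), (30, 1, 0.791), (32, 1, 0.791),
    (34, 1, 0.791), (37, 1, 0.791), (40, 1, 0.791), (38, 0, 0.732),
    (13, 1, 0.412), (2, 0, 0.285), (3, 0, 0.219), (7, 0, 0.168),
    (9, 0, 0.168), (12, 0, 0.168), (18, 0, 0.168), (22, 1, 0.168),
    (31, 1, 0.168), (6, 0, 0.128), (20, 0, 0.128), (15, 0, 0.029),
    (5, 0, 0.020), (14, 0, 0.015), (19, 0, 0.011), (8, 0, 0.008),
    (10, 0, 0.001), (17, 0, 0.001), (4, 0, 0.001), (11, 0, 0.000),
]


def confusion_matrix(rows, cutoff=0.5):
    """Return {(actual, predicted): count}, predicting 1 when probability >= cutoff."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for _obs, actual, prob in rows:
        predicted = 1 if prob >= cutoff else 0
        counts[(actual, predicted)] += 1
    return counts


cm = confusion_matrix(TRAINING)
print("          Predicted 0  Predicted 1")
for actual in (0, 1):
    print(f"Actual {actual}  {cm[(actual, 0)]:>11}  {cm[(actual, 1)]:>11}")
```

On the 40 training profiles this tallies 17 true negatives, 3 false positives (observations 1, 16 and 38), 3 false negatives (observations 13, 22 and 31) and 17 true positives.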
Compute sensitivity, specificity, and precision measures and interpret them within the context of Erin's dating prospects.
If required, round your answers to two decimal places. Do not round intermediate calculations.
The sensitivity of the model is ??. This suggests that the model is (reasonably good / bad) at identifying the profiles that Erin is interested in.
The specificity of the model is ??. This suggests that the model is (reasonably good / bad) at avoiding recommending profiles that Erin will not be interested in.
The precision of the model is ??. This suggests that the model is (reasonably good / bad) at suggesting profiles of interest to Erin.
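The three measures are simple ratios of confusion-matrix cells. The sketch below assumes the counts obtained with the 0.5 cutoff, namely 17 true positives, 3 false positives, 3 false negatives and 17 true negatives; `classification_measures` is a hypothetical helper name.

```python
def classification_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and precision from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of profiles Erin likes that the model flags
    specificity = tn / (tn + fp)  # share of profiles Erin dislikes that the model screens out
    precision = tp / (tp + fp)    # share of flagged profiles that Erin actually likes
    return sensitivity, specificity, precision


# Counts from applying the 0.5 cutoff to the 40 training profiles
sens, spec, prec = classification_measures(tp=17, fp=3, fn=3, tn=17)
# Round only for display, as the question asks
print(f"sensitivity={sens:.2f} specificity={spec:.2f} precision={prec:.2f}")
```

With these counts each measure works out to 17/20 = 0.85.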
(b) Oollama understands that its clients have a limited amount of time for dating and therefore uses decile-wise lift charts to evaluate its classification models. For the training data, what is the first decile lift resulting from the logistic regression model? Interpret this value.
The first decile lift of this classification model is ??. It means that the first decile of the logistic regression model (halves / doubles / triples / does not change) the number of profiles that Erin is interested in versus random selection.
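The first decile lift compares the share of Interested profiles among the top 10% of profiles, ranked by predicted probability, with the share in the whole training set. This is a minimal sketch under two assumptions: the table order is the ranking (it is sorted by descending probability, and the ties at 0.999 inside the top four do not matter because every tied profile is Interested), and the number of profiles is a multiple of 10. `RANKED_ACTUALS` and `decile_lift` are illustrative names.

```python
# Actual Interested labels in the table's order, i.e. ranked by predicted
# probability from highest (observation 35) to lowest (observation 11).
RANKED_ACTUALS = [
    1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,  # left half of the table
    1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  # right half of the table
]


def decile_lift(ranked_actuals, decile=1):
    """Interested rate within one decile divided by the overall Interested rate."""
    n = len(ranked_actuals)
    size = n // 10  # assumes n is a multiple of 10; 40 profiles -> 4 per decile
    start = (decile - 1) * size
    group = ranked_actuals[start:start + size]
    group_rate = sum(group) / len(group)
    overall_rate = sum(ranked_actuals) / n  # rate expected from random selection
    return group_rate / overall_rate


print(decile_lift(RANKED_ACTUALS))
```

Here the top four profiles are all Interested (rate 4/4 = 1.0), while the overall rate is 20/40 = 0.5, so the ratio is 2.0.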
(c) A recently posted profile has values of Fitness = 3, Music = 1, Education = 2, and Alcohol = 1. Use the estimated logistic regression equation to compute the probability of Erin's interest in this profile.
If required, round your answers to three decimal places. Do not round intermediate calculations.
Log odds = ??
Probability of Interest = ??
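The estimated equation itself is not reproduced in this text, so the sketch below takes the intercept and coefficients as inputs rather than hard-coding them. It computes the log odds as the intercept plus the coefficient-weighted sum of the profile's ratings, then converts them to a probability with the logistic function p = 1 / (1 + e^(-log odds)), rounding only for display. The coefficient values in the usage lines are HYPOTHETICAL placeholders, not Oollama's fitted model; substitute the estimated values.

```python
import math


def interest_probability(intercept, coefficients, profile):
    """Return (log odds, probability) for one profile.

    `coefficients` and `profile` are dicts keyed by variable name; every
    variable in `coefficients` must also appear in `profile`.
    """
    log_odds = intercept + sum(coef * profile[name] for name, coef in coefficients.items())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (inverse logit) function
    return log_odds, probability


new_profile = {"Fitness": 3, "Music": 1, "Education": 2, "Alcohol": 1}

# HYPOTHETICAL coefficients for illustration only; replace with the estimated equation
example_intercept = -1.0
example_coefficients = {"Fitness": 0.5, "Music": 0.5, "Education": 0.5, "Alcohol": 0.5}

example_log_odds, example_prob = interest_probability(
    example_intercept, example_coefficients, new_profile
)
print(f"Log odds = {example_log_odds:.3f}, Probability of Interest = {example_prob:.3f}")
```

Keeping full precision until the final `print` follows the instruction not to round intermediate calculations.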
(d) Now that Oollama has trained a logistic regression model based on Erin's initial evaluations of 40 profiles, what should its next steps be in the modeling process?
Oollama should use its model to suggest profiles predicted to be of interest, of no interest, or both to Erin, and then use her responses as a validation set on which to compute classification accuracy measures.