Assignment 2
School: University of Michigan, Dearborn
Course: DS 633
Subject: Industrial Engineering
Date: Feb 20, 2024
Pages: 13
Uploaded by monikagautam93
1.
Odds are defined as Odds = p/(1 − p). Use this relationship to show that p = Odds/(1 + Odds) = 1/(1 + Odds^-1). For your solution, do not assume any specific functional form p = f(X) (so no x’s should appear in your solution). In particular, do not assume the form of the logistic function 1/(1 + e^-f(X)) as we do for logistic regression. (Hint: What does 1 + Odds equal?)
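Given:
\[ \text{Odds} = \frac{p}{1-p} \]
To prove:
\[ p = \frac{\text{Odds}}{1+\text{Odds}} = \frac{1}{1+\text{Odds}^{-1}} \]
Consider the R.H.S. Multiply the numerator and denominator by Odds:
\[ \frac{1}{1+\text{Odds}^{-1}} = \frac{\text{Odds}\cdot 1}{\text{Odds}\,(1+\text{Odds}^{-1})} = \frac{\text{Odds}}{\text{Odds}+1} = \frac{\text{Odds}}{1+\text{Odds}} \]
Since Odds = p/(1 − p):
\[ \frac{\text{Odds}}{1+\text{Odds}} = \frac{\dfrac{p}{1-p}}{1+\dfrac{p}{1-p}} = \frac{\dfrac{p}{1-p}}{\dfrac{1-p+p}{1-p}} = p \]
Hence proved.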
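The identity can also be spot-checked numerically. The sketch below is only a sanity check, not part of the proof; the function names and the sample probabilities are illustrative choices, not from the assignment.

```python
import math


def odds_from_p(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def p_from_odds(odds: float) -> float:
    """First form of the identity: p = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)


def p_from_inverse_odds(odds: float) -> float:
    """Second form: p = 1 / (1 + Odds^-1), defined for Odds > 0."""
    return 1.0 / (1.0 + 1.0 / odds)


# Arbitrary probabilities chosen for illustration; p = 0 is excluded
# because Odds^-1 is undefined when Odds = 0.
sample_ps = [0.01, 0.25, 0.5, 0.75, 0.99]
for p in sample_ps:
    odds = odds_from_p(p)
    assert math.isclose(p_from_odds(odds), p)
    assert math.isclose(p_from_inverse_odds(odds), p)
```

Both forms recover the original p because 1 + Odds = 1/(1 − p), which is the point of the hint.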
2.
Consider the Card Study data highlighted at the end of Week 2. Use the file “card_study.jmp” rather than the Excel file. Note that the JMP file already has value labels, a target level, and correct nominal encoding (whereas importing from Excel would assume continuous variables by default). Here is the scatter plot:
For this data, determine the two thresholds in terms of Purchases that separate the Upgraded and Not Upgraded regions, for both Extra Cards = No (0) and Extra Cards = Yes (1). For the thresholds, use the cutoff propensity 0.5. (Hint: See the annotated Week 2 slides.) Show your calculations. (You may use Excel, and it may be easy to copy and paste the coefficients from JMP (click on the plus sign to reveal the formula after saving the prediction formulas to the Data Table) to Excel. If so, go ahead and submit your Excel file too.) Finally, list the 6 misclassifications (Purchases, Extra Cards). (Note that these misclassifications are in the Training data, as there are not Validation/Test data for this problem.)
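\[ \text{Lin[Yes]} = (-5.55267123468932) + 0.139468534472191 \times \text{Purchases} + \text{Match(Extra Cards)} \begin{cases} 0 \Rightarrow -1.38716758819077 \\ 1 \Rightarrow 1.38716758819077 \\ \text{else} \Rightarrow . \end{cases} \]
\[ \text{Prob[Yes]} = \frac{1}{1 + \text{Exp}(-\text{Lin[Yes]})} \]
\[ \text{Prob[No]} = \frac{1}{1 + \text{Exp}(\text{Lin[Yes]})} \]
The predicted Upgraded level is the one with the larger probability: Prob[Yes] ⇒ 1, Prob[No] ⇒ 0, else ⇒ . (missing).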
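Set Extra Cards = 0 and determine the Purchases value at which the log odds equal 0, which corresponds to the cutoff p = 0.5:
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Purchases} + \beta_{\text{Extra Cards}} \]
\[ \ln\left(\frac{0.5}{1-0.5}\right) = (-5.55267123468932) + 0.139468534472191 \times \text{Purchases} - 1.38716758819077 \]
\[ 0 = -6.93983881 + 0.139468534472191 \times \text{Purchases} \]
\[ \text{Purchases} = \frac{6.93983881}{0.139468534472191} \approx 49.7592 \]
When Extra Cards = 1:
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Purchases} + \beta_{\text{Extra Cards}} \]
\[ \ln\left(\frac{0.5}{1-0.5}\right) = (-5.55267123468932) + 0.139468534472191 \times \text{Purchases} + 1.38716758819077 \]
\[ 0 = -4.165503646 + 0.139468534472191 \times \text{Purchases} \]
\[ \text{Purchases} = \frac{4.165503646}{0.139468534472191} \approx 29.8670 \]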
Comparing the two results, the Purchases threshold for predicting an upgrade drops from about 49.76 without Extra Cards to about 29.87 with Extra Cards.
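The threshold calculation can be reproduced in a few lines. This is a minimal sketch that assumes the coefficients are exactly those in the saved JMP prediction formula above; the function and variable names are illustrative.

```python
import math

# Coefficients copied from the saved JMP prediction formula.
INTERCEPT = -5.55267123468932
B_PURCHASES = 0.139468534472191
B_EXTRA_CARDS = {0: -1.38716758819077, 1: 1.38716758819077}


def lin_yes(purchases: float, extra_cards: int) -> float:
    """Log odds of Upgraded = Yes."""
    return INTERCEPT + B_PURCHASES * purchases + B_EXTRA_CARDS[extra_cards]


def prob_yes(purchases: float, extra_cards: int) -> float:
    """Propensity of Upgraded = Yes."""
    return 1.0 / (1.0 + math.exp(-lin_yes(purchases, extra_cards)))


def purchases_threshold(extra_cards: int, cutoff: float = 0.5) -> float:
    """Purchases value at which the propensity equals the cutoff."""
    target_log_odds = math.log(cutoff / (1.0 - cutoff))  # 0 when cutoff = 0.5
    return (target_log_odds - INTERCEPT - B_EXTRA_CARDS[extra_cards]) / B_PURCHASES


thresholds = {ec: purchases_threshold(ec) for ec in (0, 1)}
for ec, value in thresholds.items():
    print(f"Extra Cards = {ec}: Purchases threshold = {value:.4f}")
```

A row is predicted Upgraded when its Purchases value is above the threshold for its Extra Cards level, so the training misclassifications are the rows whose actual Upgraded value disagrees with that rule.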
3.
Read the Titanic Passengers case study. Note the errata on the Week 2 page. Use the “TitanicPassengers.jmp” dataset for this exercise.
a.
Recreate the analysis and graphs in Exhibits 2 and 3 of the Titanic Passengers case study, and record the parameter estimates for each simple model.
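Exhibit 2
Passenger Class By Survived
[Mosaic Plot: Passenger Class by Survived]

Contingency Table
| Passenger Class | Survived | Count | Total % | Col % | Row % |
|---|---|---|---|---|---|
| 1 | Yes | 200 | 15.28 | 40.00 | 61.92 |
| 1 | No | 123 | 9.40 | 15.20 | 38.08 |
| 1 | Total | 323 | 24.68 | | |
| 2 | Yes | 119 | 9.09 | 23.80 | 42.96 |
| 2 | No | 158 | 12.07 | 19.53 | 57.04 |
| 2 | Total | 277 | 21.16 | | |
| 3 | Yes | 181 | 13.83 | 36.20 | 25.53 |
| 3 | No | 528 | 40.34 | 65.27 | 74.47 |
| 3 | Total | 709 | 54.16 | | |
| Total | Yes | 500 | 38.20 | | |
| Total | No | 809 | 61.80 | | |
| Total | Total | 1309 | | | |

Tests
| N | DF | -LogLike | RSquare (U) |
|---|---|---|---|
| 1309 | 2 | 63.882734 | 0.0734 |

| Test | ChiSquare | Prob>ChiSq |
|---|---|---|
| Likelihood Ratio | 127.765 | <.0001* |
| Pearson | 127.859 | <.0001* |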
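Sex By Survived
[Mosaic Plot: Sex by Survived]

Contingency Table
| Sex | Survived | Count | Total % | Col % | Row % |
|---|---|---|---|---|---|
| female | Yes | 339 | 25.90 | 67.80 | 72.75 |
| female | No | 127 | 9.70 | 15.70 | 27.25 |
| female | Total | 466 | 35.60 | | |
| male | Yes | 161 | 12.30 | 32.20 | 19.10 |
| male | No | 682 | 52.10 | 84.30 | 80.90 |
| male | Total | 843 | 64.40 | | |
| Total | Yes | 500 | 38.20 | | |
| Total | No | 809 | 61.80 | | |
| Total | Total | 1309 | | | |

Tests
| N | DF | -LogLike | RSquare (U) |
|---|---|---|---|
| 1309 | 1 | 186.46067 | 0.2142 |

| Test | ChiSquare | Prob>ChiSq |
|---|---|---|
| Likelihood Ratio | 372.921 | <.0001* |
| Pearson | 365.887 | <.0001* |

| Fisher's Exact Test | Prob | Alternative Hypothesis |
|---|---|---|
| Left | 1.0000 | Prob(Survived=No) is greater for Sex=female than male |
| Right | <.0001* | Prob(Survived=No) is greater for Sex=male than female |
| 2-Tail | <.0001* | Prob(Survived=No) is different across Sex |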
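The Pearson and Likelihood Ratio statistics in Exhibit 2 can be recomputed from the cell counts alone. The sketch below uses the standard textbook formulas (expected count = row total × column total / N) and assumes JMP applies the same definitions; the function name and the list layout are illustrative.

```python
import math


def chi_square_tests(table):
    """Return (pearson, likelihood_ratio) for a table of observed counts.

    `table` is a list of rows; each row lists the counts for one level
    of the row variable across the column variable's levels.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to the LR sum
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio


# Counts from the contingency tables, columns ordered (Yes, No).
class_by_survived = [[200, 123], [119, 158], [181, 528]]
sex_by_survived = [[339, 127], [161, 682]]

for name, table in [("Passenger Class", class_by_survived), ("Sex", sex_by_survived)]:
    pearson, lr = chi_square_tests(table)
    print(f"{name}: Pearson = {pearson:.3f}, Likelihood Ratio = {lr:.3f}")
```

The reported -LogLike value in each Tests table is half of the Likelihood Ratio chi-square.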
Exhibit 3
Logistic Fit of Survived By Age

Whole Model Test
| Model | -LogLikelihood | DF | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Difference | 1.61887 | 1 | 3.237748 | 0.0720 |
| Full | 705.69135 | | | |
| Reduced | 707.31022 | | | |

| RSquare (U) | AICc | BIC | Observations (or Sum Wgts) |
|---|---|---|---|
| 0.0023 | 1415.39 | 1425.29 | 1046 |

Parameter Estimates
| Term | Estimate | Std Error | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Intercept | -0.1365312 | 0.1447153 | 0.89 | 0.3455 |
| Age | -0.0078986 | 0.0044065 | 3.21 | 0.0731 |

For log odds of Yes/No
Logistic Fit of Survived By Parents and Children

Whole Model Test
| Model | -LogLikelihood | DF | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Difference | 4.38025 | 1 | 8.760493 | 0.0031* |
| Full | 866.13195 | | | |
| Reduced | 870.51219 | | | |

| RSquare (U) | AICc | BIC | Observations (or Sum Wgts) |
|---|---|---|---|
| 0.0050 | 1736.27 | 1746.62 | 1309 |

Parameter Estimates
| Term | Estimate | Std Error | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Intercept | -0.5575286 | 0.0628931 | 78.58 | <.0001* |
| Parents and Children | 0.19317763 | 0.0662517 | 8.50 | 0.0035* |

For log odds of Yes/No
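The Whole Model Test rows for the two simple models are linked by simple arithmetic: the chi-square is twice the difference between the Reduced and Full negative log-likelihoods, and RSquare (U) is that difference divided by the Reduced value. The sketch below recomputes them from the reported values; it handles only DF = 1, where the chi-square tail probability can be written with `math.erfc`. The function name is illustrative.

```python
import math


def whole_model_test_df1(neg_loglik_reduced: float, neg_loglik_full: float):
    """Return (chi_square, p_value, r_square_u) for a 1-DF whole model test."""
    difference = neg_loglik_reduced - neg_loglik_full
    chi_square = 2.0 * difference
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    r_square_u = difference / neg_loglik_reduced
    return chi_square, p_value, r_square_u


# (Reduced, Full) negative log-likelihoods from the two reports.
age_result = whole_model_test_df1(707.31022, 705.69135)
parch_result = whole_model_test_df1(870.51219, 866.13195)

for name, (chi2, p, r2) in [("Age", age_result), ("Parents and Children", parch_result)]:
    print(f"{name}: ChiSquare = {chi2:.4f}, Prob>ChiSq = {p:.4f}, RSquare (U) = {r2:.4f}")
```

Small differences in the last digit against the reports come from the rounding of the printed log-likelihoods.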
b.
What is meant by Whole Effects?
The Whole Effects rule enters only whole effects when terms involving that effect are significant. This rule applies only when categorical variables with more than two levels are entered as possible model effects. In other words, if an independent variable is represented by multiple dummy variables and one of those dummy variables is included in the model, then all of the dummy variables corresponding to that independent variable must also be included.
c.
Recreate the full logistic regression model shown in the case study, and match Exhibit 9. Using the model parameter estimates, determine the odds ratios for surviving for passengers in 1st class vs 2nd class, 2nd class vs 3rd class, and female vs male. (For instance, the parameter estimate for Passenger Class[3-2] is -0.943, indicating that the odds of surviving in third class were e^(-0.943) = 0.389 times the odds in second class.) Show your work. You may check your answers by selecting Odds Ratios from the red triangle in the Nominal Logistic Fit window. Note: this panel also displays 95% confidence intervals for the odds ratios, which could be useful information.
Nominal Logistic Fit for Survived

Effect Summary
| Source | Logworth | PValue |
|---|---|---|
| Sex | 62.298 | 0.00000 |
| Passenger Class | 18.516 | 0.00000 |
| Age | 8.502 | 0.00000 |
| Port | 3.289 | 0.00051 |
| Siblings and Spouses | 3.123 | 0.00075 |

Converged in Gradient, 5 iterations
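Whole Model Test
| Model | -LogLikelihood | DF | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Difference | 228.04248 | 7 | 456.085 | <.0001* |
| Full | 477.47308 | | | |
| Reduced | 705.51556 | | | |

| RSquare (U) | AICc | BIC | Observations (or Sum Wgts) |
|---|---|---|---|
| 0.3232 | 971.085 | 1010.55 | 1044 |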
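The AICc and BIC values in the Whole Model Test can be checked against the usual definitions. The sketch below assumes the standard formulas (AICc = 2·NLL + 2k + 2k(k+1)/(n − k − 1) and BIC = 2·NLL + k·ln n) and counts k = 8 parameters for the full model (the intercept plus the 7 model DF); that these are the definitions JMP uses is an assumption, although they reproduce the reported numbers.

```python
import math


def aicc(neg_loglik: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion from a negative log-likelihood."""
    aic = 2.0 * neg_loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def bic(neg_loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion from a negative log-likelihood."""
    return 2.0 * neg_loglik + k * math.log(n)


# Full model: -LogLikelihood (Full) = 477.47308, k = 8 parameters, n = 1044 rows.
full_aicc = aicc(477.47308, k=8, n=1044)
full_bic = bic(477.47308, k=8, n=1044)
print(f"AICc = {full_aicc:.3f}, BIC = {full_bic:.2f}")
```

The same functions can be applied to the simple models in Exhibit 3 with k = 2 to compare fits that use the same rows.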
Lack Of Fit
| Source | DF | -LogLikelihood | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Lack Of Fit | 629 | 319.90503 | 639.8101 | 0.3738 |
| Saturated | 636 | 157.56805 | | |
| Fitted | 7 | 477.47308 | | |
Parameter Estimates
| Term | Estimate | Std Error | ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Intercept | 2.28855656 | 0.3502206 | 42.70 | <.0001* |
| Passenger Class[2-1] | -1.1270844 | 0.2437888 | 21.37 | <.0001* |
| Passenger Class[3-2] | -0.9437064 | 0.2028802 | 21.64 | <.0001* |
| Sex[female] | 1.31659442 | 0.088189 | 222.88 | <.0001* |
| Age | -0.0383599 | 0.0067077 | 32.70 | <.0001* |
| Siblings and Spouses | -0.3323671 | 0.1030565 | 10.40 | 0.0013* |
| Port[C] | 0.71326236 | 0.1882365 | 14.36 | 0.0002* |
| Port[Q] | -0.7578514 | 0.2757081 | 7.56 | 0.0060* |

For log odds of Yes/No
Effect Likelihood Ratio Tests
| Source | Nparm | DF | L-R ChiSquare | Prob>ChiSq |
|---|---|---|---|---|
| Passenger Class | 2 | 2 | 85.2709275 | <.0001* |
| Sex | 1 | 1 | 280.796781 | <.0001* |
| Age | 1 | 1 | 35.0884611 | <.0001* |
| Siblings and Spouses | 1 | 1 | 11.3524512 | 0.0008* |
| Port | 2 | 2 | 15.1470683 | 0.0005* |
Odds Ratios
For Survived odds of Yes versus No

Unit Odds Ratios
Per unit change in regressor
| Term | Odds Ratio | Lower 95% | Upper 95% | Reciprocal |
|---|---|---|---|---|
| Age | 0.962367 | 0.949797 | 0.975102 | 1.0391051 |
| Siblings and Spouses | 0.717224 | 0.586048 | 0.877762 | 1.3942646 |

Range Odds Ratios
Per change in regressor over entire range
| Term | Odds Ratio | Lower 95% | Upper 95% | Reciprocal |
|---|---|---|---|---|
| Age | 0.046776 | 0.016376 | 0.133609 | 21.378697 |
| Siblings and Spouses | 0.070023 | 0.013914 | 0.352382 | 14.281101 |
Odds Ratios for Passenger Class
95% Confidence Interval (Wald)
| Level1 | /Level2 | Odds Ratio | Prob>Chisq | Lower | Upper |
|---|---|---|---|---|---|
| 2 | 1 | 0.3239765 | <.0001* | 0.2009093 | 0.5224285 |
| 3 | 1 | 0.126086 | <.0001* | 0.0789521 | 0.2013588 |
| 3 | 2 | 0.3891827 | <.0001* | 0.2614939 | 0.5792225 |
| 1 | 2 | 3.0866439 | <.0001* | 1.9141374 | 4.9773702 |
| 1 | 3 | 7.9310922 | <.0001* | 4.9662599 | 12.665915 |
| 2 | 3 | 2.5694873 | <.0001* | 1.7264521 | 3.8241806 |
Odds Ratios for Sex
95% Confidence Interval (Wald)
| Level1 | /Level2 | Odds Ratio | Prob>Chisq | Lower | Upper |
|---|---|---|---|---|---|
| male | female | 0.071849 | <.0001* | 0.0508496 | 0.1015205 |
| female | male | 13.918082 | <.0001* | 9.8502239 | 19.665848 |
Odds Ratios for Port
95% Confidence Interval (Wald)
| Level1 | /Level2 | Odds Ratio | Prob>Chisq | Lower | Upper |
|---|---|---|---|---|---|
| Q | C | 0.2296696 | 0.0009* | 0.09608 | 0.549002 |
| S | C | 0.5123879 | 0.0017* | 0.3377008 | 0.7774377 |
| S | Q | 2.2309788 | 0.0496* | 1.001501 | 4.9698066 |
| C | Q | 4.3540818 | 0.0009* | 1.8214869 | 10.407996 |
| C | S | 1.9516464 | 0.0017* | 1.2862766 | 2.9612011 |
| Q | S | 0.4482338 | 0.0496* | 0.2012151 | 0.9985012 |

Normal approximations used for ratio confidence limits for effects: Passenger Class, Sex, Port. Tests and confidence intervals on odds ratios are Wald based.
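Confusion Matrix
Training: Predicted Count
| Actual Survived | Predicted Yes | Predicted No |
|---|---|---|
| Yes | 297 | 128 |
| No | 93 | 526 |

Training: Predicted Rate
| Actual Survived | Predicted Yes | Predicted No |
|---|---|---|
| Yes | 0.699 | 0.301 |
| No | 0.150 | 0.850 |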
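The Predicted Rate table is the Predicted Count table divided by each actual-class row total. The sketch below recomputes the rates and adds the overall training accuracy, which the JMP report does not print; the dictionary layout and variable names are illustrative.

```python
# Training counts: counts[actual][predicted], copied from the Confusion Matrix.
counts = {
    "Yes": {"Yes": 297, "No": 128},
    "No": {"Yes": 93, "No": 526},
}

# Row-normalise: each rate is the share of an actual class given that prediction.
rates = {
    actual: {pred: n / sum(row.values()) for pred, n in row.items()}
    for actual, row in counts.items()
}

total = sum(sum(row.values()) for row in counts.values())
correct = counts["Yes"]["Yes"] + counts["No"]["No"]
accuracy = correct / total

for actual, row in rates.items():
    print(actual, {pred: round(rate, 3) for pred, rate in row.items()})
print(f"Overall training accuracy = {accuracy:.4f} ({correct} of {total})")
```

The model identifies non-survivors more reliably (about 85% correct) than survivors (about 70% correct).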
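Prediction Profiler
\[ \text{Lin[Yes]} = 2.28855656033793 + \text{Match(Passenger Class)} \begin{cases} 1 \Rightarrow 0 \\ 2 \Rightarrow -1.12708438546824 \\ 3 \Rightarrow -2.07079075499579 \\ \text{else} \Rightarrow . \end{cases} + \text{Match(Sex)} \begin{cases} \text{"female"} \Rightarrow 1.31659442170986 \\ \text{"male"} \Rightarrow -1.31659442170986 \\ \text{else} \Rightarrow . \end{cases} \]
\[ \quad + (-0.0383598691705702 \times \text{Age}) + (-0.332367131703243 \times \text{Siblings and Spouses}) + \text{Match(Port)} \begin{cases} \text{"C"} \Rightarrow 0.713262356231326 \\ \text{"Q"} \Rightarrow -0.757851385734741 \\ \text{"S"} \Rightarrow 0.044589029503414 \\ \text{else} \Rightarrow . \end{cases} \]
\[ \text{Prob[Yes]} = \frac{1}{1 + \text{Exp}(-\text{Lin[Yes]})} \]
\[ \text{Prob[No]} = \frac{1}{1 + \text{Exp}(\text{Lin[Yes]})} \]
The predicted Survived level is the one with the larger probability: Prob[Yes] ⇒ 1, Prob[No] ⇒ 0, else ⇒ . (missing).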
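The saved prediction formula translates directly into code. The sketch below copies the coefficients from the formula above; the function names and the two example passengers are hypothetical and are not rows from the dataset.

```python
import math

# Coefficients copied from the saved JMP prediction formula.
INTERCEPT = 2.28855656033793
B_CLASS = {1: 0.0, 2: -1.12708438546824, 3: -2.07079075499579}
B_SEX = {"female": 1.31659442170986, "male": -1.31659442170986}
B_PORT = {"C": 0.713262356231326, "Q": -0.757851385734741, "S": 0.044589029503414}
B_AGE = -0.0383598691705702
B_SIBSP = -0.332367131703243


def lin_yes(pclass: int, sex: str, age: float, sibsp: int, port: str) -> float:
    """Log odds of Survived = Yes."""
    return (INTERCEPT + B_CLASS[pclass] + B_SEX[sex]
            + B_AGE * age + B_SIBSP * sibsp + B_PORT[port])


def prob_yes(pclass: int, sex: str, age: float, sibsp: int, port: str) -> float:
    """Probability of Survived = Yes."""
    return 1.0 / (1.0 + math.exp(-lin_yes(pclass, sex, age, sibsp, port)))


def predicted_survived(pclass: int, sex: str, age: float, sibsp: int, port: str) -> str:
    """Level with the larger probability (equivalent to a 0.5 cutoff)."""
    return "Yes" if prob_yes(pclass, sex, age, sibsp, port) >= 0.5 else "No"


# Two hypothetical passengers, chosen only to illustrate the formula.
first_class_woman = (1, "female", 30, 0, "S")
third_class_man = (3, "male", 30, 0, "S")
for passenger in (first_class_woman, third_class_man):
    print(passenger, round(prob_yes(*passenger), 4), predicted_survived(*passenger))
```

Because Sex and Port are effect coded, their level coefficients sum to zero, whereas Passenger Class uses 1st class as the zero reference in this saved formula.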
Odds Ratio for Passenger Class:
For the odds ratio of 1st class vs. 2nd class:
Parameter estimate for Passenger Class[2-1] = −1.12708438546824
\[ e^{-1.12708438546824} = 0.323976471 \]
Odds ratio for Passenger Class[2-1] = 0.323976471
Invert this value to get the odds ratio for Passenger Class[1-2]:
\[ \text{Inverse} = \frac{1}{0.323976471} = 3.086643906 \]
So, the odds ratio for Passenger Class[1-2] = 3.086643906
For the odds ratio of 2nd class vs. 3rd class:
For Passenger Class[3-2],
\[ \text{Parameter Estimate} = -2.070790755 - (-1.127084385) = -0.94370637 \]
\[ e^{-0.94370637} = 0.389182704 \]
Odds ratio for Passenger Class[3-2] = 0.389182704
Similarly, invert this value to get the odds ratio for Passenger Class[2-3]:
\[ \text{Inverse} = \frac{1}{0.389182704} = 2.569487261 \]
So, the odds ratio for Passenger Class[2-3] = 2.569487261
The odds ratios for Passenger Class[1-2] and Passenger Class[2-3] show that, with the other predictors held fixed, the odds of survival for passengers in 1st class were about 3.09 times the odds for passengers in 2nd class, and the odds for passengers in 2nd class were about 2.57 times the odds for passengers in 3rd class. These are ratios of odds, not of survival probabilities.
Odds Ratio for Sex:
For the odds ratio of female vs. male:
\[ \text{Parameter estimate for female vs. male} = 1.316594422 - (-1.316594422) = 2.633188843 \]
\[ e^{2.633188843} = 13.9180818 \]
Odds ratio for Sex[female − male] = 13.9180818
Invert this value to get the odds ratio for Sex[male − female]:
\[ \text{Inverse} = \frac{1}{13.9180818} = 0.071848981 \]
So, the odds ratio for Sex[male − female] = 0.071848981
The odds ratio for Sex[female − male] shows that, with the other predictors held fixed, females had nearly 14 times the odds of survival of males.
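The hand calculations can be cross-checked against the JMP Odds Ratios panels. The sketch below assumes the level coefficients from the saved prediction formula; the helper name `odds_ratio` is illustrative.

```python
import math

# Level coefficients from the saved prediction formula.
B_CLASS = {1: 0.0, 2: -1.12708438546824, 3: -2.07079075499579}
B_SEX = {"female": 1.31659442170986, "male": -1.31659442170986}


def odds_ratio(coefs: dict, level1, level2) -> float:
    """Odds of survival at level1 relative to level2, other predictors fixed."""
    return math.exp(coefs[level1] - coefs[level2])


results = {
    "1st vs 2nd class": odds_ratio(B_CLASS, 1, 2),
    "2nd vs 3rd class": odds_ratio(B_CLASS, 2, 3),
    "female vs male": odds_ratio(B_SEX, "female", "male"),
}
for label, value in results.items():
    print(f"{label}: odds ratio = {value:.6f}, reciprocal = {1.0 / value:.6f}")
```

Swapping the two levels gives the reciprocal, which is why each JMP panel lists every pair in both directions.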