Assignment 2

School

University of Michigan, Dearborn

Course

DS 633

Subject

Industrial Engineering

Date

Feb 20, 2024

Uploaded by monikagautam93

1. Odds are defined: Odds = p/(1 – p). Use this relationship to show that p = Odds/(1 + Odds) = 1/(1 + Odds^-1). For your solution, do not assume any specific functional form p = f(X) (so no x’s should appear in your solution). In particular, do not assume the form of the logistic function 1/(1 + e^-f(X)) as we do for logistic regression. (Hint: What does 1 + Odds equal?)

Given:

\[ \text{Odds} = \frac{p}{1 - p} \]

To prove:

\[ p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{1}{1 + \text{Odds}^{-1}} \]

Consider the R.H.S. Multiply numerator and denominator by Odds:

\[ \frac{1}{1 + \text{Odds}^{-1}} = \frac{\text{Odds} \cdot 1}{\text{Odds}\,(1 + \text{Odds}^{-1})} = \frac{\text{Odds}}{\text{Odds} + 1} = \frac{\text{Odds}}{1 + \text{Odds}} \]

Since Odds = p/(1 – p):

\[ \frac{\text{Odds}}{1 + \text{Odds}} = \frac{\dfrac{p}{1 - p}}{1 + \dfrac{p}{1 - p}} = \frac{\dfrac{p}{1 - p}}{\dfrac{1 - p + p}{1 - p}} = p \]

Hence proved.
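The identity above can also be checked numerically. The following is a minimal sketch in Python; the helper names `odds_from_p`, `p_from_odds`, and `p_from_inverse_odds` are illustrative and do not come from the assignment.

```python
import math


def odds_from_p(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def p_from_odds(odds: float) -> float:
    """p = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)


def p_from_inverse_odds(odds: float) -> float:
    """p = 1 / (1 + Odds^-1), defined for Odds > 0."""
    return 1.0 / (1.0 + 1.0 / odds)


# Round-trip a few probabilities through both forms of the identity.
for p in (0.1, 0.25, 0.5, 0.9):
    odds = odds_from_p(p)
    assert math.isclose(p_from_odds(odds), p)
    assert math.isclose(p_from_inverse_odds(odds), p)
```

A numerical check is not a proof, but it is a quick way to catch an algebra slip before writing up the derivation.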
2. Consider the Card Study data highlighted at the end of the Week 2. Use the file “card_study.jmp” rather than the Excel file. Note that the jmp file has value labels, a target level, and correct nominal encoding already (whereas importing from Excel would assume continuous variables by default). Here is the scatter plot: For this data, determine the two thresholds in terms of Purchases that separate the Upgraded and Not Upgraded regions, for both Extra Cards = No (0) and Extra Cards = Yes (1). For the thresholds, use the cutoff propensity 0.5. (Hint: See the annotated Week 2 slides.) Show your calculations. (You may use Excel, and it may be easy to copy and paste the coefficients from JMP (click on the plus sign to reveal the formula after saving the prediction formulas to the Data Table) to Excel. If so, go ahead and submit your Excel file too.) Finally, list the 6 misclassifications (Purchases, Extra Cards). (Note that these misclassifications are in the Training data, as there are not Validation/Test data for this problem.)

Prediction formulas saved from JMP:

Lin[Yes] = (−5.55267123468932) + 0.139468534472191 ∗ Purchases + Match(Extra Cards)(0 ⇒ −1.38716758819077, 1 ⇒ 1.38716758819077, else ⇒ .)

Prob[Yes] = 1 / (1 + Exp(−Lin[Yes]))

Prob[No] = 1 / (1 + Exp(Lin[Yes]))

Most Likely Upgraded = 1 when Prob[Yes] is the larger probability, 0 when Prob[No] is the larger, and missing otherwise.

Set Extra Cards to 0 and determine the Purchases value at which the log odds is 0, which corresponds to p = 0.5:

\[ \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 \cdot \text{Purchases} \]

\[ \ln\left(\frac{0.5}{1 - 0.5}\right) = -5.55267123468932 + 0.139468534472191 \cdot \text{Purchases} - 1.38716758819077 \]

\[ 0 = -6.93983881 + 0.139468534472191 \cdot \text{Purchases} \]

\[ \text{Purchases} = \frac{6.93983881}{0.139468534472191} \approx 49.7592 \]

When Extra Cards = 1:

\[ \ln\left(\frac{0.5}{1 - 0.5}\right) = -5.55267123468932 + 0.139468534472191 \cdot \text{Purchases} + 1.38716758819077 \]

\[ 0 = -4.165503646 + 0.139468534472191 \cdot \text{Purchases} \]

\[ \text{Purchases} = \frac{4.165503646}{0.139468534472191} \approx 29.8670 \]

Because the Purchases coefficient is positive, customers with Purchases above the threshold are classified as Upgraded. The threshold is lower with Extra Cards (about 29.87) than without (about 49.76).
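The threshold calculation above can be reproduced in a few lines. This is a minimal sketch that reuses the coefficients from the saved JMP formula; the function names and the `cutoff` parameter are illustrative assumptions, not part of the JMP output.

```python
import math

# Coefficients copied from the saved prediction formula for Lin[Yes].
INTERCEPT = -5.55267123468932
B_PURCHASES = 0.139468534472191
EXTRA_CARDS_EFFECT = {0: -1.38716758819077, 1: 1.38716758819077}


def lin_yes(purchases: float, extra_cards: int) -> float:
    """Linear predictor (log odds of Upgraded = Yes)."""
    return INTERCEPT + B_PURCHASES * purchases + EXTRA_CARDS_EFFECT[extra_cards]


def prob_yes(purchases: float, extra_cards: int) -> float:
    """Prob[Yes] = 1 / (1 + Exp(-Lin[Yes]))."""
    return 1.0 / (1.0 + math.exp(-lin_yes(purchases, extra_cards)))


def purchases_threshold(extra_cards: int, cutoff: float = 0.5) -> float:
    """Purchases value at which Prob[Yes] equals the cutoff."""
    target_log_odds = math.log(cutoff / (1.0 - cutoff))  # 0 when cutoff = 0.5
    return (target_log_odds - INTERCEPT - EXTRA_CARDS_EFFECT[extra_cards]) / B_PURCHASES


for level in (0, 1):
    print(f"Extra Cards = {level}: threshold = {purchases_threshold(level):.4f}")
```

Keeping the cutoff as a parameter makes it easy to see how the two thresholds would move under a different cutoff propensity.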
3. Read the Titanic Passengers case study. Note the errata on the Week 2 page. Use the “TitanicPassengers.jmp” dataset for this exercise.

a. Recreate the analysis and graphs Exhibits 2 and 3 of the Titanic Passengers case study, and record the parameter estimates for each simple model.

Exhibit 2

Passenger Class By Survived

[Mosaic plot: Survived by Passenger Class]

Contingency Table (each cell lists Count, Total %, Col %, Row %)

Passenger Class   Survived = Yes              Survived = No               Total (Count, Total %)
1                 200, 15.28, 40.00, 61.92    123, 9.40, 15.20, 38.08     323, 24.68
2                 119, 9.09, 23.80, 42.96     158, 12.07, 19.53, 57.04    277, 21.16
3                 181, 13.83, 36.20, 25.53    528, 40.34, 65.27, 74.47    709, 54.16
Total             500, 38.20                  809, 61.80                  1309

Tests

N      DF   -LogLike     RSquare (U)
1309   2    63.882734    0.0734

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   127.765     <.0001*
Pearson            127.859     <.0001*
Sex By Survived

[Mosaic plot: Survived by Sex]

Contingency Table (each cell lists Count, Total %, Col %, Row %)

Sex      Survived = Yes              Survived = No               Total (Count, Total %)
female   339, 25.90, 67.80, 72.75    127, 9.70, 15.70, 27.25     466, 35.60
male     161, 12.30, 32.20, 19.10    682, 52.10, 84.30, 80.90    843, 64.40
Total    500, 38.20                  809, 61.80                  1309

Tests

N      DF   -LogLike     RSquare (U)
1309   1    186.46067    0.2142

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   372.921     <.0001*
Pearson            365.887     <.0001*

Fisher's Exact Test   Prob      Alternative Hypothesis
Left                  1.0000    Prob(Survived=No) is greater for Sex=female than male
Right                 <.0001*   Prob(Survived=No) is greater for Sex=male than female
2-Tail                <.0001*   Prob(Survived=No) is different across Sex
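The Pearson statistics reported for the two contingency tables above can be recomputed from the cell counts alone. The sketch below is plain Python with no JMP dependency; the function name `pearson_chi_square` and the table constants are illustrative.

```python
def pearson_chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows are levels of the predictor; columns are Survived = Yes, No.
CLASS_BY_SURVIVED = [[200, 123], [119, 158], [181, 528]]
SEX_BY_SURVIVED = [[339, 127], [161, 682]]

print(f"Passenger Class: {pearson_chi_square(CLASS_BY_SURVIVED):.3f}")
print(f"Sex:             {pearson_chi_square(SEX_BY_SURVIVED):.3f}")
```

The results should agree with the JMP Pearson rows (127.859 and 365.887) up to rounding.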
Exhibit 3

Logistic Fit of Survived By Age

Whole Model Test

Model        -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference   1.61887          1    3.237748    0.0720
Full         705.69135
Reduced      707.31022

RSquare (U)                  0.0023
AICc                         1415.39
BIC                          1425.29
Observations (or Sum Wgts)   1046

Parameter Estimates

Term        Estimate     Std Error   ChiSquare   Prob>ChiSq
Intercept   -0.1365312   0.1447153   0.89        0.3455
Age         -0.0078986   0.0044065   3.21        0.0731

For log odds of Yes/No
Logistic Fit of Survived By Parents and Children

Whole Model Test

Model        -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference   4.38025          1    8.760493    0.0031*
Full         866.13195
Reduced      870.51219

RSquare (U)                  0.0050
AICc                         1736.27
BIC                          1746.62
Observations (or Sum Wgts)   1309

Parameter Estimates

Term                   Estimate     Std Error   ChiSquare   Prob>ChiSq
Intercept              -0.5575286   0.0628931   78.58       <.0001*
Parents and Children   0.19317763   0.0662517   8.50        0.0035*

For log odds of Yes/No
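The Whole Model Test rows above are tied together by two relationships: the likelihood-ratio ChiSquare is twice the difference between the Reduced and Full negative log-likelihoods, and RSquare (U) is that difference divided by the Reduced value. A minimal sketch, with an illustrative function name, that recomputes both from the reported -LogLikelihood values:

```python
def whole_model_summary(neg_loglik_reduced: float, neg_loglik_full: float) -> dict[str, float]:
    """Likelihood-ratio chi-square and RSquare (U) from negative log-likelihoods."""
    difference = neg_loglik_reduced - neg_loglik_full
    return {
        "difference": difference,
        "chi_square": 2.0 * difference,
        "rsquare_u": difference / neg_loglik_reduced,
    }


# (Reduced, Full) -LogLikelihood values copied from the two simple fits.
AGE_FIT = whole_model_summary(707.31022, 705.69135)
PARENTS_FIT = whole_model_summary(870.51219, 866.13195)

for name, fit in (("Age", AGE_FIT), ("Parents and Children", PARENTS_FIT)):
    print(f"{name}: ChiSquare = {fit['chi_square']:.4f}, RSquare (U) = {fit['rsquare_u']:.4f}")
```

Both RSquare (U) values are close to zero, which is consistent with Age and Parents and Children being weak predictors on their own.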
b. What is meant by Whole Effects?

The Whole Effects rule enters only whole effects, when the terms involving that effect are significant. It applies only when categorical variables with more than two levels are entered as possible model effects. In other words, if an independent variable is represented by multiple dummy variables and one of those dummy variables is included in the model, then all of the dummy variables for that independent variable must also be included.

c. Recreate the full logistic regression model shown in the case study, and match Exhibit 9. Using the model parameter estimates, determine the odds ratios for surviving for passengers in 1st class vs 2nd class, 2nd class vs 3rd class, and female vs male. (For instance, the parameter estimate for Passenger Class[3-2] is -0.943, indicating that the odds of surviving in third class were e^(-0.943) = 0.389 times the odds in second class.) Show your work. You may check your answers by selecting Odds Ratios from the red triangle in the Nominal Logistic Fit window. Note: this panel also displays 95% confidence intervals for the odds ratios, which could be useful information.

Nominal Logistic Fit for Survived

Effect Summary

Source                 Logworth   PValue
Sex                    62.298     0.00000
Passenger Class        18.516     0.00000
Age                    8.502      0.00000
Port                   3.289      0.00051
Siblings and Spouses   3.123      0.00075

Converged in Gradient, 5 iterations

Whole Model Test

Model        -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference   228.04248        7    456.085     <.0001*
Full         477.47308
Reduced      705.51556

RSquare (U)                  0.3232
AICc                         971.085
BIC                          1010.55
Observations (or Sum Wgts)   1044

Lack Of Fit

Source        DF    -LogLikelihood   ChiSquare   Prob>ChiSq
Lack Of Fit   629   319.90503        639.8101    0.3738
Saturated     636   157.56805
Fitted        7     477.47308
Parameter Estimates

Term                   Estimate     Std Error   ChiSquare   Prob>ChiSq
Intercept              2.28855656   0.3502206   42.70       <.0001*
Passenger Class[2-1]   -1.1270844   0.2437888   21.37       <.0001*
Passenger Class[3-2]   -0.9437064   0.2028802   21.64       <.0001*
Sex[female]            1.31659442   0.088189    222.88      <.0001*
Age                    -0.0383599   0.0067077   32.70       <.0001*
Siblings and Spouses   -0.3323671   0.1030565   10.40       0.0013*
Port[C]                0.71326236   0.1882365   14.36       0.0002*
Port[Q]                -0.7578514   0.2757081   7.56        0.0060*

For log odds of Yes/No

Effect Likelihood Ratio Tests

Source                 Nparm   DF   L-R ChiSquare   Prob>ChiSq
Passenger Class        2       2    85.2709275      <.0001*
Sex                    1       1    280.796781      <.0001*
Age                    1       1    35.0884611      <.0001*
Siblings and Spouses   1       1    11.3524512      0.0008*
Port                   2       2    15.1470683      0.0005*

Odds Ratios

For Survived odds of Yes versus No

Unit Odds Ratios (per unit change in regressor)

Term                   Odds Ratio   Lower 95%   Upper 95%   Reciprocal
Age                    0.962367     0.949797    0.975102    1.0391051
Siblings and Spouses   0.717224     0.586048    0.877762    1.3942646

Range Odds Ratios (per change in regressor over entire range)

Term                   Odds Ratio   Lower 95%   Upper 95%   Reciprocal
Age                    0.046776     0.016376    0.133609    21.378697
Siblings and Spouses   0.070023     0.013914    0.352382    14.281101

Odds Ratios for Passenger Class (95% Wald confidence interval)

Level1   /Level2   Odds Ratio   Prob>Chisq   Lower       Upper
2        1         0.3239765    <.0001*      0.2009093   0.5224285
3        1         0.126086     <.0001*      0.0789521   0.2013588
3        2         0.3891827    <.0001*      0.2614939   0.5792225
1        2         3.0866439    <.0001*      1.9141374   4.9773702
1        3         7.9310922    <.0001*      4.9662599   12.665915
2        3         2.5694873    <.0001*      1.7264521   3.8241806
Odds Ratios for Sex (95% Wald confidence interval)

Level1   /Level2   Odds Ratio   Prob>Chisq   Lower       Upper
male     female    0.071849     <.0001*      0.0508496   0.1015205
female   male      13.918082    <.0001*      9.8502239   19.665848

Odds Ratios for Port (95% Wald confidence interval)

Level1   /Level2   Odds Ratio   Prob>Chisq   Lower       Upper
Q        C         0.2296696    0.0009*      0.09608     0.549002
S        C         0.5123879    0.0017*      0.3377008   0.7774377
S        Q         2.2309788    0.0496*      1.001501    4.9698066
C        Q         4.3540818    0.0009*      1.8214869   10.407996
C        S         1.9516464    0.0017*      1.2862766   2.9612011
Q        S         0.4482338    0.0496*      0.2012151   0.9985012

Normal approximations used for ratio confidence limits effects: Passenger Class, Sex, Port. Tests and confidence intervals on odds ratios are Wald based.

Confusion Matrix (Training)

Counts
Actual Survived   Predicted Yes   Predicted No
Yes               297             128
No                93              526

Rates
Actual Survived   Predicted Yes   Predicted No
Yes               0.699           0.301
No                0.150           0.850

[Prediction Profiler plot]
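The rates in the confusion matrix above are the counts divided by their row (actual class) totals. The sketch below recomputes them and adds the overall training accuracy, which JMP does not print in this panel; the dictionary layout and function names are illustrative assumptions.

```python
# Counts keyed as CONFUSION[actual][predicted], copied from the training confusion matrix.
CONFUSION = {
    "Yes": {"Yes": 297, "No": 128},
    "No": {"Yes": 93, "No": 526},
}


def row_rates(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Divide each count by the total for its actual class."""
    rates = {}
    for actual, predicted_counts in counts.items():
        row_total = sum(predicted_counts.values())
        rates[actual] = {pred: n / row_total for pred, n in predicted_counts.items()}
    return rates


def accuracy(counts: dict[str, dict[str, int]]) -> float:
    """Share of rows where the predicted class equals the actual class."""
    correct = sum(counts[level][level] for level in counts)
    total = sum(sum(row.values()) for row in counts.values())
    return correct / total


RATES = row_rates(CONFUSION)
for actual, row in RATES.items():
    print(actual, {pred: round(rate, 3) for pred, rate in row.items()})
print(f"Overall accuracy: {accuracy(CONFUSION):.3f}")
```

The counts sum to 1044, matching the number of observations used in the full model.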
Prediction formulas saved from JMP:

Lin[Yes] = 2.28855656033793
+ Match(Passenger Class)(1 ⇒ 0, 2 ⇒ −1.12708438546824, 3 ⇒ −2.07079075499579, else ⇒ .)
+ Match(Sex)("female" ⇒ 1.31659442170986, "male" ⇒ −1.31659442170986, else ⇒ .)
+ (−0.0383598691705702 ∗ Age)
+ (−0.332367131703243 ∗ Siblings and Spouses)
+ Match(Port)("C" ⇒ 0.713262356231326, "Q" ⇒ −0.757851385734741, "S" ⇒ 0.044589029503414, else ⇒ .)

Prob[Yes] = 1 / (1 + Exp(−Lin[Yes]))

Prob[No] = 1 / (1 + Exp(Lin[Yes]))

Most Likely Survived = 1 when Prob[Yes] is the larger probability, 0 when Prob[No] is the larger, and missing otherwise.

Odds Ratio for Passenger Class:

1st class vs. 2nd class: the parameter estimate for Passenger Class [2-1] is the log odds ratio, −1.12708438546824. Exponentiating gives the odds ratio:

\[ e^{-1.12708438546824} = 0.323976471 \]

Odds ratio for Passenger Class [2 vs. 1] = 0.323976471

Take the reciprocal to get the odds ratio for Passenger Class [1 vs. 2]:

\[ \frac{1}{0.323976471} = 3.086643906 \]

So, the odds ratio for Passenger Class [1 vs. 2] = 3.086643906
2nd class vs. 3rd class: the parameter estimate for Passenger Class [3-2] is the difference between the class 3 and class 2 effects:

\[ -2.070790755 - (-1.127084385) = -0.94370637 \]

\[ e^{-0.94370637} = 0.389182704 \]

Odds ratio for Passenger Class [3 vs. 2] = 0.389182704

Similarly, take the reciprocal to get the odds ratio for Passenger Class [2 vs. 3]:

\[ \frac{1}{0.389182704} = 2.569487261 \]

So, the odds ratio for Passenger Class [2 vs. 3] = 2.569487261

The odds ratios for Passenger Class [1 vs. 2] and [2 vs. 3] show that, holding the other predictors fixed, the odds of survival for passengers in class 1 were about 3.1 times the odds for passengers in class 2, and the odds for passengers in class 2 were about 2.6 times the odds for passengers in class 3.

Odds Ratio for Sex:

Female vs. male: the log odds ratio is the difference between the female and male effects:

\[ 1.316594422 - (-1.316594422) = 2.633188843 \]

\[ e^{2.633188843} = 13.9180818 \]

Odds ratio for Sex [female vs. male] = 13.9180818
Take the reciprocal to get the odds ratio for Sex [male vs. female]:

\[ \frac{1}{13.9180818} = 0.071848981 \]

So, the odds ratio for Sex [male vs. female] = 0.071848981

The odds ratio for Sex [female vs. male] shows that, holding the other predictors fixed, the odds of survival for females were nearly 14 times the odds for males.
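The odds ratios worked out by hand above can be cross-checked against the JMP Odds Ratios panel. The following is a minimal sketch that assumes the level effects from the saved Lin[Yes] formula; the dictionary and function names are illustrative.

```python
import math

# Level effects on the log-odds scale, copied from the saved Lin[Yes] formula.
CLASS_EFFECT = {1: 0.0, 2: -1.12708438546824, 3: -2.07079075499579}
SEX_EFFECT = {"female": 1.31659442170986, "male": -1.31659442170986}


def odds_ratio(effects: dict, level1, level2) -> float:
    """Odds ratio of level1 versus level2: exp of the difference in log odds."""
    return math.exp(effects[level1] - effects[level2])


print(f"Class 1 vs 2:   {odds_ratio(CLASS_EFFECT, 1, 2):.7f}")
print(f"Class 2 vs 3:   {odds_ratio(CLASS_EFFECT, 2, 3):.7f}")
print(f"Female vs male: {odds_ratio(SEX_EFFECT, 'female', 'male'):.6f}")
```

Swapping the two levels yields the reciprocal odds ratio, which is why each pair appears twice in the JMP tables.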