ISOM835_Lab 8_Categorical Data Analysis_solution

School

Suffolk University

Course

835

Subject

Economics

Date

Jan 9, 2024

Lab 8: Classification – Contingency Table Analysis and the Logistic Regression Model - Solution
ISOM 835 Predictive Analytics
Sawyer Business School
Dr. Kate Li
ISOM

This lab consists of two portions:
1. SAS Enterprise Guide Demonstration: Contingency Table Analysis and the Logistic Regression Model (30 pts)
2. Questions 1-3 (70 pts)

While you work through each lab, please take notes on things that you don't understand and/or are not sure about. I will give you time to ask questions during the following class.

Demo portion process flow: [figure not included]
Complete the following questions by yourself.

Question 1 (30 pts): Creating Two-Way Frequency Tables and Analyzing Associations

An insurance company wants to relate the safety of vehicles to several other variables. A score is given to each vehicle model, using the frequency of insurance claims as a basis. The data are in the SAFETY.csv file on the X: drive. These are the variables in the data set:

Unsafe    dichotomized safety score (1 = Below Average, 0 = Average or Above)
Type      type of car (Large, Medium, Small, Sport/Utility, Sports)
Region    manufacturing region (Asia, N America)
Weight    weight in 1000s of pounds
Size      trichotomized version of Type (1 = Small or Sports, 2 = Medium, 3 = Large or Sport/Utility)

a. (3 pts) What is the type of each variable? If a variable is a categorical variable, specify whether it is nominal or ordinal.

Answer:

Variable    Measurement Scale
Unsafe      Binary (can be treated as nominal or ordinal)
Type        Nominal
Region      Nominal
Weight      Numeric (continuous)
Size        Ordinal

b. (3 pts) Create one-way frequency tables for the categorical variables.

Answer:
Steps: [screenshots not included]
c. (2 pts) What is the proportion of cars made in North America?

63.54%

d. (2 pts) For the variables Unsafe, Size, Region, and Type, are there any unusual data values that warrant further investigation?

No.

e. Create a crosstabulation of the variables Region and Unsafe. Along with the default output, generate the column and row percentages, expected frequencies, the chi-square test of association, and the odds ratio.

1) (3 pts) Show your results.
Steps: [screenshots not included]
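Because the crosstabulation output is not reproduced here, the statistics requested in part e can be cross-checked by hand. The sketch below is not the lab's SAS procedure; it is a plain Python recomputation. The cell counts are an assumption: they are inferred from the percentages quoted in the answers to parts c and e (63.54% North American, 42.86% row percent, 69.70% column percent), not read from SAFETY.csv.

```python
import math

# ASSUMED cell counts, inferred from the percentages quoted in the answers
# (not read from SAFETY.csv). Each row is (Unsafe = 0, Unsafe = 1).
counts = {
    "Asia": (20, 15),
    "N America": (46, 15),
}


def crosstab_stats(table):
    """Pearson chi-square (df = 1), its p-value, and the odds ratio for a 2x2 table."""
    rows = list(table)
    row_tot = {r: sum(table[r]) for r in rows}
    col_tot = [sum(table[r][j] for r in rows) for j in range(2)]
    n = sum(row_tot.values())

    chi2 = 0.0
    for r in rows:
        for j in range(2):
            expected = row_tot[r] * col_tot[j] / n  # expected frequency
            chi2 += (table[r][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds of Unsafe = 0 in the first row divided by the same odds in the second row.
    (a, b), (c, d) = table[rows[0]], table[rows[1]]
    odds_ratio = (a / b) / (c / d)
    return {"n": n, "chi2": chi2, "p_value": p_value, "odds_ratio": odds_ratio}


stats = crosstab_stats(counts)
pct_n_america = 100 * sum(counts["N America"]) / stats["n"]
row_pct_asia_unsafe = 100 * counts["Asia"][1] / sum(counts["Asia"])
col_pct_safe_n_america = 100 * counts["N America"][0] / (
    counts["Asia"][0] + counts["N America"][0]
)

print(f"chi-square = {stats['chi2']:.4f}, p = {stats['p_value']:.4f}")
print(f"odds ratio (Asia vs N America, Unsafe = 0) = {stats['odds_ratio']:.4f}")
```

No continuity correction is applied, which is the form of the chi-square test whose p-value is quoted in the answer to part e.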
2) (2 pts) For the cars made in Asia, what percentage had a below-average safety score?

Region is the row variable, so look at the Row Pct value in the Below Average cell of the Asia row. That value is 42.86.

3) (2 pts) For the cars with an average or above safety score, what percentage was made in North America?

The Col Pct value in the North America cell of the Average or Above column is 69.70.

4) (2 pts) Do you see a statistically significant (at the 0.05 level) association between Region and Unsafe? Why?

The association is not statistically significant at the 0.05 level, because the p-value of the chi-square statistic is 0.0631, which is greater than 0.05.

5) (3 pts) What does the odds ratio compare and what does this one say about the difference in odds between Asian and North American cars?

The odds ratio compares the odds of Unsafe = 0 (i.e., average or above safety level, the left column in the table analysis result) for Asian cars with the same odds for North American cars. The odds ratio of 0.4348 means that cars made in Asia have lower odds of being at an average or above safety level than North American cars: the odds of being safe for Asian cars are 43.48% of the odds for North American cars.

f. Examine the ordinal association between Size and Unsafe.

1) (2 pts) Provide a screenshot of your result.

Result: (You may not need to generate as many statistics as shown here.)
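For readers who want to see how an ordinal-association test of this kind is computed, the following is a minimal sketch of the Mantel-Haenszel chi-square, Q = (n - 1) r^2 with 1 degree of freedom, where r is the Pearson correlation between the row and column scores. The function name and the counts are hypothetical: the Size by Unsafe cell counts are not available in this document, so the table below is illustrative only and will not reproduce the lab's output.

```python
import math


def mantel_haenszel_chi2(table, row_scores, col_scores):
    """Return (r, Q, p): score correlation, Q = (n - 1) * r**2, and p-value (df = 1)."""
    n = sum(sum(row) for row in table)
    # One (row score, column score, count) triple per cell.
    cells = [
        (x, y, table[i][j])
        for i, x in enumerate(row_scores)
        for j, y in enumerate(col_scores)
    ]
    mean_x = sum(x * c for x, _, c in cells) / n
    mean_y = sum(y * c for _, y, c in cells) / n
    sxy = sum((x - mean_x) * (y - mean_y) * c for x, y, c in cells)
    sxx = sum((x - mean_x) ** 2 * c for x, _, c in cells)
    syy = sum((y - mean_y) ** 2 * c for _, y, c in cells)

    r = sxy / math.sqrt(sxx * syy)
    q = (n - 1) * r ** 2
    p = math.erfc(math.sqrt(q / 2))  # chi-square upper tail for df = 1
    return r, q, p


# HYPOTHETICAL counts for illustration only (NOT the SAFETY data).
# Rows: Size = 1, 2, 3; columns: Unsafe = 0, 1.
toy_table = [[5, 15], [15, 10], [25, 5]]
r, q, p = mantel_haenszel_chi2(toy_table, row_scores=[1, 2, 3], col_scores=[0, 1])
print(f"r = {r:.3f}, MH chi-square = {q:.2f}, p = {p:.5f}")
```

A negative r with a small p-value would be read the same way as in the lab: larger Size scores go with a lower share of Unsafe = 1.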
Steps: [screenshots not included]

2) (2 pts) What statistic should you use to detect an ordinal association between Size and Unsafe? Show SAS result of your analysis.

The statistic to use to detect an ordinal association between Size and Unsafe is the Mantel-Haenszel chi-square.

3) (2 pts) Do you reject or fail to reject the null hypothesis at the 0.05 level?

Reject, because the p-value of the Mantel-Haenszel chi-square is <.0001.

4) (2 pts) What is the strength of the ordinal association between Size and Unsafe? How do you interpret the association?
The Spearman correlation is -0.5425, which suggests a relatively strong negative association between Size and Unsafe. Because Size is coded as 1 = Small or Sports, 2 = Medium, 3 = Large or Sport/Utility, this means that as size becomes larger, the probability of being unsafe (i.e., the probability of Unsafe = 1) becomes lower.

Question 2 (20 pts): Performing a Simple Logistic Regression Analysis

Fit a simple logistic regression model using the SAFETY data set with Unsafe as the outcome variable and Weight as the predictor variable. Model the probability of below-average safety scores. Request Profile Likelihood confidence limits and an odds ratio plot along with an effect plot.

a. (5 pts) Show your result.

Result:
Steps: [screenshots not included]
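The fitted coefficients quoted in the answers to parts c and d can be turned into an odds ratio and predicted probabilities with a few lines of arithmetic. The sketch below assumes only the intercept (3.5422) and Weight slope (-1.3901) reported in this solution. The function name and the example weights (2, 3, and 4 thousand pounds) are illustrative choices, not values from the lab.

```python
import math

# Coefficients as reported in the answer to part c.
INTERCEPT = 3.5422
SLOPE_WEIGHT = -1.3901  # change in log-odds per 1000 lb


def prob_unsafe(weight_klb):
    """Predicted P(Unsafe = 1) for a car weighing `weight_klb` thousand pounds."""
    logit = INTERCEPT + SLOPE_WEIGHT * weight_klb
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit


# The odds ratio for a one-unit (1000 lb) increase is exp(slope).
odds_ratio = math.exp(SLOPE_WEIGHT)
pct_change_in_odds = (odds_ratio - 1.0) * 100

print(f"odds ratio = {odds_ratio:.3f} ({pct_change_in_odds:.1f}% change in odds)")
for w in (2.0, 3.0, 4.0):  # illustrative weights, not taken from the data
    print(f"Weight = {w:.1f}: P(Unsafe = 1) = {prob_unsafe(w):.3f}")
```

The odds ratio is constant across weights, while the change in predicted probability per thousand pounds is not, which is why the interpretation in part d is stated in terms of odds.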
b. (5 pts) Do you reject or fail to reject the null hypothesis that all regression coefficients of the model are 0? Why?

The p-value for the Likelihood Ratio test is <.0001, and therefore the global null hypothesis is rejected.

c. (5 pts) Write the logistic regression equation.

The regression equation is as follows:

logit(p) = 3.5422 - 1.3901 * Weight, where p = P(Unsafe = 1).

d. (5 pts) Interpret the odds ratio for Weight.

The odds ratio for Weight (0.249) says that the odds of being unsafe (having a below-average safety rating) are 75.1% lower for each additional thousand pounds of vehicle weight. The confidence interval (0.102, 0.517) does not contain 1, indicating that the odds ratio is statistically significant.

Question 3 (20 pts): Performing a Multiple Logistic Regression Analysis Including Categorical Variables
Fit a multiple logistic regression model using the SAFETY data set with Unsafe as the outcome variable and Weight, Region, and Size as the predictor variables (consider only main effects). Request reference coding for Region and Size. Model the probability of below-average safety scores. Request Profile Likelihood confidence limits and an odds ratio plot along with an effect plot.

a. (5 pts) Show your results.
Steps: [screenshots not included]
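Reference coding and the odds ratios discussed in parts c and d can be illustrated with a short sketch. It assumes Size = 3 is the reference level, as stated in the answer to part c, and reuses the coefficient (2.6783), odds ratios, and confidence limits quoted in this solution. The helper names `reference_code` and `excludes_one` are hypothetical and are not part of SAS.

```python
import math


def reference_code(level, levels=(1, 2, 3), reference=3):
    """Design variables under reference coding: one 0/1 indicator per non-reference level."""
    return [1 if level == lv else 0 for lv in levels if lv != reference]


def excludes_one(lower, upper):
    """True when a confidence interval for an odds ratio does not contain 1."""
    return not (lower <= 1.0 <= upper)


# Size = 3 (the reference level) maps to all zeros.
for size in (1, 2, 3):
    print(f"Size = {size}: design variables = {reference_code(size)}")

# The odds ratio for Size = 1 versus Size = 3 is exp(coefficient).
or_size1 = math.exp(2.6783)  # coefficient quoted in the answer to part c

# (odds ratio, lower limit, upper limit) as quoted in the answer to part d.
reported = {
    "Size 1 vs 3": (14.560, 3.018, 110.732),
    "Size 2 vs 3": (1.931, 0.343, 15.182),
}
for label, (odds_ratio, lower, upper) in reported.items():
    verdict = "significant" if excludes_one(lower, upper) else "not significant"
    print(f"{label}: OR = {odds_ratio}, 95% CI ({lower}, {upper}) -> {verdict}")
```

Under this coding each coefficient compares one level directly with the reference level, which is why the Size = 1 coefficient exponentiates to the Size 1 versus Size 3 odds ratio.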
b. (5 pts) Do you reject or fail to reject the null hypothesis that all regression coefficients of the model are 0? Why?

We can reject the null hypothesis because the p-value of the likelihood ratio chi-square is <.0001.

c. (5 pts) If you do reject the global null hypothesis, then which predictors significantly predict safety outcome?

Only Size = 1 is significantly predictive of Unsafe, because its p-value is 0.0024 (< 0.05). Since we use reference coding and Size = 3 is the reference level, the positive coefficient of Size = 1 (2.6783) indicates that small cars are more likely to be unsafe than large cars.

d. (5 pts) Interpret the odds ratio for significant predictors.

Only Size is significant. The design variables show that Size = 1 (Small or Sports) cars have 14.560 times the odds of having a below-average safety rating compared to the reference category, 3 (Large or Sport/Utility). The 95% confidence interval (3.018, 110.732) does not contain 1, implying that the contrast is statistically significant at the 0.05 level. The odds ratio for the second design variable is 1.931 (Medium versus Large or Sport/Utility), implying a trend toward greater odds of low safety for medium cars. However, the 95% confidence interval (0.343, 15.182) contains 1, and therefore the contrast is not statistically significant.

*************************************************************************************

This is the end of Lab 8. Please submit the ISOM835_Lab8_[your initials].docx Word document via Blackboard.