Midterm 2 Practice Test Solutions

STAT207 Midterm 2

Lab Section Start Time: ________________________________
Last Name: ________________________________
First Name: ________________________________

Academic Integrity
I hereby state that I have not communicated with or gained information in any way from my classmates during this exam, and that all work is my own. I am aware of the course academic dishonesty policies written in the syllabus, which indicate that evidence of cheating on the exam results in an automatic F in the course.
Signature: ___________________________________________

Test Instructions
1. You have 80 minutes to complete this exam.
2. Show all your work on the open-ended exam questions in order to get full credit. No credit will be given for open-ended questions where no work is shown, even if the answer is correct.
3. On this exam you are allowed:
   a. A calculator
   b. A cheatsheet with notes on one side of an 8.5" by 11" sheet of paper, which must be handwritten in your own handwriting.
4. You are not allowed to use a cellphone, even if you intend to use it as a calculator or to check the time.
5. If you are completely stumped, write as much as you can about what you do know about what the problem might involve.
Part 1 – Linear Regression Application

Basic Dataset Information
In the first part of this exam, we will explore and conduct analyses on the following dataset, which is a random sample of 400 U.S. counties. This dataset contains the following information about each county:
- poverty rate
- homeownership rate
- multi_unit: percent of housing units in multi-unit structures
- unemployment_rate
- metro: whether the county contains a metropolitan area (yes, no)
- median_edu: median education level (hs_diploma, some_college, bachelors, below_hs)
- per_capita_income
- median_hh_income: median household income

More Dataset Information
This dataset has no missing values.

Main Research Goal
The main research goal that we will pursue in this exam is to build a predictive model that effectively predicts one of our selected numerical variables for other U.S. counties not in this dataset, given some combination of the remaining variables in the dataset (not name or state).

Secondary Research Goal
Ideally, the model that we select would also be interpretable. Specifically, we would like this model to accurately reflect the relationship that exists between the chosen explanatory variables and the response variable.

Train-Test Split
We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
1. Variable Transformations

First, we'd like to build a simple linear regression that predicts poverty with median_hh_income. The plot below shows the relationship between these two variables.

1.1. Linearity Assumption
We then fit the following three linear regression models that involve these two variables.

A. poverty = 35.25 − 0.0004 · median_hh_income
B. log(poverty) = 3.95 − 0.00002498 · median_hh_income
C. poverty^2 = 997.13 − 0.0136 · median_hh_income

Match the model to the most likely corresponding fitted values vs. residuals plot. Explanation not needed, but may help with partial credit if you are wrong.
A-3, B-2, C-1

Explanation: The scatterplot to the right indicates that there is not a linear relationship between median_hh_income and poverty in this dataset. Thus, we would not expect the non-transformed linear regression model A to meet the linearity condition. Alternatively, if we transform poverty with log(poverty), then this has the effect of "squashing" higher poverty values more than lower poverty values. This can have the effect of "straightening out" the nonlinear relationship into the one that we see on the right. Thus, we would expect the log-transformed model B to have a linearity assumption that is closer to being met (like fitted values vs. residuals plot #2). On the other hand, if we transform poverty with poverty^2, then the higher poverty values will be increased more than the lower poverty values. This can have the effect of magnifying the nonlinear relationship into the one that we would see on the right. Thus, we might expect the poverty^2 model C to have a linearity assumption that is the least close to being met (like fitted values vs. residuals plot #1).
1.2. Suitable Models: Which of these linear regression models is the most suitable type of model for modeling the relationship between poverty and median_hh_income?

Model B: log(poverty) = 3.95 − 0.00002498 · median_hh_income

This is because its linearity assumption was the closest to being met out of the three models that we tried.

1.3. Model Prediction: Use the model below to predict the poverty rate of a county with a median household income of $70,000.

log(poverty) = 3.95 − 0.00002498 · median_hh_income

log(poverty) = 3.95 − 0.00002498(70,000) = 2.2014

Don't forget to exponentiate both sides to get the predicted poverty as opposed to the predicted log(poverty)!

e^(log(poverty)) = e^(2.2014)
poverty = 9.038 percent

1.4. Model Slope Interpretation: Put the slope of this model into words. Be sure not to use misleading language!

log(poverty) = 3.95 − 0.00002498 · median_hh_income

If we were to increase the median household income by $1, we would expect, on average, the log(poverty) to decrease by 0.00002498.
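As a sanity check, the prediction and back-transformation in 1.3 can be reproduced in a few lines of Python (a minimal sketch; the coefficients are simply those of model B above):

```python
import numpy as np

# Coefficients of fitted model B: log(poverty) = 3.95 - 0.00002498 * median_hh_income
intercept, slope = 3.95, -0.00002498

median_hh_income = 70_000
log_poverty_hat = intercept + slope * median_hh_income  # predicted log(poverty)
poverty_hat = np.exp(log_poverty_hat)                   # back-transform to the original scale

print(round(log_poverty_hat, 4))  # 2.2014
print(round(poverty_hat, 3))      # 9.038 (percent)
```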
2. Slope Interpretations

Next, suppose we fit the following linear regression model that predicts unemployment_rate given:
- per_capita_income
- median_hh_income
- homeownership rate

Here is some information about these four numerical variables in the training dataset.

[Correlations and summary statistics tables from the original exam are not reproduced here.]
2.1. Here is the linear regression model that is fitted with the training dataset:

unemployment_rate = 7.19 − 0.000037 · median_hh_income − 0.0000286 · per_capita_income − 0.00002496 · homeownership

Is the following interpretation of this model reasonable? Why or why not?

Interpretation: Because median_hh_income has the highest slope magnitude, it brings the most predictive power to the model.

No. The standard deviations of our unscaled explanatory variables are not all roughly the same. Thus, it is not reasonable to infer that median_hh_income brings the most predictive power to the model just because it has the highest slope magnitude.

2.2. Suppose we were to z-score scale our explanatory variables in the training dataset first and then refit the model as follows:

unemployment_rate = 4.577 − 0.5247 · median_hh_income − 0.205 · per_capita_income − 0.0002 · homeownership

Do you have any concerns about making the following interpretation for this model? Why or why not?

Interpretation: Because median_hh_income has the highest slope magnitude, it brings the most predictive power to the model.

Yes. On one hand, because we z-score scaled our explanatory variables, the resulting standard deviations of our scaled explanatory variables are now all the same (1). Thus, it becomes MORE REASONABLE to infer that median_hh_income brings the most predictive power to the model because it has the highest slope magnitude. HOWEVER, we need to be cautious about interpreting the slopes of this linear regression model in general, because our model has an issue with multicollinearity. At least one pair of numerical explanatory variables has a strong linear relationship (correlation > 0.7) between them. Thus, it's possible that some of our slope magnitudes may be inflated due to the multicollinearity. These inflated slopes may yield misleading interpretations about the amount of predictive power that a given explanatory variable brings to the model.
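One way to check the multicollinearity concern in 2.2 is to inspect the pairwise correlations or variance inflation factors of the explanatory variables. A minimal sketch, assuming the training data sits in a pandas DataFrame named train with the column names used above:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical DataFrame of the three explanatory variables from the training set
X = train[["median_hh_income", "per_capita_income", "homeownership"]]

# Pairwise correlations: any |r| > 0.7 flags a strong linear relationship
print(X.corr())

# Variance inflation factors (values well above ~5-10 suggest inflated slopes)
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, variance_inflation_factor(X_const.values, i))
```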
2.3. Suppose that we would like to test the performance of our model from 2.2 (that was trained with the training dataset) with the test dataset. Thus, we should also scale our test dataset as well. To the right are the means and standard deviations of the homeownership variable in the training and test datasets. How should we scale a homeownership rate of 80 for a county in the test dataset?

z-score of 80 = (80 − training homeownership mean) / (training homeownership sd) = (80 − 73.097) / 8.57

Even though this observation is from the test dataset, we should still z-score scale it with the means and standard deviations from the training dataset. By doing so, we make the "units" that we use to train the model and to make predictions with the model the same. These "units" are "how many TRAINING standard deviations an observation is away from the TRAINING mean." When the units of a variable that are used to train the model and to make predictions with the model are different, this can lead to biased model predictions.
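In scikit-learn, this convention corresponds to fitting the scaler on the training data only and then reusing it on the test data. A minimal sketch (the train/test DataFrames and column names are assumptions):

```python
from sklearn.preprocessing import StandardScaler

features = ["median_hh_income", "per_capita_income", "homeownership"]

scaler = StandardScaler()
X_train = scaler.fit_transform(train[features])  # learns TRAINING means and sds
X_test = scaler.transform(test[features])        # reuses them on the test set

# Manual check for a test-set homeownership rate of 80, matching 2.3 above:
z = (80 - 73.097) / 8.57  # ~0.806 training sds above the training mean
```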
3. Regularization Models

3.1. Model Slope Matching: LASSO and Ridge Regression

Suppose that we fit the following models with our scaled features matrix as well as the target array for the training dataset. Each of these models predicts unemployment rate with: median_hh_income, per_capita_income, homeownership, and multi_unit.

1. Non-regularized linear regression – Model D
   - This model should have no zero slopes, and its squared slope sum should be larger than that of the two ridge regression models.
2. LASSO linear regression with λ = 0.1 – Model C
   - A LASSO model is allowed to have zero slopes. There are two models with zero slopes: one has two zero slopes, the other has one zero slope. The LASSO model with the higher λ is likely to have more zero slopes.
3. LASSO linear regression with λ = 0.05 – Model E
   - Same reasoning as above: the LASSO model with the lower λ is likely to have fewer zero slopes.
4. Linear ridge regression with λ = 100 – Model A
   - A ridge regression model should have no zero slopes, and the ridge regression model with the highest λ should have a squared slope sum that is the smallest compared to that of the other ridge regression model and the non-regularized linear regression model.
5. Linear ridge regression with λ = 10 – Model B
   - Same reasoning as above: the ridge model with the lower λ shrinks the slopes less.

Match the types of models (1-5 above) to each of the corresponding model slopes below (A-E).
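For reference, these five fits can be produced with scikit-learn (a sketch; scikit-learn calls the regularization strength alpha, which plays the role of λ here, and X_train/y_train are the scaled features and target assumed above):

```python
from sklearn.linear_model import LinearRegression, Lasso, Ridge

models = {
    "non-regularized": LinearRegression(),
    "LASSO lam=0.1":   Lasso(alpha=0.1),
    "LASSO lam=0.05":  Lasso(alpha=0.05),
    "ridge lam=100":   Ridge(alpha=100),
    "ridge lam=10":    Ridge(alpha=10),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # LASSO can zero slopes out entirely; ridge only shrinks them toward zero
    print(name, model.coef_, "squared slope sum:", (model.coef_ ** 2).sum())
```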
3.2. Slope Interpretation

3.2.1. Which explanatory variable do our regularization models above suggest brings the least amount of predictive power to the model? Explain.
Homeownership. It was the first to be zeroed out in our two LASSO models as we increased λ.

3.2.2. Which explanatory variable do our regularization models above suggest brings the second least amount of predictive power to the model? Explain.
multi_unit. It was the second to be zeroed out in our two LASSO models as we increased λ.

3.3. Model Slope Matching: Elastic Net

Suppose that we fit the following models with our scaled features matrix as well as the target array for the training dataset.

1. Elastic net linear regression with λ = 0.1 and α (l1_ratio) = 0.9 – Model B
   - An elastic net model with a higher α will have solutions that resemble a LASSO model more, i.e., it will be more likely to have zeroed slopes. There are two models with zeroed slopes: one has two zero slopes, the other has one zero slope. The elastic net model with the higher λ is likely to have more zero slopes.
2. Elastic net linear regression with λ = 0.1 and α (l1_ratio) = 0.1 – Model C
   - An elastic net model with a lower α will have solutions that resemble a ridge regression model more; thus these models are less likely to have zero slopes. There are two models with all non-zero slopes. The model with the smaller squared slope sum corresponds to the model with the higher value of λ.
3. Elastic net linear regression with λ = 0.05 and α (l1_ratio) = 0.9 – Model D
   - Same reasoning as model 1: with the lower λ, fewer slopes are zeroed.
4. Elastic net linear regression with λ = 0.05 and α (l1_ratio) = 0.1 – Model A
   - Same reasoning as model 2: with the lower λ, the squared slope sum is larger.

Match the types of models (1-4 above) to each of the corresponding model slopes below (A-D).
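The elastic net fits follow the same pattern (a sketch; in scikit-learn, alpha plays the role of the overall strength λ and l1_ratio plays the role of the mixing weight α above):

```python
from sklearn.linear_model import ElasticNet

settings = [(0.1, 0.9), (0.1, 0.1), (0.05, 0.9), (0.05, 0.1)]  # (lambda, l1_ratio)

for lam, l1_ratio in settings:
    model = ElasticNet(alpha=lam, l1_ratio=l1_ratio).fit(X_train, y_train)
    # high l1_ratio -> LASSO-like (zeroed slopes); low l1_ratio -> ridge-like (shrunken slopes)
    print(f"lambda={lam}, l1_ratio={l1_ratio}:", model.coef_)
```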
Part 2 – Logistic Regression Application

Basic Dataset Information
In the second part of this exam, we will explore and conduct analyses on the same dataset as in Part 1.

Main Research Goal
The main research goal that we will pursue in this part of the exam is to build a predictive model that effectively predicts whether or not a county has a metropolitan area for other U.S. counties not in this dataset, given some combination of the remaining variables in the dataset (not name or state).

More Dataset Information
We also create a 0/1 response variable in which metropolitan counties = 1 and non-metropolitan counties = 0.

Train-Test Split
We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
4. Fitting a Logistic Regression Model

We fit a logistic regression model that predicts the probability that a county has a metropolitan area given the following explanatory variables:
- median_edu: median education level (some_college, hs_diploma, bachelors, below_hs)
- unemployment_rate

4.1. Use the summary output table above to write out the logistic regression equation that predicts the probability that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

p̂ = 1 / (1 + e^(−(1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate)))

4.2. Use the summary output table above to write out the logistic regression equation that predicts the odds that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

odds = p̂ / (1 − p̂) = e^(1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate)
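A model like this can be fit with, e.g., statsmodels' formula API (a sketch under assumed column names; Treatment('bachelors') makes bachelors the reference level, which matches reading the intercept as the base odds below):

```python
import statsmodels.formula.api as smf

# Hypothetical: train has a 0/1 'metro' column plus the columns named above
model = smf.logit(
    "metro ~ C(median_edu, Treatment('bachelors')) + unemployment_rate",
    data=train,
).fit()
print(model.summary())  # coefficient table like the one referenced in 4.1-4.3
```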
4.3. Use the summary output table above to write out the logistic regression equation that predicts the log-odds that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

log(odds) = log(p̂ / (1 − p̂)) = 1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate

5. Intercept/Slope Interpretations

5.1. Calculate the base odds for this logistic regression equation and put it into words. Make sure to use non-misleading language.

base odds = e^(β₀) = e^(1.6703) = 5.31

We expect the odds that a county with a 0% unemployment rate and a median education level of "bachelors" has a metropolitan area to be 5.31.

5.2. Calculate the odds multiplier for the unemployment rate explanatory variable in this logistic regression equation and put it into words. Make sure to use non-misleading language.

odds multiplier = e^(β₄) = e^(−0.0169) = 0.98

All else held equal, if we were to increase the unemployment rate of a county by 1 (percentage point), then we would expect, on average, the odds that this county has a metropolitan area to decrease by a multiple of 0.98.
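These quantities all come from exponentiating fitted coefficients, which is easy to verify numerically (a quick numpy check; the odds ratio anticipates 5.3 below):

```python
import numpy as np

base_odds = np.exp(1.6703)         # ~5.31: odds at bachelors, 0% unemployment
odds_multiplier = np.exp(-0.0169)  # ~0.98: per 1-point rise in unemployment rate
odds_ratio = np.exp(-1.4544)       # ~0.23: some_college vs. bachelors (see 5.3)
print(base_odds, odds_multiplier, odds_ratio)
```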
5.3. Calculate the odds ratio for counties with a median education of some college vs. counties with a median education of bachelors in this model. Make sure to use non-misleading language.

odds ratio = e^(β₃) = e^(−1.4544) = 0.23

All else held equal, we would expect the ratio of the odds of a county with a median education level of some college having a metropolitan area vs. the odds of a county with a median education level of bachelor's to be 0.23.

6. Prediction

6.1. Predict the probability that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area.

p̂ = 1 / (1 + e^(−(1.6703 − 19.1483(0) − 2.655(1) − 1.4544(0) − 0.0169(6)))) = 0.25

6.2. Predict the odds (numerical) that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area. Put these numerical odds into prose format.

odds = p̂ / (1 − p̂) = 0.25 / (1 − 0.25) = 0.33

odds = 0.33 / 1 = (chances of having a metro area) / (chances of not having a metro area)

The odds of this county having a metropolitan area are 0.33 to 1, i.e., 1 to 3.
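The chain in 6.1-6.2 can be reproduced directly from the coefficients (a numpy sketch; the 0/1 values encode the hs_diploma dummy pattern):

```python
import numpy as np

# Linear predictor (log-odds) for median_edu = hs_diploma, unemployment_rate = 6
log_odds = 1.6703 - 19.1483 * 0 - 2.655 * 1 - 1.4544 * 0 - 0.0169 * 6

p_hat = 1 / (1 + np.exp(-log_odds))  # ~0.25
odds = p_hat / (1 - p_hat)           # ~0.33, i.e. 1-to-3 odds
print(log_odds, p_hat, odds)         # log_odds ~ -1.09 (see 6.3)
```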
6.3. Predict the log-odds that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area.

ln(odds) = ln(p̂ / (1 − p̂)) = ln(0.25 / (1 − 0.25)) = ln(0.33) = −1.1087

7. Classification

Classify the county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" as either having or not having a metropolitan area. Use a predictive probability threshold of p̂₀ = 0.5 to classify this observation.

Because p̂ = 0.25 < 0.5, we classify it as a 0 (i.e., a county that does not have a metropolitan area).

Part 3 – Conceptual Questions

The questions in this section are conceptual and do not necessarily relate to the dataset discussed in Parts 1 and 2.

8. R^2 and Adjusted R^2 Conceptual Questions

Select whether each statement below is True or False.

8.1. The adjusted R^2 of a model represents the percent of response variable variability that is explained by the model.
False. This is the definition of R^2. The adjusted R^2 is simply a metric that measures the parsimoniousness of a linear regression model.

8.2. The adjusted R^2 is a measure of how parsimonious a linear regression model is.
True.

8.3. By adding another explanatory variable to a linear regression model, the adjusted R^2 will never decrease.
False. It will decrease if the explanatory variable does not bring "enough" predictive power to the model (according to the adjusted R^2) and thus may be contributing to overfitting of the model.
8.4. By adding another explanatory variable to a linear regression model, the R^2 will never decrease.
True.

8.5. The R^2 measures the fit of a linear regression model.
True.

8.6. The R^2 of a model and a given dataset is always between [0, 1].
False. This is true when the dataset is the one that trained the linear regression model. However, when the dataset that we are calculating the R^2 for is not the training dataset, this value can be negative, indicating a very poor model fit on this dataset.

8.7. If the adjusted R^2 of a model increases when adding an explanatory variable, this suggests that this explanatory variable may be overfitting the model.
False. If the adjusted R^2 increases when we add an explanatory variable to the model, then this suggests (according to the adjusted R^2) that this explanatory variable brings enough predictive power to the model to overcome the (n − 1)/(n − p − 1) penalty incurred by adding an additional slope. Thus, if it is deemed to bring "enough" predictive power to the model, then its inclusion is not suggested to lead to overfitting.

8.8. If the adjusted R^2 of a model decreases when deleting an explanatory variable, then the adjusted R^2 suggests that this explanatory variable may not bring enough predictive power to the model.
False. By including this explanatory variable, we increase the adjusted R^2, so the same reasoning as in 8.7 applies: the variable is deemed to bring enough predictive power to overcome the (n − 1)/(n − p − 1) penalty, and its inclusion is not suggested to lead to overfitting.

8.9. If the adjusted R^2 of a model increases when adding an explanatory variable, then the test R^2 of the model will also increase.
False. This is not guaranteed. When the adjusted R^2 increases when adding an explanatory variable to the model, the adjusted R^2 is suggesting that the inclusion of this variable will not lead to overfitting. When the test R^2 increases when adding an explanatory variable to the model, the test R^2 is suggesting the same thing. However, these two techniques for gauging whether a given explanatory variable will lead to overfitting do not always agree. When they do agree, this can lead to additional confidence that the inclusion of that explanatory variable will generally lead to better model fits on new datasets.
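For concreteness, the penalty in 8.7-8.8 comes from the usual formula adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − p − 1), which is easy to compute by hand (a small sketch with made-up numbers):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p slopes fit on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: R^2 creeps up from 0.7000 to 0.7005 when a 4th slope is added,
# but the (n - 1)/(n - p - 1) penalty makes the adjusted R^2 go DOWN.
print(adjusted_r2(0.7000, n=100, p=3))  # ~0.6906
print(adjusted_r2(0.7005, n=100, p=4))  # ~0.6879 (the small gain didn't beat the penalty)
```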
9. Training and Test Dataset Conceptual Question

Suppose we were to train a series of linear regression models with a randomly selected training dataset. We then calculate the test R^2 of each of these models with the test dataset. We then select the model that has the highest test R^2 in the hopes that this model will yield the best performance on new datasets. Which of the following parts of this analysis are subject to change if we had randomly selected a different training dataset and test dataset?

1. The intercept and slopes of each of our trained models.
2. The test R^2 values of each of our models.
3. Which model had the highest test R^2 and was selected for use.

ALL OF THESE THINGS

10. Cross-Validation Conceptual Questions

Answer the following questions with one of the following machine learning techniques:
- Train-test-split method
- Leave-One-Out Cross-Validation (LOOCV)
- K-fold Cross-Validation

10.1. Which of the methods above takes the least amount of computation time to perform? Which of the methods takes the most amount of computation time?
- Least: train-test-split
- Most: LOOCV

10.2. Which of the methods above does not have an inherent random element to it?
- LOOCV
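In scikit-learn, the three evaluation schemes look like this (a sketch, assuming numeric arrays X and y; LOOCV is scored with MSE here because R^2 is undefined on a single-observation test set):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

model = LinearRegression()

# Train-test split: one random split, one test R^2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(model.fit(X_train, y_train).score(X_test, y_test))

# K-fold CV: k folds, average the k test R^2 values
print(cross_val_score(model, X, y, cv=5).mean())

# LOOCV: n fits, no inherent randomness (each observation is its own test set)
print(-cross_val_score(model, X, y, cv=LeaveOneOut(),
                       scoring="neg_mean_squared_error").mean())
```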
11. More Cross-Validation Conceptual Questions

Suppose that we are considering two linear regression models for a given dataset.

Model A
- This model achieved a test R^2 = 0.75 using the train-test-split method.
- This model achieved an average test R^2 = 0.65 using k=5 fold cross-validation.

Model B
- This model achieved a test R^2 = 0.65 using the train-test-split method.
- This model achieved an average test R^2 = 0.75 using k=5 fold cross-validation.

If you were to select one of these models to predict the response variable values for new datasets, which model would you choose and why?

- Model B.
- In k=5 cross-validation, we randomly create 5 sets of training and test datasets. We fit 5 models with our 5 training datasets and test the 5 models with each of our 5 test datasets. We then calculate the average test R^2 performance of the model.
- Thus model B did better on average (considering multiple test R^2 values) than model A did.
- By evaluating a model based on the average of multiple test R^2 values, we are able to decrease the amount of variability that we might see in our test R^2 results (as opposed to the train-test-split method, which evaluates just a single test R^2). Using the average test R^2 may lead to a higher degree of confidence with respect to how a given model might perform on new datasets, which most likely will not look EXACTLY like the test dataset.

12. Feature Selection Conceptual Questions

12.1. True or False: A backwards elimination algorithm is guaranteed to find the combination of explanatory variables that yields the linear regression model with the highest adjusted R^2.
False.

12.2. True or False: If a numerical explanatory variable has a strong linear relationship with the response variable, then it will not be dropped in a backwards elimination algorithm.
False. For instance, it's possible that this numerical explanatory variable x1 has a strong linear relationship with another explanatory variable x2 (i.e., our model has an issue with multicollinearity). In a situation like this, x1 may get dropped by a backwards elimination algorithm because x2 may already be making the same type of contribution to the predictive power that x1 would have made. Thus, by including x1 in addition to x2, x1 may not bring enough additional predictive power to the model and may get dropped.
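A backwards elimination pass of the kind discussed in 12.1-12.2 is a greedy loop: repeatedly drop whichever variable most improves the adjusted R^2, and stop when no drop helps. A minimal sketch (assuming a pandas DataFrame df whose response column name is passed in):

```python
import statsmodels.api as sm

def backward_eliminate(df, response):
    """Greedy backwards elimination on adjusted R^2 (not guaranteed optimal; see 12.1)."""
    features = [c for c in df.columns if c != response]
    y = df[response]

    def adj_r2(cols):
        return sm.OLS(y, sm.add_constant(df[cols])).fit().rsquared_adj

    best = adj_r2(features)
    improved = True
    while improved and len(features) > 1:
        improved = False
        for col in list(features):
            candidate = [c for c in features if c != col]
            score = adj_r2(candidate)
            if score > best:       # dropping col raised the adjusted R^2
                best, features = score, candidate
                improved = True
                break              # greedy: restart from the new best set
    return features, best
```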
12.3. Which of our regularization techniques is most useful when it comes to determining which explanatory variables may be overfitting a model and should thus be left out?

LASSO. LASSO will "zero out" slopes that are set to be small by the regularization model, whereas ridge regression will not. Slopes that have been "zeroed out" can be interpreted as those that are not bringing "enough" predictive power to the model for the given value of λ.

Remember that in an elastic net model, the higher we set our α (or l1_ratio) value, the more it looks like a LASSO model. So if you set α (or l1_ratio) to be high, your elastic net model will also "zero out" many slopes in the model, particularly those that are not bringing "enough" predictive power to the model for the given value of λ.
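scikit-learn packages this LASSO-based screening as a feature selector (a sketch; X_train/y_train are the assumed scaled training matrices from Part 1):

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Keep only features whose LASSO slope is (effectively) non-zero
selector = SelectFromModel(Lasso(alpha=0.1), threshold=1e-10).fit(X_train, y_train)
print(selector.get_support())  # boolean mask: False = slope zeroed out by LASSO
```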