Midterm 2 Practice Test Solutions

STAT207 Midterm 2

Lab Section Start Time: ________________________________
Last Name: ________________________________
First Name: ________________________________

Academic Integrity
I hereby state that I have not communicated with or gained information in any way from my classmates during this exam, and that all work is my own. I am aware of the course academic dishonesty policies written in the syllabus, which indicate that evidence of cheating on the exam results in an automatic F in the course.
Signature: ___________________________________________

Test Instructions
1. You have 80 minutes to complete this exam.
2. Show all your work on the open-ended exam questions in order to get full credit. No credit will be given for open-ended questions where no work is shown, even if the answer is correct.
3. On this exam you are allowed:
   a. A calculator
   b. A cheatsheet with notes on one side of an 8.5" by 11" sheet of paper, which must be handwritten in your own handwriting.
4. You are not allowed to use a cellphone, even if you intend to use it as a calculator or to check the time.
5. If you are completely stumped, write as much as you can about what you do know about what the problem might involve.
Part 1 – Linear Regression Application

Basic Dataset Information
In the first part of this exam, we will explore and conduct analyses on the following dataset, which is a random sample of 400 U.S. counties. This dataset contains the following information about each county:
- poverty rate
- homeownership rate
- multi_unit: percent of housing units in multi-unit structures
- unemployment_rate
- metro: whether the county contains a metropolitan area (yes, no)
- median_edu: median education level (hs_diploma, some_college, bachelors, below_hs)
- per_capita_income
- median_hh_income: median household income

More Dataset Information
This dataset has no missing values.

Main Research Goal
The main research goal that we will pursue in this exam is to build a predictive model that effectively predicts one of our selected numerical variables for other U.S. counties not in this dataset, given some combination of the remaining variables in the dataset (not name or state).

Secondary Research Goal
Ideally, the model that we select would also be interpretable. Specifically, we would like this model to accurately reflect the relationship that exists between the chosen explanatory variables and the response variable.

Train-Test Split
We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
1. Variable Transformations

First, we'd like to build a simple linear regression that predicts poverty with median_hh_income. The plot below shows the relationship between these two variables.

1.1. Linearity Assumption
We then fit the following three linear regression models that involve these two variables.

A. poverty = 35.25 − 0.0004 · median_hh_income
B. log(poverty) = 3.95 − 0.00002498 · median_hh_income
C. poverty^2 = 997.13 − 0.0136 · median_hh_income

Match the model to the most likely corresponding fitted values vs. residuals plot. Explanation not needed, but may help with partial credit if you are wrong.
A-3, B-2, C-1

Explanation: The scatterplot to the right indicates that there is not a linear relationship between median_hh_income and poverty in this dataset. Thus, we would not expect the non-transformed linear regression model A to meet the linearity condition. Alternatively, if we transform poverty with log(poverty), then this has the effect of "squashing" higher poverty values more than lower poverty values. This can have the effect of "straightening out" the nonlinear relationship into the one that we see on the right. Thus, we would expect the log-transformed model B to have a linearity assumption that is closer to being met (like fitted values vs. residuals plot #2). On the other hand, if we transform poverty with poverty^2, then the higher poverty values will be increased more than the lower poverty values. This can have the effect of magnifying the nonlinear relationship into the one that we would see on the right. Thus, we might expect the poverty^2 model C to have a linearity assumption that is the least close to being met (like fitted values vs. residuals plot #1).
1.2. Suitable Models: Which of these linear regression models is the most suitable type of model for modeling the relationship between poverty and median_hh_income?

Model B: log(poverty) = 3.95 − 0.00002498 · median_hh_income

This is because its linearity assumption was the closest to being met out of the three models that we tried.

1.3. Model Prediction: Use the model below to predict the poverty rate of a county with a median household income of $70,000.

log(poverty) = 3.95 − 0.00002498 · median_hh_income

log(poverty) = 3.95 − 0.00002498(70,000) = 2.2014

Don't forget to exponentiate both sides to get the predicted poverty as opposed to the predicted log(poverty)!

e^(log(poverty)) = e^(2.2014)
poverty = 9.038 percent

1.4. Model Slope Interpretation: Put the slope of this model into words. Be sure not to use misleading language!

log(poverty) = 3.95 − 0.00002498 · median_hh_income

If we were to increase the median household income by $1, we would expect, on average, the log(poverty) to decrease by 0.00002498.
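As a sanity check, the prediction and back-transformation in 1.3 can be reproduced in a few lines of Python (a minimal sketch; the coefficients are simply those of model B above):

```python
import numpy as np

# Coefficients of fitted model B: log(poverty) = 3.95 - 0.00002498 * median_hh_income
intercept, slope = 3.95, -0.00002498

median_hh_income = 70_000
log_poverty_hat = intercept + slope * median_hh_income  # predicted log(poverty)
poverty_hat = np.exp(log_poverty_hat)                   # back-transform to the original scale

print(round(log_poverty_hat, 4))  # 2.2014
print(round(poverty_hat, 3))      # 9.038 (percent)
```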
2. Slope Interpretations

Next, suppose we fit the following linear regression model that predicts unemployment_rate given:
- per_capita_income
- median_hh_income
- homeownership rate

Here is some information about these four numerical variables in the training dataset.

[Correlations and summary statistics tables from the original exam are not reproduced here.]
2.1. Here is the linear regression model that is fitted with the training dataset:

unemployment_rate = 7.19 − 0.000037 · median_hh_income − 0.0000286 · per_capita_income − 0.00002496 · homeownership

Is the following interpretation of this model reasonable? Why or why not?

Interpretation: Because median_hh_income has the highest slope magnitude, it brings the most predictive power to the model.

No. The standard deviations of our unscaled explanatory variables are not all roughly the same. Thus, it is not reasonable to infer that median_hh_income brings the most predictive power to the model just because it has the highest slope magnitude.

2.2. Suppose we were to z-score scale our explanatory variables in the training dataset first and then refit the model as follows:

unemployment_rate = 4.577 − 0.5247 · median_hh_income − 0.205 · per_capita_income − 0.0002 · homeownership

Do you have any concerns about making the following interpretation for this model? Why or why not?

Interpretation: Because median_hh_income has the highest slope magnitude, it brings the most predictive power to the model.

Yes. On one hand, because we z-score scaled our explanatory variables, the resulting standard deviations of our scaled explanatory variables are now all the same (1). Thus, it becomes MORE REASONABLE to infer that median_hh_income brings the most predictive power to the model because it has the highest slope magnitude. HOWEVER, we need to be cautious about interpreting the slopes of this linear regression model in general, because our model has an issue with multicollinearity. At least one pair of numerical explanatory variables has a strong linear relationship (correlation > 0.7) between them. Thus, it's possible that some of our slope magnitudes may be inflated due to the multicollinearity. These inflated slopes may yield misleading interpretations about the amount of predictive power that a given explanatory variable brings to the model.
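One way to check the multicollinearity concern in 2.2 is to inspect the pairwise correlations or variance inflation factors of the explanatory variables. A minimal sketch, assuming the training data sits in a pandas DataFrame named train with the column names used above:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical DataFrame of the three explanatory variables from the training set
X = train[["median_hh_income", "per_capita_income", "homeownership"]]

# Pairwise correlations: any |r| > 0.7 flags a strong linear relationship
print(X.corr())

# Variance inflation factors (values well above ~5-10 suggest inflated slopes)
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, variance_inflation_factor(X_const.values, i))
```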
2.3. Suppose that we would like to test the performance of our model from 2.2 (that was trained with the training dataset) with the test dataset. Thus, we should also scale our test dataset as well. To the right are the means and standard deviations of the homeownership variable in the training and test datasets. How should we scale a homeownership rate of 80 for a county in the test dataset?

z-score of 80 = (80 − training homeownership mean) / (training homeownership sd) = (80 − 73.097) / 8.57

Even though this observation is from the test dataset, we should still z-score scale it with the means and standard deviations from the training dataset. By doing so, we make the "units" that we use to train the model and to make predictions with the model the same. These "units" are "how many TRAINING standard deviations an observation is away from the TRAINING mean." When the units of a variable that are used to train the model and to make predictions with the model are different, this can lead to biased model predictions.
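In scikit-learn, this convention corresponds to fitting the scaler on the training data only and then reusing it on the test data. A minimal sketch (the train/test DataFrames and column names are assumptions):

```python
from sklearn.preprocessing import StandardScaler

features = ["median_hh_income", "per_capita_income", "homeownership"]

scaler = StandardScaler()
X_train = scaler.fit_transform(train[features])  # learns TRAINING means and sds
X_test = scaler.transform(test[features])        # reuses them on the test set

# Manual check for a test-set homeownership rate of 80, matching 2.3 above:
z = (80 - 73.097) / 8.57  # ~0.806 training sds above the training mean
```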
3. Regularization Models

3.1. Model Slope Matching: LASSO and Ridge Regression

Suppose that we fit the following models with our scaled features matrix as well as the target array for the training dataset. Each of these models predicts unemployment rate with: median_hh_income, per_capita_income, homeownership, and multi_unit.

1. Non-regularized linear regression – Model D
   - This model should have no zero slopes, and its squared slope sum should be larger than that of the two ridge regression models.
2. LASSO linear regression with λ = 0.1 – Model C
   - A LASSO model is allowed to have zero slopes. There are two models with zero slopes: one has two zero slopes, the other has one zero slope. The LASSO model with the higher λ is likely to have more zero slopes.
3. LASSO linear regression with λ = 0.05 – Model E
   - Same reasoning as above: the LASSO model with the lower λ is likely to have fewer zero slopes.
4. Linear ridge regression with λ = 100 – Model A
   - A ridge regression model should have no zero slopes, and the ridge regression model with the highest λ should have a squared slope sum that is the smallest compared to that of the other ridge regression model and the non-regularized linear regression model.
5. Linear ridge regression with λ = 10 – Model B
   - Same reasoning as above: the ridge model with the lower λ shrinks the slopes less.

Match the types of models (1-5 above) to each of the corresponding model slopes below (A-E).
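For reference, these five fits can be produced with scikit-learn (a sketch; scikit-learn calls the regularization strength alpha, which plays the role of λ here, and X_train/y_train are the scaled features and target assumed above):

```python
from sklearn.linear_model import LinearRegression, Lasso, Ridge

models = {
    "non-regularized": LinearRegression(),
    "LASSO lam=0.1":   Lasso(alpha=0.1),
    "LASSO lam=0.05":  Lasso(alpha=0.05),
    "ridge lam=100":   Ridge(alpha=100),
    "ridge lam=10":    Ridge(alpha=10),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # LASSO can zero slopes out entirely; ridge only shrinks them toward zero
    print(name, model.coef_, "squared slope sum:", (model.coef_ ** 2).sum())
```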
3.2. Slope Interpretation

3.2.1. Which explanatory variable do our regularization models above suggest brings the least amount of predictive power to the model? Explain.
Homeownership. It was the first to be zeroed out in our two LASSO models as we increased λ.

3.2.2. Which explanatory variable do our regularization models above suggest brings the second least amount of predictive power to the model? Explain.
multi_unit. It was the second to be zeroed out in our two LASSO models as we increased λ.

3.3. Model Slope Matching: Elastic Net

Suppose that we fit the following models with our scaled features matrix as well as the target array for the training dataset.

1. Elastic net linear regression with λ = 0.1 and α (l1_ratio) = 0.9 – Model B
   - An elastic net model with a higher α will have solutions that resemble a LASSO model more, i.e., it will be more likely to have zeroed slopes. There are two models with zeroed slopes: one has two zero slopes, the other has one zero slope. The elastic net model with the higher λ is likely to have more zero slopes.
2. Elastic net linear regression with λ = 0.1 and α (l1_ratio) = 0.1 – Model C
   - An elastic net model with a lower α will have solutions that resemble a ridge regression model more; thus these models are less likely to have zero slopes. There are two models with all non-zero slopes. The model with the smaller squared slope sum corresponds to the model with the higher value of λ.
3. Elastic net linear regression with λ = 0.05 and α (l1_ratio) = 0.9 – Model D
   - Same reasoning as model 1: with the lower λ, fewer slopes are zeroed.
4. Elastic net linear regression with λ = 0.05 and α (l1_ratio) = 0.1 – Model A
   - Same reasoning as model 2: with the lower λ, the squared slope sum is larger.

Match the types of models (1-4 above) to each of the corresponding model slopes below (A-D).
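The elastic net fits follow the same pattern (a sketch; in scikit-learn, alpha plays the role of the overall strength λ and l1_ratio plays the role of the mixing weight α above):

```python
from sklearn.linear_model import ElasticNet

settings = [(0.1, 0.9), (0.1, 0.1), (0.05, 0.9), (0.05, 0.1)]  # (lambda, l1_ratio)

for lam, l1_ratio in settings:
    model = ElasticNet(alpha=lam, l1_ratio=l1_ratio).fit(X_train, y_train)
    # high l1_ratio -> LASSO-like (zeroed slopes); low l1_ratio -> ridge-like (shrunken slopes)
    print(f"lambda={lam}, l1_ratio={l1_ratio}:", model.coef_)
```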
Part 2 – Logistic Regression Application

Basic Dataset Information
In the second part of this exam, we will explore and conduct analyses on the same dataset as in Part 1.

Main Research Goal
The main research goal that we will pursue in this part of the exam is to build a predictive model that effectively predicts whether or not a county has a metropolitan area for other U.S. counties not in this dataset, given some combination of the remaining variables in the dataset (not name or state).

More Dataset Information
We also create a 0/1 response variable in which metropolitan counties = 1 and non-metropolitan counties = 0.

Train-Test Split
We take this dataset and randomly split it into a training dataset and a test dataset. The test dataset is comprised of 20% of the observations.
4. Fitting a Logistic Regression Model

We fit a logistic regression model that predicts the probability that a county has a metropolitan area given the following explanatory variables:
- median_edu: median education level (some_college, hs_diploma, bachelors, below_hs)
- unemployment_rate

4.1. Use the summary output table above to write out the logistic regression equation that predicts the probability that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

p̂ = 1 / (1 + e^(−(1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate)))

4.2. Use the summary output table above to write out the logistic regression equation that predicts the odds that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

odds = p̂ / (1 − p̂) = e^(1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate)
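A model like this can be fit with, e.g., statsmodels' formula API (a sketch under assumed column names; Treatment('bachelors') makes bachelors the reference level, which matches reading the intercept as the base odds below):

```python
import statsmodels.formula.api as smf

# Hypothetical: train has a 0/1 'metro' column plus the columns named above
model = smf.logit(
    "metro ~ C(median_edu, Treatment('bachelors')) + unemployment_rate",
    data=train,
).fit()
print(model.summary())  # coefficient table like the one referenced in 4.1-4.3
```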
4.3. Use the summary output table above to write out the logistic regression equation that predicts the log-odds that a county has a metropolitan area. Make sure to use the appropriate notation seen in class.

log(odds) = log(p̂ / (1 − p̂)) = 1.6703 − 19.1483 · median_edu[below_hs] − 2.655 · median_edu[hs_diploma] − 1.4544 · median_edu[some_college] − 0.0169 · unemployment_rate

5. Intercept/Slope Interpretations

5.1. Calculate the base odds for this logistic regression equation and put it into words. Make sure to use non-misleading language.

base odds = e^(β₀) = e^(1.6703) = 5.31

We expect the odds that a county with a 0% unemployment rate and a median education level of "bachelors" has a metropolitan area to be 5.31.

5.2. Calculate the odds multiplier for the unemployment rate explanatory variable in this logistic regression equation and put it into words. Make sure to use non-misleading language.

odds multiplier = e^(β₄) = e^(−0.0169) = 0.98

All else held equal, if we were to increase the unemployment rate of a county by 1 (percentage point), then we would expect, on average, the odds that this county has a metropolitan area to decrease by a multiple of 0.98.
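These quantities all come from exponentiating fitted coefficients, which is easy to verify numerically (a quick numpy check; the odds ratio anticipates 5.3 below):

```python
import numpy as np

base_odds = np.exp(1.6703)         # ~5.31: odds at bachelors, 0% unemployment
odds_multiplier = np.exp(-0.0169)  # ~0.98: per 1-point rise in unemployment rate
odds_ratio = np.exp(-1.4544)       # ~0.23: some_college vs. bachelors (see 5.3)
print(base_odds, odds_multiplier, odds_ratio)
```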
5.3. Calculate the odds ratio for counties with a median education of some college vs. counties with a median education of bachelors in this model. Make sure to use non-misleading language.

odds ratio = e^(β₃) = e^(−1.4544) = 0.23

All else held equal, we would expect the ratio of the odds of a county with a median education level of some college having a metropolitan area vs. the odds of a county with a median education level of bachelor's to be 0.23.

6. Prediction

6.1. Predict the probability that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area.

p̂ = 1 / (1 + e^(−(1.6703 − 19.1483(0) − 2.655(1) − 1.4544(0) − 0.0169(6)))) = 0.25

6.2. Predict the odds (numerical) that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area. Put these numerical odds into prose format.

odds = p̂ / (1 − p̂) = 0.25 / (1 − 0.25) = 0.33

odds = 0.33 / 1 = (chances of having a metro area) / (chances of not having a metro area)

The odds of this county having a metropolitan area are 0.33 to 1, i.e., 1 to 3.
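The chain in 6.1-6.2 can be reproduced directly from the coefficients (a numpy sketch; the 0/1 values encode the hs_diploma dummy pattern):

```python
import numpy as np

# Linear predictor (log-odds) for median_edu = hs_diploma, unemployment_rate = 6
log_odds = 1.6703 - 19.1483 * 0 - 2.655 * 1 - 1.4544 * 0 - 0.0169 * 6

p_hat = 1 / (1 + np.exp(-log_odds))  # ~0.25
odds = p_hat / (1 - p_hat)           # ~0.33, i.e. 1-to-3 odds
print(log_odds, p_hat, odds)         # log_odds ~ -1.09 (see 6.3)
```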
6.3. Predict the log-odds that a county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" has a metropolitan area.

ln(odds) = ln(p̂ / (1 − p̂)) = ln(0.25 / (1 − 0.25)) = ln(0.33) = −1.1087

7. Classification

Classify the county with an unemployment rate of 6 (percent) that has a median education level of "hs_diploma" as either having or not having a metropolitan area. Use a predictive probability threshold of p̂₀ = 0.5 to classify this observation.

Because p̂ = 0.25 < 0.5, we classify it as a 0 (i.e., a county that does not have a metropolitan area).

Part 3 – Conceptual Questions

The questions in this section are conceptual and do not necessarily relate to the dataset discussed in Parts 1 and 2.

8. R^2 and Adjusted R^2 Conceptual Questions

Select whether each statement below is True or False.

8.1. The adjusted R^2 of a model represents the percent of response variable variability that is explained by the model.
False. This is the definition of R^2. The adjusted R^2 is simply a metric that measures the parsimoniousness of a linear regression model.

8.2. The adjusted R^2 is a measure of how parsimonious a linear regression model is.
True.

8.3. By adding another explanatory variable to a linear regression model, the adjusted R^2 will never decrease.
False. It will decrease if the explanatory variable does not bring "enough" predictive power to the model (according to the adjusted R^2) and thus may be contributing to overfitting of the model.
8.4. By adding another explanatory variable to a linear regression model, the R^2 will never decrease.
True.

8.5. The R^2 measures the fit of a linear regression model.
True.

8.6. The R^2 of a model and a given dataset is always between [0, 1].
False. This is true when the dataset is the one that trained the linear regression model. However, when the dataset that we are calculating the R^2 for is not the training dataset, this value can be negative, indicating a very poor model fit on this dataset.

8.7. If the adjusted R^2 of a model increases when adding an explanatory variable, this suggests that this explanatory variable may be overfitting the model.
False. If the adjusted R^2 increases when we add an explanatory variable to the model, then this suggests (according to the adjusted R^2) that this explanatory variable brings enough predictive power to the model to overcome the (n − 1)/(n − p − 1) penalty incurred by adding an additional slope. Thus, if it is deemed to bring "enough" predictive power to the model, then its inclusion is not suggested to lead to overfitting.

8.8. If the adjusted R^2 of a model decreases when deleting an explanatory variable, then the adjusted R^2 suggests that this explanatory variable may not bring enough predictive power to the model.
False. By including this explanatory variable, we increase the adjusted R^2, so the same reasoning as in 8.7 applies: the variable is deemed to bring enough predictive power to overcome the (n − 1)/(n − p − 1) penalty, and its inclusion is not suggested to lead to overfitting.

8.9. If the adjusted R^2 of a model increases when adding an explanatory variable, then the test R^2 of the model will also increase.
False. This is not guaranteed. When the adjusted R^2 increases when adding an explanatory variable to the model, the adjusted R^2 is suggesting that the inclusion of this variable will not lead to overfitting. When the test R^2 increases when adding an explanatory variable to the model, the test R^2 is suggesting the same thing. However, these two techniques for gauging whether a given explanatory variable will lead to overfitting do not always agree. When they do agree, this can lead to additional confidence that the inclusion of that explanatory variable will generally lead to better model fits on new datasets.
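For concreteness, the penalty in 8.7-8.8 comes from the usual formula adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − p − 1), which is easy to compute by hand (a small sketch with made-up numbers):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p slopes fit on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical: R^2 creeps up from 0.7000 to 0.7005 when a 4th slope is added,
# but the (n - 1)/(n - p - 1) penalty makes the adjusted R^2 go DOWN.
print(adjusted_r2(0.7000, n=100, p=3))  # ~0.6906
print(adjusted_r2(0.7005, n=100, p=4))  # ~0.6879 (the small gain didn't beat the penalty)
```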
9. Training and Test Dataset Conceptual Question

Suppose we were to train a series of linear regression models with a randomly selected training dataset. We then calculate the test R^2 of each of these models with the test dataset. We then select the model that has the highest test R^2 in the hopes that this model will yield the best performance on new datasets. Which of the following parts of this analysis are subject to change if we had randomly selected a different training dataset and test dataset?

1. The intercept and slopes of each of our trained models.
2. The test R^2 values of each of our models.
3. Which model had the highest test R^2 and was selected for use.

ALL OF THESE THINGS

10. Cross-Validation Conceptual Questions

Answer the following questions with one of the following machine learning techniques:
- Train-test-split method
- Leave-One-Out Cross-Validation (LOOCV)
- K-fold Cross-Validation

10.1. Which of the methods above takes the least amount of computation time to perform? Which of the methods takes the most amount of computation time?
- Least: train-test-split
- Most: LOOCV

10.2. Which of the methods above does not have an inherent random element to it?
- LOOCV
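In scikit-learn, the three evaluation schemes look like this (a sketch, assuming numeric arrays X and y; LOOCV is scored with MSE here because R^2 is undefined on a single-observation test set):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

model = LinearRegression()

# Train-test split: one random split, one test R^2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(model.fit(X_train, y_train).score(X_test, y_test))

# K-fold CV: k folds, average the k test R^2 values
print(cross_val_score(model, X, y, cv=5).mean())

# LOOCV: n fits, no inherent randomness (each observation is its own test set)
print(-cross_val_score(model, X, y, cv=LeaveOneOut(),
                       scoring="neg_mean_squared_error").mean())
```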
11. More Cross-Validation Conceptual Questions

Suppose that we are considering two linear regression models for a given dataset.

Model A
- This model achieved a test R^2 = 0.75 using the train-test-split method.
- This model achieved an average test R^2 = 0.65 using k=5 fold cross-validation.

Model B
- This model achieved a test R^2 = 0.65 using the train-test-split method.
- This model achieved an average test R^2 = 0.75 using k=5 fold cross-validation.

If you were to select one of these models to predict the response variable values for new datasets, which model would you choose and why?

- Model B.
- In k=5 cross-validation, we randomly create 5 sets of training and test datasets. We fit 5 models with our 5 training datasets and test the 5 models with each of our 5 test datasets. We then calculate the average test R^2 performance of the model.
- Thus model B did better on average (considering multiple test R^2 values) than model A did.
- By evaluating a model based on the average of multiple test R^2 values, we are able to decrease the amount of variability that we might see in our test R^2 results (as opposed to the train-test-split method, which evaluates just a single test R^2). Using the average test R^2 may lead to a higher degree of confidence with respect to how a given model might perform on new datasets, which most likely will not look EXACTLY like the test dataset.

12. Feature Selection Conceptual Questions

12.1. True or False: A backwards elimination algorithm is guaranteed to find the combination of explanatory variables that yields the linear regression model with the highest adjusted R^2.
False.

12.2. True or False: If a numerical explanatory variable has a strong linear relationship with the response variable, then it will not be dropped in a backwards elimination algorithm.
False. For instance, it's possible that this numerical explanatory variable x1 has a strong linear relationship with another explanatory variable x2 (i.e., our model has an issue with multicollinearity). In a situation like this, x1 may get dropped by a backwards elimination algorithm because x2 may already be making the same type of contribution to the predictive power that x1 would have made. Thus, by including x1 in addition to x2, x1 may not bring enough additional predictive power to the model and may get dropped.
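A backwards elimination pass of the kind discussed in 12.1-12.2 is a greedy loop: repeatedly drop whichever variable most improves the adjusted R^2, and stop when no drop helps. A minimal sketch (assuming a pandas DataFrame df whose response column name is passed in):

```python
import statsmodels.api as sm

def backward_eliminate(df, response):
    """Greedy backwards elimination on adjusted R^2 (not guaranteed optimal; see 12.1)."""
    features = [c for c in df.columns if c != response]
    y = df[response]

    def adj_r2(cols):
        return sm.OLS(y, sm.add_constant(df[cols])).fit().rsquared_adj

    best = adj_r2(features)
    improved = True
    while improved and len(features) > 1:
        improved = False
        for col in list(features):
            candidate = [c for c in features if c != col]
            score = adj_r2(candidate)
            if score > best:       # dropping col raised the adjusted R^2
                best, features = score, candidate
                improved = True
                break              # greedy: restart from the new best set
    return features, best
```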
12.3. Which of our regularization techniques is most useful when it comes to determining which explanatory variables may be overfitting a model and should thus be left out?

LASSO. LASSO will "zero out" slopes that are set to be small by the regularization model, whereas ridge regression will not. Slopes that have been "zeroed out" can be interpreted as those that are not bringing "enough" predictive power to the model for the given value of λ.

Remember that in an elastic net model, the higher we set our α (or l1_ratio) value, the more it looks like a LASSO model. So if you set α (or l1_ratio) to be high, your elastic net model will also "zero out" many slopes in the model, particularly those that are not bringing "enough" predictive power to the model for the given value of λ.
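scikit-learn packages this LASSO-based screening as a feature selector (a sketch; X_train/y_train are the assumed scaled training matrices from Part 1):

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Keep only features whose LASSO slope is (effectively) non-zero
selector = SelectFromModel(Lasso(alpha=0.1), threshold=1e-10).fit(X_train, y_train)
print(selector.get_support())  # boolean mask: False = slope zeroed out by LASSO
```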