HW1_Starter Template_R (Summer 24, 5.20 update)

School: Georgia Institute of Technology | Course: 6414 | Subject: Statistics | Date: Jun 2, 2024 | Type: Rmd


---
title: "HW1 Peer Assessment"
output:
  html_document:
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Part A. Variables

In the field of psychology, much research is done with self-report surveys that use Likert scales (look it up!).

### A1

__What type of variable is a Likert response?__ (1 pt)

### A2

__What are some (at least 2) benefits of using Likert scales?__ (2 pts)

### A3

__What are some drawbacks of using them? Make sure you mention at least one 'drawback' and one 'danger' (a 'drawback' is a shortcoming, while a 'danger' implies potential harm).__ (2 pts)

# Part B. Simple Linear Regression

Perform linear regressions on a dataset from a European Toyota car dealer containing the sales records of used cars (Toyota Corolla). We would like to construct a reasonable linear regression model for the relationship between the sales prices of used cars and various explanatory variables (such as age, mileage, and horsepower). We are interested in seeing which factors affect the sales price of a used car, and by how much.

Data Description

*Id* - ID number of each used car

*Model* - Model name of each used car

*Price* - The price (in Euros) at which each used car was sold

*Age* - Age (in months) of each used car as of August 2004

*KM* - Accumulated kilometers on odometer

*HP* - Horsepower

*Metallic* - Metallic color? (Yes = 1, No = 0)

*Automatic* - Automatic transmission? (Yes = 1, No = 0)

*CC* - Cylinder volume (in cubic centimeters)

*Doors* - Number of doors

*Gears* - Number of gears

*Weight* - Weight (in kilograms)

The data is in the file "UsedCars.csv". To read the data in `R`, save the file in your working directory (if the file is stored elsewhere, change the R working directory to that folder first) and read the data using the `R` function `read.csv()`.

Read the data and show a few rows.

```{r}
# Read in the data
data = read.csv("UsedCars.csv", sep = ",", header = TRUE)
# Show the first few rows of data
head(data, 3)
```

## Question B1: Exploratory Data Analysis

a. **3 pts** Use a scatter plot to describe the relationship between *Price* and the accumulated kilometers on the odometer (*KM*). Describe the general trend (direction and form). Include the plots and the R code used.

```{r}
# Your code here...
```

b. **3 pts** What is the value of the correlation coefficient between *Price* and *KM*? Interpret the strength of the correlation based on the correlation coefficient.

```{r}
# Your code here...
```
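For illustration only, the sketch below shows the base-R calls typically used for a scatter plot (`plot()`) and a Pearson correlation (`cor()`). It runs on simulated data rather than UsedCars.csv. The column names mirror the data description, but the intercept, slope, noise level and seed are hypothetical, so the printed numbers say nothing about the Toyota data.

```{r}
# Illustrative sketch on SIMULATED data (not UsedCars.csv).
# The intercept (15000), slope (-0.05) and noise sd (1500) are hypothetical.
set.seed(6414)
n     <- 200
KM    <- runif(n, min = 0, max = 200000)           # hypothetical odometer readings
Price <- 15000 - 0.05 * KM + rnorm(n, sd = 1500)   # hypothetical linear trend plus noise
sim   <- data.frame(KM = KM, Price = Price)

# Scatter plot: predictor on the x-axis, response on the y-axis
plot(sim$KM, sim$Price,
     xlab = "Accumulated kilometers (KM)", ylab = "Price (Euros)",
     main = "Price vs. KM (simulated data)")

# Pearson correlation coefficient between Price and KM
r <- cor(sim$Price, sim$KM)
print(round(r, 3))
```

`cor()` measures only linear association. A value near -1 should therefore still be checked against the scatter plot for curvature or outliers before it is read as a strong linear relationship.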

c. **2 pts** Based on this exploratory analysis, would you recommend a simple linear regression model for the relationship?

d. **1 pt** Based on the analysis above, would you pursue a transformation of the data? *Do not transform the data.*

## Question B2: Fitting the Simple Linear Regression Model

Fit a linear regression model, named *model_1*, to evaluate the relationship between the used car's *Price* and the accumulated *KM*. *Do not transform the data.* The function you should use in R is:

```{r}
# Your code here...
```

a. **3 pts** What are the model parameters and what are their estimates?

b. **2 pts** Write down the estimated simple linear regression equation.

c. **2 pts** Interpret the estimated value of the $\beta_1$ parameter in the context of the problem.

d. **2 pts** Find a 95% confidence interval for the $\beta_1$ parameter. Is $\beta_1$ statistically significant at this level?

```{r}
# Your code here...
```

e. **2 pts** Is $\beta_1$ statistically significantly negative at an $\alpha$-level of 0.01? What is the approximate p-value of this test?

```{r}
# Your code here...
```
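The following is a minimal sketch of the usual base-R workflow for a simple linear regression. It again uses hypothetical simulated data, not UsedCars.csv.

- `lm()` fits the model.
- `coef()` and `summary()` report the estimates.
- `confint()` gives the 95% interval for the slope.
- `pt()` converts the slope's t statistic into a one-sided p-value.

The object name `fit` is illustrative only; the assignment asks for the model to be named *model_1*.

```{r}
# Illustrative sketch on SIMULATED data (not UsedCars.csv);
# the true intercept (15000) and slope (-0.05) are hypothetical.
set.seed(6414)
n     <- 200
KM    <- runif(n, min = 0, max = 200000)
Price <- 15000 - 0.05 * KM + rnorm(n, sd = 1500)
sim   <- data.frame(KM = KM, Price = Price)

# Simple linear regression: Price = beta0 + beta1 * KM + error
fit <- lm(Price ~ KM, data = sim)
print(coef(fit))                  # estimates of beta0 and beta1
sigma_hat <- summary(fit)$sigma   # estimate of the error standard deviation

# 95% confidence interval for beta1 (the KM slope)
ci <- confint(fit, "KM", level = 0.95)
print(ci)

# One-sided test of H0: beta1 >= 0 vs. Ha: beta1 < 0.
# Lower tail of the t distribution with n - 2 residual degrees of freedom.
t_stat      <- summary(fit)$coefficients["KM", "t value"]
p_one_sided <- pt(t_stat, df = fit$df.residual)
print(p_one_sided)

# Estimated change in Price for 10,000 additional kilometers
print(10000 * coef(fit)["KM"])
```

`summary()` reports a two-sided p-value. For the alternative $\beta_1 < 0$ with a negative t statistic, the one-sided p-value is half of the two-sided one. Because the model is linear in *KM*, the estimated effect of 10,000 additional kilometers is 10,000 times the slope estimate.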


## Question B3: Checking the Assumptions of the Model

Create and interpret the following graphs with respect to the assumptions of the linear regression model. In other words, comment on whether there are any apparent departures from the assumptions of the linear regression model. Make sure that you state the model assumptions and assess each one. Each graph may be used to assess one or more model assumptions.

a. **3 pts** Scatterplot of the data with *KM* on the x-axis and *Price* on the y-axis. Make sure you include a line showing the overall trend of the scatterplot.

```{r}
# Your code here...
```

b. **4 pts** Residual plot: a plot of the residuals, $\hat\epsilon_i$, versus the fitted values, $\hat{y}_i$. Make sure you include a line showing the ideal baseline (hint: residual = 0) that serves as the comparison.

```{r}
# Your code here...
```

c. **4 pts** Histogram and Q-Q plot of the residuals. Make sure you include a line in the Q-Q plot showing the ideal baseline that serves as the comparison.

```{r}
# Your code here...
```

## Question B4: Prediction

Use the results from *model_1* to discuss the effect of *KM* on the dependent variable: holding everything else equal, by how much would the sales price decrease if a car accumulated 10,000 more kilometers? What observations can you make about the result in the context of the problem? (3 pts)

```{r}
# Your code here...
```

# Part C. Experiment!

You work for the National Park Service (NPS), and you absolutely love bears. Describe an imaginary (it can be realistic) scenario in which you get to run a one-way ANOVA on a few (3+) species of bears.

### Part C1

__What are you comparing (name the variable!)? What do you hope to learn from ANOVA?__ (2 pts)

### Part C2

__Imagine that the results are "mixed", meaning you can draw some conclusions and not others. Describe your conclusions and make sure you detail, with reference to your ANOVA, why the results were "mixed."__ (3 pts)

### Part C3

__Now imagine that you have just been granted 3 months and $50,000 to continue this study (you're a great grant writer and a very likable member of the NPS!). Describe some next steps you would take to clarify, reinforce and/or further explore your nascent investigation. You MUST reference using a 'controlling' variable somehow in your response.__ (5 pts)

# Part D. Explain the meaning of a p-value!

__Explain in detail what it means for a result to be "statistically significant" at a particular $\alpha$-level. In other words, explain the meaning and use of p-values. You should research this question, and you should expect your answer to be at least a paragraph long.__ (6 pts)
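The meaning of an $\alpha$-level can be checked by simulation. The sketch below generates many datasets in which the null hypothesis $\beta_1 = 0$ is true by construction. It fits `lm()` to each dataset and records how often the slope's p-value falls below $\alpha = 0.05$. The sample size, number of replications and seed are arbitrary, hypothetical choices.

```{r}
# Simulation sketch: when H0 (beta1 = 0) is TRUE, a test at alpha = 0.05
# should reject in roughly 5% of repeated samples.
set.seed(6414)
n_reps <- 2000   # number of simulated datasets (arbitrary)
n      <- 50     # sample size per dataset (arbitrary)
alpha  <- 0.05

p_values <- replicate(n_reps, {
  x   <- rnorm(n)
  y   <- rnorm(n)        # y is generated independently of x, so the true slope is 0
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
})

# Proportion of false rejections (Type I errors); should be close to alpha
type1_rate <- mean(p_values < alpha)
print(type1_rate)

# Under H0 the p-values are approximately uniform on (0, 1)
hist(p_values, breaks = 20, main = "p-values when H0 is true", xlab = "p-value")
```

The rejection rate should land near 0.05. When the null hypothesis is true, a test at level $\alpha$ rejects it (a Type I error) in about a fraction $\alpha$ of repeated samples.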