HW1_Starter Template_R (Summer 24, 5.20 update)
School: Georgia Institute of Technology
Course: 6414
Subject: Statistics
Date: Jun 2, 2024
---
title: "HW1 Peer Assessment"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Part A. Variables
In the field of psychology, much research relies on self-report surveys that use Likert scales (look it up!).
### A1
__What type of variable is a Likert response?__ (1 pt)
### A2
__What are some (at least 2) benefits of using Likert scales?__ (2 pts)
### A3
__What are some drawbacks of using them? Make sure you mention at least one 'drawback' and one 'danger' (a 'drawback' is a shortcoming, while a 'danger' implies potential harm).__ (2 pts)
# Part B. Simple Linear Regression
Perform linear regression on a dataset of used-car (Toyota Corolla) sales records from a European Toyota dealer. We would like to construct a reasonable linear regression model for the relationship between the sales prices of used cars and various explanatory variables (such as age, mileage, horsepower). We are interested in which factors affect the sales price of a used car, and by how much.
Data Description
- *Id*: ID number of each used car
- *Model*: Model name of each used car
- *Price*: The price (in Euros) at which each used car was sold
- *Age*: Age (in months) of each used car as of August 2004
- *KM*: Accumulated kilometers on odometer
- *HP*: Horsepower
- *Metallic*: Metallic color? (Yes = 1, No = 0)
- *Automatic*: Automatic transmission? (Yes = 1, No = 0)
- *CC*: Cylinder volume (in cubic centimeters)
- *Doors*: Number of doors
- *Gears*: Number of gears
- *Weight*: Weight (in kilograms)
The data is in the file "UsedCars.csv". To read the data in `R`, save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the `R` function `read.csv()`.
Read the data and show the first few rows.
```{r}
# Read in the data
data <- read.csv("UsedCars.csv", sep = ",", header = TRUE)
# Show the first few rows of data
head(data, 3)
```
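As a quick sanity check after loading, it can help to confirm column names and types before any modeling. The sketch below is illustrative only: it writes a tiny stand-in file with made-up rows and reads it back the same way as `UsedCars.csv`. The values and the object names `toy`, `toy_path`, and `toy_cars` are assumptions, not part of the assignment data.

```r
# Made-up stand-in rows; the real values come from UsedCars.csv
toy <- data.frame(
  Id    = 1:3,
  Price = c(13500, 13750, 13950),
  Age   = c(23, 23, 24),
  KM    = c(46986, 72937, 41711)
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# Read it back the same way the assignment file is read
toy_cars <- read.csv(toy_path, sep = ",", header = TRUE)

str(toy_cars)             # column names and types
summary(toy_cars$Price)   # five-number summary plus mean
colSums(is.na(toy_cars))  # count of missing values per column
```

The same three inspection calls can be pointed at the real data frame once it has been loaded.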
## Question B1: Exploratory Data Analysis
a. **3 pts** Use a scatter plot to describe the relationship between Price and the Accumulated kilometers on odometer. Describe the general trend (direction and form). Include plots and R-code used.
```{r}
# Your code here...
```
b. **3 pts** What is the value of the correlation coefficient between *Price* and *KM*? Please interpret the strength of the correlation based on the correlation coefficient.
```{r}
# Your code here...
```
c. **2 pts** Based on this exploratory analysis, would you recommend a simple linear regression model for the relationship?
d. **1 pt** Based on the analysis above, would you pursue a transformation of the data? *Do not transform the data.*
## Question B2: Fitting the Simple Linear Regression Model
Fit a linear regression model, named *model_1*, to evaluate the relationship between used-car *Price* and accumulated *KM*. *Do not transform the data.* The function you should use in R is:
```{r}
# Your code here...
```
a. **3 pts** What are the model parameters and what are their estimates?
b. **2 pts** Write down the estimated simple linear regression equation.
c. **2 pts** Interpret the estimated value of the $\beta_1$ parameter in the context of the problem.
d. **2 pts** Find a 95% confidence interval for the $\beta_1$ parameter. Is $\beta_1$ statistically significant at this level?
```{r}
# Your code here...
```
e. **2 pts** Is $\beta_1$ statistically significantly negative at an $\alpha$-level of 0.01? What is the approximate p-value of this test?
```{r}
# Your code here...
```
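The general pattern behind questions like these can be sketched on simulated data, without touching the assignment dataset. Everything below is an assumption made for illustration: the variables `x` and `y`, the true slope of -0.1, and the object name `toy_fit`. It shows `lm()`, `confint()`, and a one-sided p-value computed from the reported t statistic.

```r
set.seed(1)

# Simulated predictor and response with a known negative slope (illustrative only)
x <- runif(50, min = 0, max = 100)
y <- 20 - 0.1 * x + rnorm(50, sd = 2)

toy_fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
est <- coef(summary(toy_fit))
print(est)

# Two-sided 95% confidence interval for the slope
ci <- confint(toy_fit, "x", level = 0.95)
print(ci)

# One-sided test of H0: beta_1 >= 0 versus Ha: beta_1 < 0,
# using the lower tail of the t distribution with n - 2 degrees of freedom
t_stat  <- est["x", "t value"]
p_lower <- pt(t_stat, df = df.residual(toy_fit))
print(p_lower)
```

When the estimated slope is negative, this lower-tail p-value is half of the two-sided `Pr(>|t|)` value reported by `summary()`.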
## Question B3: Checking the Assumptions of the Model
Create and interpret the following graphs with respect to the assumptions of the linear regression model. In other words, comment on whether there are any apparent departures from the assumptions of the linear regression model. Make sure that you state the model assumptions and assess each one. Each graph may be used to assess one or more model assumptions.
a. **3 pts** Scatterplot of the data with *KM* on the x-axis and *Price* on the y-axis. Make sure you include a line showing the overall trend of the scatterplot.
```{r}
# Your code here...
```
b. **4 pts** Residual plot - a plot of the residuals, $\hat{\epsilon}_i$, versus the fitted values, $\hat{y}_i$. Make sure you include a line showing the ideal baseline (hint: residual = 0) that serves as the comparison.
```{r}
# Your code here...
```
c. **4 pts** Histogram and q-q plot of the residuals. Make sure you include a line in the q-q plot showing the ideal baseline that serves as the comparison.
```{r}
# Your code here...
```
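The mechanics of these three diagnostic graphs can be sketched with base R graphics on simulated data. The data, the true coefficients, and the object names `sim_fit` and `res` are assumptions for illustration only. Interpreting the plots against the model assumptions is the part the question asks for.

```r
set.seed(42)

# Simulated data that satisfies the linear model assumptions (illustrative only)
x <- runif(100, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(100)

sim_fit <- lm(y ~ x)
res <- resid(sim_fit)

# Scatterplot with the fitted trend line
plot(x, y, main = "Scatterplot with fitted line")
abline(sim_fit, col = "red")

# Residuals versus fitted values, with the residual = 0 baseline
plot(fitted(sim_fit), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, col = "red", lty = 2)

# Histogram and normal q-q plot of the residuals, with the reference line
hist(res, main = "Histogram of residuals", xlab = "Residuals")
qqnorm(res)
qqline(res, col = "red")
```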
## Question B4: Prediction
Use the results from model_1 to discuss the effect of KM on the dependent variable: holding everything else equal, by how much would the sales price decrease if a car accumulated 10,000 more kilometers? What observations can you make about the result in the context of the problem? (3 pts)
```{r}
# Your code here...
```
# Part C. Experiment!
You work for the National Park Service (NPS), and you absolutely love bears. Describe an imaginary (it can be realistic) scenario in which you get to run a one-way ANOVA on a few (3+) species of bears.
### Part C1
__What are you comparing (name the variable!)? What do you hope to learn from ANOVA?__ (2 pts)
### Part C2
__Imagine that the results are "mixed", meaning you can draw some conclusions and not others. Describe your conclusions and make sure you detail, with reference to your ANOVA, why the results were "mixed."__ (3 pts)
### Part C3
__Now imagine that you have just been granted 3 months and $50,000 to continue this study (you're a great grant writer and a very likable member of the NPS!). Describe some next steps you would take to clarify, reinforce and/or further explore your nascent investigation. You MUST reference using a 'controlling' variable somehow in your response.__ (5 pts)
# Part D. Explain the meaning of a p-value!
__Explain in detail what it means specifically for any result to be "statistically significant" at a particular $\alpha$-level. In other words, explain the meaning and use of p-values. You should research this question, and you should expect your answer to be at least a paragraph long.__ (6 pts)