---
title: "HW1 Peer Assessment"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Part A. Variables
In the field of psychology, much research is done using self-report surveys with Likert scales (look it up!).
### A1
__What type of variable is a Likert response?__ (1 pt)
### A2
__What are some (at least 2) benefits of using Likert scales?__ (2 pts)
### A3
__What are some drawbacks of using them? Make sure you mention at
least one 'drawback' and one 'danger' (a 'drawback' is a shortcoming, while a 'danger' implies potential harm).__ (2 pts)
# Part B. Simple Linear Regression
Perform linear regressions on a dataset from a European Toyota car dealer on the sales records of used cars (Toyota Corolla). We
would like to construct a reasonable linear regression model for the relationship between the sales prices of used cars and various explanatory variables (such as age, mileage, horsepower).
We are interested to see what factors affect the sales price of a used car and by how much.
Data Description

- *Id* - ID number of each used car
- *Model* - Model name of each used car
- *Price* - The price (in Euros) at which each used car was sold
- *Age* - Age (in months) of each used car as of August 2004
- *KM* - Accumulated kilometers on odometer
- *HP* - Horsepower
- *Metallic* - Metallic color? (Yes = 1, No = 0)
- *Automatic* - Automatic transmission? (Yes = 1, No = 0)
- *CC* - Cylinder volume (in cubic centimeters)
- *Doors* - Number of doors
- *Gears* - Number of gears
- *Weight* - Weight (in kilograms)

The data is in the file "UsedCars.csv". To read the data in `R`, save the file in your working directory (change the directory first if the file is stored somewhere other than the current R working directory) and read the data using the `R` function `read.csv()`.

Read the data and show the first few rows.
```{r}
# Read in the data
data = read.csv("UsedCars.csv",sep = ",",header = TRUE)
# Show the first few rows of data
head(data, 3)
```
## Question B1: Exploratory Data Analysis
a. **3 pts** Use a scatter plot to describe the relationship between Price and the Accumulated kilometers on odometer. Describe the general trend (direction and form). Include plots and R-code used.
```{r}
# Your code here...
```
b. **3 pts** What is the value of the correlation coefficient between *Price* and *KM*? Please interpret the strength of the correlation based on the correlation coefficient.
```{r}
# Your code here...
```
c. **2 pts** Based on this exploratory analysis, would you recommend a simple linear regression model for the relationship?
d. **1 pt** Based on the analysis above, would you pursue a transformation of the data? *Do not transform the data.*
## Question B2: Fitting the Simple Linear Regression Model
Fit a linear regression model, named *model_1*, to evaluate the relationship between used car *Price* and accumulated *KM*. *Do not transform the data.* The function you should use in R is:
```{r}
# Your code here...
```
a. **3 pts** What are the model parameters and what are their estimates?
b. **2 pts** Write down the estimated simple linear regression equation.
c. **2 pts** Interpret the estimated value of the $\beta_1$ parameter in the context of the problem.
d. **2 pts** Find a 95% confidence interval for the $\beta_1$ parameter. Is $\beta_1$ statistically significant at this level?
```{r}
# Your code here...
```
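As a minimal sketch of how such an interval can be computed, the chunk below applies `confint()` to a model fitted on simulated data, not on "UsedCars.csv". The names `x`, `y`, and `fit_demo` and the simulated coefficients are assumptions made only for this illustration.
```{r}
# Simulated data for illustration only (not the UsedCars data)
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 5 - 2 * x + rnorm(100)
fit_demo <- lm(y ~ x)
# 95% confidence interval for the slope
ci <- confint(fit_demo, "x", level = 0.95)
ci
# The slope differs significantly from 0 at the 5% level
# exactly when the 95% interval excludes 0
excludes_zero <- ci[1, 1] > 0 || ci[1, 2] < 0
excludes_zero
```
The same pattern should carry over to *model_1* by replacing `fit_demo` and `"x"` with your model and its predictor name.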
e. **2 pts** Is $\beta_1$ statistically significantly negative at
an $\alpha$-level of 0.01? What is the approximate p-value of this test?
```{r}
# Your code here...
```
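A one-sided test of $H_0: \beta_1 \ge 0$ against $H_a: \beta_1 < 0$ uses the lower tail of the t-distribution with $n - 2$ degrees of freedom, whereas `summary()` reports a two-sided p-value. The sketch below shows the calculation on simulated data; `x`, `y`, `fit_demo`, and the simulated slope are assumptions for illustration, not the homework data.
```{r}
# Simulated data for illustration only (not the UsedCars data)
set.seed(2)
x <- runif(100, min = 0, max = 10)
y <- 5 - 0.5 * x + rnorm(100)
fit_demo <- lm(y ~ x)
coefs <- summary(fit_demo)$coefficients
t_stat <- coefs["x", "t value"]
# Lower-tail p-value for H0: beta1 >= 0 versus Ha: beta1 < 0
p_one_sided <- pt(t_stat, df = fit_demo$df.residual)
p_one_sided
# Compare against the chosen alpha-level
p_one_sided < 0.01
```
When the t-statistic is negative, this lower-tail p-value equals half of the two-sided p-value printed by `summary()`.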
## Question B3: Checking the Assumptions of the Model
Create and interpret the following graphs with respect to the assumptions of the linear regression model. In other words, comment on whether there are any apparent departures from the assumptions of the linear regression model. Make sure that you state the model assumptions and assess each one. Each graph may be used to assess one or more model assumptions.
a. **3 pts** Scatterplot of the data with *KM* on the x-axis and *Price* on the y-axis. Make sure you include a line showing the overall trend of the scatterplot.
```{r}
# Your code here...
```
b. **4 pts** Residual plot - a plot of the residuals, $\hat\epsilon_i$, versus the fitted values, $\hat{y}_i$. Make sure you include a line showing the ideal baseline (hint: residual = 0) that serves as the comparison.
```{r}
# Your code here...
```
c. **4 pts** Histogram and q-q plot of the residuals. Make sure you include a line in the q-q plot showing the ideal baseline that serves as the comparison.
```{r}
# Your code here...
```
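The three diagnostic graphs described above can be produced with base R graphics. The following is a sketch on simulated data, assuming a fitted model named `fit_demo` with a single predictor `x`; these names and the simulated values are illustrative only.
```{r}
# Simulated data for illustration only (not the UsedCars data)
set.seed(3)
x <- runif(100, min = 0, max = 10)
y <- 5 - 2 * x + rnorm(100)
fit_demo <- lm(y ~ x)
res <- resid(fit_demo)
fits <- fitted(fit_demo)
# Scatterplot with the fitted trend line
plot(x, y, xlab = "x", ylab = "y")
abline(fit_demo, col = "red")
# Residuals versus fitted values, with the residual = 0 baseline
plot(fits, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "red")
# Histogram and normal q-q plot of the residuals
hist(res, main = "Histogram of residuals", xlab = "Residuals")
qqnorm(res)
qqline(res, col = "red")
```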
## Question B4: Prediction
Use the results from model_1 to discuss the effect of KM on the dependent variable: holding everything else equal, by how much would the sales price decrease if a car accumulated 10,000 more kilometers? What observations can you make about the result in the context of the problem? (3 pts)
```{r}
# Your code here...
```
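In a simple linear regression, the estimated change in the response for a given change in the predictor is the slope estimate multiplied by that change. The sketch below checks this on simulated data with arbitrary made-up numbers; `x`, `y`, and `fit_demo` are assumptions for illustration only.
```{r}
# Simulated data with arbitrary values, for illustration only
set.seed(4)
x <- runif(100, min = 0, max = 200000)
y <- 15000 - 0.05 * x + rnorm(100, sd = 1000)
fit_demo <- lm(y ~ x)
# Estimated change in y for a 10,000-unit increase in x
delta <- unname(10000 * coef(fit_demo)["x"])
delta
# The same quantity as a difference of two predictions 10,000 units apart
pred <- predict(fit_demo, newdata = data.frame(x = c(50000, 60000)))
unname(pred[2] - pred[1])
```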
# Part C. Experiment!
You work for the National Park Service (NPS), and you absolutely love bears. Describe an imaginary (it can be realistic) scenario in which you get to run a one-way ANOVA on a few (3+) species of bears.
### Part C1
__What are you comparing (name the variable!)? What do you hope to learn from ANOVA?__ (2 pts)
### Part C2
__Imagine that the results are "mixed", meaning you can draw some
conclusions and not others. Describe your conclusions and make sure you detail, with reference to your ANOVA, why the results were "mixed."__ (3 pts)
### Part C3
__Now imagine that you have just been granted 3 months and $50,000 to continue this study (you're a great grant writer and a
very likable member of the NPS!). Describe some next steps you would take to clarify, reinforce and/or further explore your nascent investigation. You MUST reference using a 'controlling' variable somehow in your response.__ (5 pts)
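For reference, a one-way ANOVA in R can be sketched with `aov()`, followed by `TukeyHSD()` for pairwise comparisons. The data below are simulated, and the group labels, the `response` variable, and the group means are entirely hypothetical; they are not real bear measurements.
```{r}
# Simulated, hypothetical data: three groups of 20 observations each
set.seed(5)
demo <- data.frame(
  species = factor(rep(c("species_A", "species_B", "species_C"), each = 20)),
  response = c(rnorm(20, mean = 100, sd = 20),
               rnorm(20, mean = 105, sd = 20),
               rnorm(20, mean = 150, sd = 20))
)
fit_aov <- aov(response ~ species, data = demo)
# Overall F-test: is at least one group mean different?
anova_tab <- summary(fit_aov)[[1]]
p_overall <- anova_tab[["Pr(>F)"]][1]
p_overall
# Pairwise comparisons: which groups differ from which?
tukey <- TukeyHSD(fit_aov)
tukey$species
```
An overall F-test can be significant while some pairwise comparisons are not, which is one way results can end up "mixed."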
# Part D. Explain the meaning of a p-value!
__Explain in detail what it means specifically for any result to be "statistically significant" at a particular $\alpha$-level. In other words, explain the meaning and use of p-values. You should research this question, and you should expect your answer to be at least a paragraph long.__ (6 pts)