H11
School: Temple University
Course: 2521
Subject: Statistics
Date: Jan 9, 2024
Uploaded by DeanStar27858
a. Use mydata = data.frame(advertisement = x,sales = y) in R. Create a dataframe
named “mydata” with two variables. Store the predictor vector x into a variable called
“advertisement”, and store the response vector y into a variable called “sales”.
>n=100
>x=5+rnorm(n)
>e=rnorm(n)
>y=1+2*x+e
>mydata = data.frame(advertisement = x, sales = y)
>head(mydata)
  advertisement     sales
1      3.649284  6.897327
2      2.951805  7.419058
3      4.702271 10.028348
4      4.587121  9.537453
5      3.790129  8.584801
6      3.790236  9.923865
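Because rnorm() draws new random numbers on every run, the values printed above change from session to session. A minimal sketch of a reproducible version follows; the seed value 2521 is an arbitrary assumption, not part of the original run, so it will not reproduce the numbers shown above.

```r
# Fix the random number generator so the simulation can be repeated exactly.
# The seed is an arbitrary assumption, not the one behind the printed output.
set.seed(2521)

n <- 100
x <- 5 + rnorm(n)      # predictor: normal with mean 5 and sd 1
e <- rnorm(n)          # noise: standard normal
y <- 1 + 2 * x + e     # true line has intercept 1 and slope 2

# Store predictor and response under descriptive column names.
mydata <- data.frame(advertisement = x, sales = y)
str(mydata)            # two numeric columns, n rows
```

Rerunning the block with the same seed gives the same data frame each time, which makes the later parts easier to check.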
b. Plot sales v.s. advertisement. What is the trend in this plot?
Hint: you can use plot(mydata) or plot(mydata$advertisement,mydata$sales).
>plot(mydata$advertisement, mydata$sales, main="Trend of sales v.s. advertisement")
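Since the data are generated as y = 1 + 2x + e, the scatterplot should show an increasing, roughly linear trend. The sketch below adds axis labels and a lowess smoother to make that trend easier to see; the seed, the labels, and the colour are illustrative assumptions.

```r
# Regenerate the simulated data (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)

# Scatterplot of the response against the predictor.
plot(mydata$advertisement, mydata$sales,
     xlab = "advertisement", ylab = "sales",
     main = "Trend of sales vs. advertisement")

# A lowess curve follows the local average of sales; a nearly
# straight, rising curve indicates a positive linear trend.
lines(lowess(mydata$advertisement, mydata$sales), col = "blue")

# The sign of the sample correlation summarises the direction.
trend <- cor(mydata$advertisement, mydata$sales)
print(trend)
```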
c. Use lm() function in R to fit a linear regression between sales as the response and
advertisement as the predictor. Store the output in “myfit”.
>myfit <- lm(sales ~ advertisement, data=mydata)
>myfit
Call:
lm(formula = sales ~ advertisement, data = mydata)

Coefficients:
  (Intercept)  advertisement
        1.522          1.911
d. Use summary(myfit) in R to get the summary statistics in the linear regression.
>summary(myfit)
Call:
lm(formula = sales ~ advertisement, data = mydata)

Residuals:
     Min       1Q   Median       3Q      Max
-2.46228 -0.69433 -0.08517  0.70894  2.33665

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     1.5220     0.5767   2.639  0.00967 **
advertisement   1.9113     0.1142  16.742  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.02 on 98 degrees of freedom
Multiple R-squared:  0.7409,  Adjusted R-squared:  0.7383
F-statistic: 280.3 on 1 and 98 DF,  p-value: < 2.2e-16
e. Use myfit$coefficients to get the regression coefficients. Write out the fitted regression line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, and explain the meanings of the estimated regression coefficients.
Furthermore, use abline() to add this fitted regression line to the scatterplot in part b.
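With the estimates printed in part c, the fitted line is $\hat{y} = 1.522 + 1.911x$. The slope says that each additional unit of advertisement is associated with an estimated increase of about 1.911 in mean sales, and the intercept is the estimated mean sales when advertisement is 0. The sketch below shows one way to extract the coefficients and draw the line; it regenerates the data with an arbitrary seed (an assumption), so its numbers will differ from those above.

```r
# Regenerate the data and refit the model (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)
myfit <- lm(sales ~ advertisement, data = mydata)

# myfit$coefficients is a named vector: "(Intercept)" and "advertisement".
b  <- myfit$coefficients
b0 <- unname(b["(Intercept)"])
b1 <- unname(b["advertisement"])
cat(sprintf("fitted line: sales = %.3f + %.3f * advertisement\n", b0, b1))

# Redraw the scatterplot from part b and overlay the fitted line.
plot(mydata$advertisement, mydata$sales,
     xlab = "advertisement", ylab = "sales")
abline(myfit, col = "red")   # abline() reads intercept and slope from the lm object
```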
f. Use cor(x,y) to get the sample correlation between x and y. Find the square of this correlation. What is the relation between this squared correlation and the coefficient of determination $R^2$?
Hint: get the $R^2$ value from the summary statistics in part d.
>r=cor(x,y)
>r^2
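[1] 0.7409345
The squared sample correlation, r^2 = 0.7409, equals the coefficient of determination, Multiple R-squared = 0.7409, reported in part d. In simple linear regression with a single predictor, these two quantities are always equal.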
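The equality between the squared correlation and R-squared can be checked numerically. A small sketch, assuming data simulated as in part a with an arbitrary seed (so the value differs from 0.7409):

```r
# Regenerate the data and refit the model (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)
myfit <- lm(sales ~ advertisement, data = mydata)

# Squared sample correlation between predictor and response.
r <- cor(mydata$advertisement, mydata$sales)
r_squared <- r^2

# Coefficient of determination stored in the model summary.
r2_from_fit <- summary(myfit)$r.squared

# The two agree up to floating-point error.
print(c(r_squared = r_squared, r2_from_fit = r2_from_fit))
print(all.equal(r_squared, r2_from_fit))
```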
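g. Does advertisement have an effect on sales? Set up a formal hypothesis test, find the test statistic, and report your conclusion based on the p-value.
Hint: get the test statistic and the p-value from the summary statistics in part d.
The hypotheses are $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. From part d, the test statistic is t = 16.742 on 98 degrees of freedom, with p-value < 2e-16. Since the p-value is far below 0.05, we reject $H_0$ and conclude that advertisement has a statistically significant effect on sales.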
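The test statistic and two-sided p-value can also be read from the fitted object instead of the printed summary. A sketch, assuming data simulated as in part a with an arbitrary seed; the row and column names are the ones summary() uses for an lm fit.

```r
# Regenerate the data and refit the model (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)
myfit <- lm(sales ~ advertisement, data = mydata)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs  <- summary(myfit)$coefficients
t_stat <- coefs["advertisement", "t value"]
p_two  <- coefs["advertisement", "Pr(>|t|)"]

# Recompute by hand: t = estimate / standard error, with n - 2 df.
t_manual <- coefs["advertisement", "Estimate"] /
            coefs["advertisement", "Std. Error"]
p_manual <- 2 * pt(abs(t_manual), df = df.residual(myfit), lower.tail = FALSE)

print(c(t_stat = t_stat, p_two = p_two))
```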
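h. Does advertisement have a positive effect on sales? Set up a formal hypothesis test, find the test statistic, and report your conclusion based on the p-value.
Hint: the alternative hypothesis should be $H_a: \beta_1 > 0$.
The hypotheses are $H_0: \beta_1 = 0$ versus $H_a: \beta_1 > 0$. The test statistic is the t value for advertisement in part d, t = 16.742 on 98 degrees of freedom. Because t is positive, the one-sided p-value is half of the two-sided p-value, so it is below 1e-16 (effectively 0). We reject $H_0$ and conclude that advertisement has a statistically significant positive effect on sales.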
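For the one-sided alternative, the p-value is the upper-tail probability of the t distribution at the observed statistic. A sketch using pt(), assuming data simulated as in part a with an arbitrary seed:

```r
# Regenerate the data and refit the model (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)
myfit <- lm(sales ~ advertisement, data = mydata)

coefs  <- summary(myfit)$coefficients
t_stat <- coefs["advertisement", "t value"]
p_two  <- coefs["advertisement", "Pr(>|t|)"]

# One-sided p-value: P(T > t_stat) for T ~ t with n - 2 degrees of freedom.
p_one <- pt(t_stat, df = df.residual(myfit), lower.tail = FALSE)

# When t_stat is positive, this equals half of the two-sided p-value.
print(c(p_one = p_one, half_p_two = p_two / 2))
```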
i. Use anova(myfit) in R. Fill in the blanks (marked by “∗”) in the ANOVA table for the regression of sales on advertisement.
Source      df   Sum of Squares   Mean Squares   F Statistic
Regression  ∗    SSR = ∗          MSR = ∗        F = ∗
Error       ∗    SSE = ∗          MSE = ∗
Total       ∗    SST = ∗
What are the degrees of freedom for the F statistic? What is the corresponding p-value? What are the null and alternative hypotheses for this F test?
>anova(myfit)
Analysis of Variance Table

Response: sales
              Df Sum Sq Mean Sq F value    Pr(>F)
advertisement  1 291.37  291.37  280.28 < 2.2e-16 ***
Residuals     98 101.88    1.04
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
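Reading the output above into the table gives: Regression df = 1, SSR = 291.37, MSR = 291.37, F = 280.28; Error df = 98, SSE = 101.88, MSE = 1.04; Total df = 99, SST = 291.37 + 101.88 = 393.25. The F statistic has 1 and 98 degrees of freedom, its p-value is below 2.2e-16, and it tests $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$. The sketch below shows how each entry could be computed from anova(myfit); it assumes data simulated as in part a with an arbitrary seed, so its numbers differ from those above.

```r
# Regenerate the data and refit the model (arbitrary seed, an assumption).
set.seed(2521)
n <- 100
x <- 5 + rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mydata <- data.frame(advertisement = x, sales = y)
myfit <- lm(sales ~ advertisement, data = mydata)

tab <- anova(myfit)   # rows: advertisement, Residuals

# Degrees of freedom for regression, error, and total.
df_reg <- tab["advertisement", "Df"]
df_err <- tab["Residuals", "Df"]
df_tot <- df_reg + df_err

# Sums of squares: SST = SSR + SSE.
SSR <- tab["advertisement", "Sum Sq"]
SSE <- tab["Residuals", "Sum Sq"]
SST <- SSR + SSE

# Mean squares and the F statistic with its upper-tail p-value.
MSR <- SSR / df_reg
MSE <- SSE / df_err
F_stat <- MSR / MSE
p_F <- pf(F_stat, df_reg, df_err, lower.tail = FALSE)

# In simple linear regression, F equals the square of the slope's t statistic.
t_stat <- summary(myfit)$coefficients["advertisement", "t value"]
print(c(F_stat = F_stat, t_squared = t_stat^2, p_F = p_F))
```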