Lab 7 Pre-Lab and Exercises
pdf
keyboard_arrow_up
School
University of Calgary *
*We aren’t endorsed by this school
Course
213
Subject
Statistics
Date
Apr 3, 2024
Type
Pages
4
Uploaded by trist8182
11/20/2019
Statistics 213 Lab Exercises – Simple Linear Regression
https://scott-robison.rstudio.cloud/a8c401512bf9446cb9e30f29e3a95885/file_show?path=%2Fcloud%2Fproject%2FLab7.html
1/4
Statistics 213 Lab Exercises – Simple Linear
Regression
© Jim Stallard, Scott Robison, and Claudia Mahler 2019 all rights reserved.
Pre-Lab Exercise:
Is there a relationship between the number of tweets an item receives on Twitter and the success of a product? A study was taken where the
tweet rate of a motion picture and its opening weekend box office revenue was observed. The `tweet rate’ represents the average number of
tweets per hour where the movie was referenced. The revenue of the movie is in millions of $s. The bivariate data was analyzed and
summarized below with the statistical software.
Average Tweets per hour
Revenue (in Millions of dollars)
1365.80
142.0
1212.80
77.0
581.50
61.0
310.00
32.0
455.00
31.0
290.00
30.0
250.00
21.0
680.50
18.0
150.00
18.0
164.50
17.0
113.90
16.0
144.50
15.0
418.00
14.0
98.00
14.0
100.80
12.0
115.40
11.0
74.40
10.0
87.50
9.0
127.60
9.0
11/20/2019
Statistics 213 Lab Exercises – Simple Linear Regression
https://scott-robison.rstudio.cloud/a8c401512bf9446cb9e30f29e3a95885/file_show?path=%2Fcloud%2Fproject%2FLab7.html
2/4
Average Tweets per hour
Revenue (in Millions of dollars)
52.20
9.0
144.10
8.0
41.30
2.0
2.75
0.3
Average_Tweets_Per_Hour=c(1365.8,1212.8,581.5,310,455,290,250,680.5,150,164.5,113.9,144.5,418,98,100.8,115.4,74.4,87.5,12
7.6,52.2,144.1,41.3,2.75) Revenue=c(142,77,61,32,31,30,21,18,18,17,16,15,14,14,12,11,10,9,9,9,8,2,.3) fit=lm(Revenue~Average_Tweets_Per_Hour)
cor(Revenue~Average_Tweets_Per_Hour)
## [1] 0.9078671
fit
## ## Call: ## lm(formula = Revenue ~ Average_Tweets_Per_Hour) ## ## Coefficients: ## (Intercept) Average_Tweets_Per_Hour ## 1.15056 0.07877
a. Identify the predictor/ -variable and the response/ -variable.
0
200
400
600
800
1000
1200
1400
0
20
40
60
80
100
120
140
Average Tweets per hour
Revenue (in Millions of dollars)
11/20/2019
Statistics 213 Lab Exercises – Simple Linear Regression
https://scott-robison.rstudio.cloud/a8c401512bf9446cb9e30f29e3a95885/file_show?path=%2Fcloud%2Fproject%2FLab7.html
3/4
b. What does the scatterplot tell you about the (i) direction of the relationship and (ii) the strength of the
relationship?
c. Consider the provided value of the correlation coefficient. What does the value of this statistic tell you
about (i) the direction and (ii) the strength of the relationship?
d. From the information given, estimate the model that expresses a movie’s opening weekend revenue as
a linear function of its social media prevalence, the latter measures by the average number of tweets per
hour. That is, estimate the model .
e. Suppose a certain movie receives an average of 500 tweets per hour during its opening weekend. Using
the estimate of the model, predict this movie’s opening weekend revenue.
f. If the average number of tweets per hour increases by one, how does this affect the movie’s opening
weekend revenue?
g. What percentage of the variation in a movie’s opening weekend revenue can be explained by its linear
relationship with the average number of tweets the movie receives in an hour?
h. The ability to predict the opening weekend revenue of a movie – what you did in part (e) – depends on a
certain condition. State this condition.
Lab Exercise 1:
In a certain jurisdiction, all students in Grade Three are required to take a standardized test to evaluate their reading comprehension skills.
Educators believe that such standardized testing is not helpful, as a student’s ability can be affected by various socio-economic factors.
School
ID
Average score on the standardized reading test for all Grade
Three students
Percentage of Grade Three students who live below the
poverty-line
School
1
165
91.7
School
2
157.2
90.2
School
3
164.4
86
School
4
162.4
83.9
…
…
…
The data sample actually has entries that must be input into R.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/20/2019
Statistics 213 Lab Exercises – Simple Linear Regression
https://scott-robison.rstudio.cloud/a8c401512bf9446cb9e30f29e3a95885/file_show?path=%2Fcloud%2Fproject%2FLab7.html
4/4
Average_Reading_Score =c(165.0, 157.2, 164.4, 162.4, 162.5, 164.9, 162.0, 165.0, 173.7, 171.0, 169.4, 172.9, 172.7, 174.9, 174.8, 170.1, 181.4, 180.6, 178.0, 175.9, 181.6, 183.8) Percent_below_poverty = c(91.7, 90.2, 86.0, 83.9, 80.4, 76.5, 76.0, 75.8, 75.6, 75.0, 74.7, 63.2, 52.9, 48.5, 39.1, 38.4, 34.3, 30.3, 30.3, 29.6, 26.5, 13.8) fit=lm(Average_Reading_Score~Percent_below_poverty)
This data resulted by sampling random schools within this jurisdiction. For each school, the average score on the standardized reading test for
all Grade Three students as well as the percentage of Grade Three students who live below the poverty-line was observed.
a. create a scatterplot.
Complete the statement: From this, the relationship between the Grade Three reading comprehension test
result and the proportion of Grade Three students living below the poverty line is ____________. (non-
existent, positive, negative)
b. Compute the correlation coefficient.
c. Compute the model that estimates the average Grade Three reading test score based on its linear
relationship to the proportion of Grade Three students who live below the poverty line.
d. In a certain school, the proportion of children in Grade Three who live below the poverty line is 21%.
Predict the average reading comprehension score this school’s Grade Three students.
e. Complete the sentence: “As the proportion of a school’s Grade Three children who live below the
poverty line increases by 1%, the reading comprehension score will _______ by ______ of _______.
f. Find the coefficient of determination.
g. What does the coefficient of determination measure?
h. What conditions are required in order to estimate (from some data), then predict the reading
comprehension score (Y) of a Grade Three class at a certain school based on the proportion of Grade
Three students who live below the poverty line (X)?
Use the skills you have learned in this lab to complete the lab quiz.
Related Documents
Related Questions
Excel file:
https://drive.google.com/file/d/1TQG5r2wzLGk--75whZXyb0SDTHZTWS0S/view?usp=sharing
arrow_forward
COmpare and constrast the use of prediction intervals for a Single Linear Regression model having one X and Multiple Linear Regression Model having two predictors X1 and X2. WHat are the similarities/differences in process and interpretation?
arrow_forward
I need help with part b please.
linear regression equation: y_hat = 6.6500 + 1.7000x
table of temperatures versus converted sugars:
Temperature, x
Converted Sugar, y
1
8.2
1.1
8.1
1.2
8.7
1.3
9.9
1.4
9.6
1.5
8.7
1.6
8.2
1.7
10.6
1.8
9.3
1.9
9.2
2
10.7
arrow_forward
Construct a scatter plot in Excel with FloorArea as the independent variable and AssessmentValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables?
FloorArea (Sq.Ft.)
Offices
Entrances
Age
AssessedValue ($'000)
4790
4
2
8
1796
4720
3
2
12
1544
5940
4
2
2
2094
5720
4
2
34
1968
3660
3
2
38
1567
5000
4
2
31
1878
2990
2
1
19
949
2610
2
1
48
910
5650
4
2
42
1774
3570
2
1
4
1187
2930
3
2
15
1113
1280
2
1
31
671
4880
3
2
42
1678
1620
1
2
35
710
1820
2
1
17
678
4530
2
2
5
1585
2570
2
1
13
842
4690
2
2
45
1539
1280
1
1
45
433
4100
3
1
27
1268
3530
2
2
41
1251
3660
2
2
33
1094
1110
1
2
50
638
2670
2
2
39
999
1100
1
1
20
653
5810
4
3
17
1914
2560
2
2
24
772
2340
3
1
5
890
3690
2
2
15
1282
3580
3
2
27
1264
3610
2
1
8
1162
3960
3
2
17
1447
arrow_forward
Canvas
-→ X C0| ® E
Exhibit 6
Part of an Excel regression output relating salary for programmer (dependent variable) with three independent
variables: Test score, Graduate Degree, and Experience is shown below.
Coeffic. Std. Err. t Stat
P-value
Intercept
7.9448 7.3808 1.0764 0.2977
Test Score
1.1476 0.2976 4.8561 0.0850
Grad.
0.1969 0.0899 2.1905 0.0011
Degree
Experience -2.2804 1.9866 -1.0479 0.3679
Refer to Exhibit 6. Based on the Excel output, at 95% confidence (alpha = 0.05) which of the independent
variables in the output contributes significantly to explain salary?
O Experience
Grad Degree
Intercept
Test score
arrow_forward
A 10-year study conducted by the American Heart Association provided data on how age, blood pressure, and smoking related to the risk of strokes. The data file “Stroke.xslx” includes a portion of the data from the study. The variable “Risk of Stroke” is measured as the percentage of risk (proportion times 100) that a person will have a stroke over the next 10-year period.
Regression Analysis As Image:
1) Based on the simple regression analysis output, write the estimated regression equation.
2) What is the correlation coefficient between Risk of Stroke and Age? How do you find i
arrow_forward
(BIOSTATISTICS)
In this question , What characteristics are associated with BMI?
Use simple and multivariable linear regression analysis to complete the following table relating the characteristics listed to BMI as a continuous variable. Before conducting the analysis, be sure that all participants have complete data on all analysis variables. If participants are excluded due to missing data, the numbers excluded should be reported. Then, describe how each characteristic is related to BMI. Are crude and multivariable effects similar? What might explain or account for any differences?
Outcome Variable: BMI, kg/m2
Characteristic
Regression Coefficient Crude Models
p-value Regression Coefficient Multivariable Model
P-value
Age, years
0.0627
<0.001-0.02155
0.004
Male sex
-0.580
<0.001-09884
<0.001
Systolic blood pressure, mmHg
0.0603
<0.0010.05716
<0.001
Total serum cholesterol, mg/dL
0.0113
<0.0010.00638…
arrow_forward
Biochemical oxygen demand (BOD) measures organic pollutants in water by measuring the amount of oxygen consumed by microorganisms that break down these compounds. BOD is hard to measure accurately. Total organic carbon (TOC) is easy to measure, so it is common to measure TOC and use regression to predict BOD. A typical regression equation for water entering a municipal treatment plant is
BOD = -55.48 + 1.506 TOC
Both BOD and TOC are measured in milligrams per liter of water.
(a) What does the slope of this line say about the relationship between BOD and TOC?
BOD rises (falls) by 1.506 mg/l for every 1 mg/l increase (decrease) in TOCBOD rises (falls) by 55.48 mg/l for every 1 mg/l increase (decrease) in TOC TOC rises (falls) by 1.506 mg/l for every 55.48 mg/l increase (decrease) in BODTOC rises (falls) by 1.506 mg/l for every 1 mg/l increase (decrease) in BOD
(b) What is the predicted BOD when TOC = 0? Values of BOD less than 0 are impossible. Why do you think the prediction gives an…
arrow_forward
Canvas LMS
ple Linear Regression
X
13.1 Simple Linear Regression
https://education.wiley.com/was/ui/v2/assessment-player/index.html?launchid=87cc93b4-a591-456e-ad60-375765ba7d80#/question/2
lear
-
Question 3 of 9
W
Current Attempt in Progress
X
= i
>>
1º *
eTextbook and Media
3
An auto manufacturing company wanted to investigate how the price of one of its car models depreciates with age. The research
department at the company took a sample of eight cars of this model and collected the following information on the ages (in years) and
prices (in hundreds of dollars) of these cars.
E
Find the regression line equation in the form = a + bx. Use "Age" as the independent variable and "Price" as the dependent
variable.
4
Round your answers to four decimal places.
$
4
R
+(
Q Search
96
5
W NWP Assessment Builder Ul App X
4 6
Age
5
4
5 8
Price 54 86 65 46 59 57 47 34
i
T
2
5
6
E
!
4-
&
7
X
CH
*
8
w Question 3 of 9-13.1 Simple Lin X +
A ☆
144
(
9
101
D
-/1 = 1
O
P
(
SUR
arrow_forward
At least squares regression equation is y=681.3x+16,062 where you is the median income and x is the percentage of 25 years and older with at least a bachelor's degree in the region. The scatter diagram indicates a linear relation between the two variables with a correlation coefficient of 0.7744. Complete parts (a) through (d). Predict the median income of a region in which 30% of adults 25 years and older have at least a bachelor's degree.
arrow_forward
Suppose a multiple regression model is fitted into a variable called model. Which
Python method below returns residuals for a data set based on a multiple regression
model? Select one.
model.residualsvalues
O model.residvalues
model.residuals
model.resid
arrow_forward
Biochemical oxygen demand (BOD) measures organic pollutants in water by measuring the amount of oxygen consumed by microorganisms that break down these compounds. BOD is hard to
measure accurately. Total organic carbon (TOC) is easy to measure, so it is common to measure TOC and use regression to predict BOD. A typical regression equation for water entering a municipal
treatment plant is
BOD = -55.47 + 1.505 TOC
Both BOD and TOC are measured in milligrams per liter of water.
(a) What does the slope of this line say about the relationship between BOD and TOC?
TOC rises (falls) by 1.505 mg/l for every 55.47 mg/l increase (decrease) in BOD
BOD rises (falls) by 55.47 mg/l for every 1 mg/l increase (decrease) in TOC
TOC rises (falls) by 1.505 mg/l for every 1 mg/l increase (decrease) in BOD
BOD rises (falls) by 1.505 mg/l for every 1 mg/l increase (decrease) in TOC
arrow_forward
APP
Internet users (millions)
Year
1994
1997
2000
2003
2006
2009
2012
Users
(1)
2.6
17.6
73,3
170,7
368,5
594.7
871,4
#04161
let & represent the number of internet
users since 1994.
regression
Find
an exponenial model (y=ab^)
for the data set.
2) Find the number of users in 2022.
arrow_forward
Study 3: Age & Pain.
Using the same data set PillShapeColor - use Pearson Correlation and simple linear Regression to create results and then summarize these in APA-style.
Run a Pearson Correlation on age and pain rating. Create a scatterplot with pain rating on Y and age on X axis.
Run a simple linear regression model with pain rating as DV and age as IV.
Regression → Linear Regression, choose your DV, IV(s) –use defaults settings.
1-What is the equation for the regression line? How much variance in pain is accounted for by age (R2)? Interpret this in context of the study (is this a lot of variance explained or not much).2-Describe the regression results simply and clearly- include the slope and constant and describe what the slope means. Is age a significant predictor of perceived pain? Finally, describe what the model predicts about the pain rating of a person who is 40 years old and someone 70 years old.
arrow_forward
SEE MORE QUESTIONS
Recommended textbooks for you

Glencoe Algebra 1, Student Edition, 9780079039897...
Algebra
ISBN:9780079039897
Author:Carter
Publisher:McGraw Hill


Big Ideas Math A Bridge To Success Algebra 1: Stu...
Algebra
ISBN:9781680331141
Author:HOUGHTON MIFFLIN HARCOURT
Publisher:Houghton Mifflin Harcourt
Algebra & Trigonometry with Analytic Geometry
Algebra
ISBN:9781133382119
Author:Swokowski
Publisher:Cengage
Related Questions
- Excel file: https://drive.google.com/file/d/1TQG5r2wzLGk--75whZXyb0SDTHZTWS0S/view?usp=sharingarrow_forwardCOmpare and constrast the use of prediction intervals for a Single Linear Regression model having one X and Multiple Linear Regression Model having two predictors X1 and X2. WHat are the similarities/differences in process and interpretation?arrow_forwardI need help with part b please. linear regression equation: y_hat = 6.6500 + 1.7000x table of temperatures versus converted sugars: Temperature, x Converted Sugar, y 1 8.2 1.1 8.1 1.2 8.7 1.3 9.9 1.4 9.6 1.5 8.7 1.6 8.2 1.7 10.6 1.8 9.3 1.9 9.2 2 10.7arrow_forward
- Construct a scatter plot in Excel with FloorArea as the independent variable and AssessmentValue as the dependent variable. Insert the bivariate linear regression equation and r^2 in your graph. Do you observe a linear relationship between the 2 variables? FloorArea (Sq.Ft.) Offices Entrances Age AssessedValue ($'000) 4790 4 2 8 1796 4720 3 2 12 1544 5940 4 2 2 2094 5720 4 2 34 1968 3660 3 2 38 1567 5000 4 2 31 1878 2990 2 1 19 949 2610 2 1 48 910 5650 4 2 42 1774 3570 2 1 4 1187 2930 3 2 15 1113 1280 2 1 31 671 4880 3 2 42 1678 1620 1 2 35 710 1820 2 1 17 678 4530 2 2 5 1585 2570 2 1 13 842 4690 2 2 45 1539 1280 1 1 45 433 4100 3 1 27 1268 3530 2 2 41 1251 3660 2 2 33 1094 1110 1 2 50 638 2670 2 2 39 999 1100 1 1 20 653 5810 4 3 17 1914 2560 2 2 24 772 2340 3 1 5 890 3690 2 2 15 1282 3580 3 2 27 1264 3610 2 1 8 1162 3960 3 2 17 1447arrow_forwardCanvas -→ X C0| ® E Exhibit 6 Part of an Excel regression output relating salary for programmer (dependent variable) with three independent variables: Test score, Graduate Degree, and Experience is shown below. Coeffic. Std. Err. t Stat P-value Intercept 7.9448 7.3808 1.0764 0.2977 Test Score 1.1476 0.2976 4.8561 0.0850 Grad. 0.1969 0.0899 2.1905 0.0011 Degree Experience -2.2804 1.9866 -1.0479 0.3679 Refer to Exhibit 6. Based on the Excel output, at 95% confidence (alpha = 0.05) which of the independent variables in the output contributes significantly to explain salary? O Experience Grad Degree Intercept Test scorearrow_forwardA 10-year study conducted by the American Heart Association provided data on how age, blood pressure, and smoking related to the risk of strokes. The data file “Stroke.xslx” includes a portion of the data from the study. The variable “Risk of Stroke” is measured as the percentage of risk (proportion times 100) that a person will have a stroke over the next 10-year period. Regression Analysis As Image: 1) Based on the simple regression analysis output, write the estimated regression equation. 2) What is the correlation coefficient between Risk of Stroke and Age? How do you find iarrow_forward
- (BIOSTATISTICS) In this question , What characteristics are associated with BMI? Use simple and multivariable linear regression analysis to complete the following table relating the characteristics listed to BMI as a continuous variable. Before conducting the analysis, be sure that all participants have complete data on all analysis variables. If participants are excluded due to missing data, the numbers excluded should be reported. Then, describe how each characteristic is related to BMI. Are crude and multivariable effects similar? What might explain or account for any differences? Outcome Variable: BMI, kg/m2 Characteristic Regression Coefficient Crude Models p-value Regression Coefficient Multivariable Model P-value Age, years 0.0627 <0.001-0.02155 0.004 Male sex -0.580 <0.001-09884 <0.001 Systolic blood pressure, mmHg 0.0603 <0.0010.05716 <0.001 Total serum cholesterol, mg/dL 0.0113 <0.0010.00638…arrow_forwardBiochemical oxygen demand (BOD) measures organic pollutants in water by measuring the amount of oxygen consumed by microorganisms that break down these compounds. BOD is hard to measure accurately. Total organic carbon (TOC) is easy to measure, so it is common to measure TOC and use regression to predict BOD. A typical regression equation for water entering a municipal treatment plant is BOD = -55.48 + 1.506 TOC Both BOD and TOC are measured in milligrams per liter of water. (a) What does the slope of this line say about the relationship between BOD and TOC? BOD rises (falls) by 1.506 mg/l for every 1 mg/l increase (decrease) in TOCBOD rises (falls) by 55.48 mg/l for every 1 mg/l increase (decrease) in TOC TOC rises (falls) by 1.506 mg/l for every 55.48 mg/l increase (decrease) in BODTOC rises (falls) by 1.506 mg/l for every 1 mg/l increase (decrease) in BOD (b) What is the predicted BOD when TOC = 0? Values of BOD less than 0 are impossible. Why do you think the prediction gives an…arrow_forwardCanvas LMS ple Linear Regression X 13.1 Simple Linear Regression https://education.wiley.com/was/ui/v2/assessment-player/index.html?launchid=87cc93b4-a591-456e-ad60-375765ba7d80#/question/2 lear - Question 3 of 9 W Current Attempt in Progress X = i >> 1º * eTextbook and Media 3 An auto manufacturing company wanted to investigate how the price of one of its car models depreciates with age. The research department at the company took a sample of eight cars of this model and collected the following information on the ages (in years) and prices (in hundreds of dollars) of these cars. E Find the regression line equation in the form = a + bx. Use "Age" as the independent variable and "Price" as the dependent variable. 4 Round your answers to four decimal places. $ 4 R +( Q Search 96 5 W NWP Assessment Builder Ul App X 4 6 Age 5 4 5 8 Price 54 86 65 46 59 57 47 34 i T 2 5 6 E ! 4- & 7 X CH * 8 w Question 3 of 9-13.1 Simple Lin X + A ☆ 144 ( 9 101 D -/1 = 1 O P ( SURarrow_forward
- At least squares regression equation is y=681.3x+16,062 where you is the median income and x is the percentage of 25 years and older with at least a bachelor's degree in the region. The scatter diagram indicates a linear relation between the two variables with a correlation coefficient of 0.7744. Complete parts (a) through (d). Predict the median income of a region in which 30% of adults 25 years and older have at least a bachelor's degree.arrow_forwardSuppose a multiple regression model is fitted into a variable called model. Which Python method below returns residuals for a data set based on a multiple regression model? Select one. model.residualsvalues O model.residvalues model.residuals model.residarrow_forwardBiochemical oxygen demand (BOD) measures organic pollutants in water by measuring the amount of oxygen consumed by microorganisms that break down these compounds. BOD is hard to measure accurately. Total organic carbon (TOC) is easy to measure, so it is common to measure TOC and use regression to predict BOD. A typical regression equation for water entering a municipal treatment plant is BOD = -55.47 + 1.505 TOC Both BOD and TOC are measured in milligrams per liter of water. (a) What does the slope of this line say about the relationship between BOD and TOC? TOC rises (falls) by 1.505 mg/l for every 55.47 mg/l increase (decrease) in BOD BOD rises (falls) by 55.47 mg/l for every 1 mg/l increase (decrease) in TOC TOC rises (falls) by 1.505 mg/l for every 1 mg/l increase (decrease) in BOD BOD rises (falls) by 1.505 mg/l for every 1 mg/l increase (decrease) in TOCarrow_forward
arrow_back_ios
SEE MORE QUESTIONS
arrow_forward_ios
Recommended textbooks for you
- Glencoe Algebra 1, Student Edition, 9780079039897...AlgebraISBN:9780079039897Author:CarterPublisher:McGraw HillBig Ideas Math A Bridge To Success Algebra 1: Stu...AlgebraISBN:9781680331141Author:HOUGHTON MIFFLIN HARCOURTPublisher:Houghton Mifflin Harcourt
- Algebra & Trigonometry with Analytic GeometryAlgebraISBN:9781133382119Author:SwokowskiPublisher:Cengage

Glencoe Algebra 1, Student Edition, 9780079039897...
Algebra
ISBN:9780079039897
Author:Carter
Publisher:McGraw Hill


Big Ideas Math A Bridge To Success Algebra 1: Stu...
Algebra
ISBN:9781680331141
Author:HOUGHTON MIFFLIN HARCOURT
Publisher:Houghton Mifflin Harcourt
Algebra & Trigonometry with Analytic Geometry
Algebra
ISBN:9781133382119
Author:Swokowski
Publisher:Cengage