STAT 151A
Lab 7: Midterm Review Session
October 6, 2023
Note: there is no submission required for lab 7. This worksheet doesn’t include everything
you need to review for the midterm. Please see the midterm study guide posted on bCourses
for a more comprehensive list of concepts, examples and exercises.
1 Data transformation
Problem 1 Conceptual Review
(a) Why do we transform data?
(b) What is the Box-Cox transformation of $X$?
(c) What $p$ do you use to correct positive skewness (right skew)? What $p$ do you use to
correct negative skewness (left skew)?
(d) A good transformation will make the ratio $\frac{UQ - M}{M - LQ}$ close to 1.
(e) What is Tukey and Mosteller’s bulging rule, and how is it used to correct monotone
nonlinearity?
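As a rough illustration of the ratio in part (d), the sketch below computes $(UQ - M)/(M - LQ)$ before and after a log transformation. It assumes UQ, M and LQ denote the upper quartile, median and lower quartile; the sample is hypothetical and chosen only because it is strongly right-skewed, and the function name `symmetry_ratio` is not from the worksheet.

```python
import math
import statistics

def symmetry_ratio(values):
    """Return (UQ - M) / (M - LQ) using the quartiles of `values`."""
    lq, med, uq = statistics.quantiles(values, n=4, method="inclusive")
    return (uq - med) / (med - lq)

# Hypothetical right-skewed sample (powers of two), for illustration only.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log(v) for v in raw]

print("raw ratio:   ", symmetry_ratio(raw))     # far above 1: long right tail
print("logged ratio:", symmetry_ratio(logged))  # approximately 1 after the log
```

A ratio well above 1 means the upper quartile sits much farther from the median than the lower quartile does, which is one symptom of right skew.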
Problem 2 Exercise 4.1 - Fox
Create a graph for the ordinary power transformations $X \to X^p$ for $p = -1, 0, 1, 2, 3$. (When
$p = 0$, however, use the log transformation.) Compare the graph to Figure 4.1, and comment
on the similarities and differences between the two families of transformations $x^p$ and
$(x^p - 1)/p$.
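Before plotting, it may help to tabulate both families at a few points. The sketch below is one possible starting point, not a full solution: the function names `power_transform` and `box_cox` and the grid `xs` are illustrative choices, and the values could be passed to any plotting tool.

```python
import math

def power_transform(x, p):
    """Ordinary power transformation x -> x**p, using log(x) when p == 0."""
    return math.log(x) if p == 0 else x ** p

def box_cox(x, p):
    """Family x -> (x**p - 1) / p, using log(x) when p == 0."""
    return math.log(x) if p == 0 else (x ** p - 1) / p

xs = [0.5, 1.0, 2.0, 4.0]  # illustrative grid of positive values
for p in (-1, 0, 1, 2, 3):
    row_power = [round(power_transform(x, p), 3) for x in xs]
    row_family = [round(box_cox(x, p), 3) for x in xs]
    print(f"p={p:>2}  x^p: {row_power}  (x^p - 1)/p: {row_family}")
```

Two things to look for in the table: every curve of the second family equals 0 at $x = 1$, and for $p = -1$ the ordinary power $x^{-1}$ decreases in $x$ while $(x^{-1} - 1)/(-1)$ increases.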
2 Simple linear regression
Problem 3 SLR review
Consider simple linear regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
(a) What are the assumptions?
(b) Derive the least squares estimates of $\beta_0$ and $\beta_1$.
(c) Show that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased. What assumptions are used?
(d) Derive $\mathrm{var}(\hat{\beta}_0)$, $\mathrm{var}(\hat{\beta}_1)$ and $\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_1)$. What assumptions are used?
(e) What is an unbiased estimator for $\sigma^2$?
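To check a derivation for parts (b) and (e) numerically, the sketch below computes the usual closed-form least squares estimates and the estimator $RSS/(n-2)$ on a small hypothetical data set. The function name `fit_slr` and the data are assumptions made for illustration; this is a sanity check, not a substitute for the derivation.

```python
def fit_slr(x, y):
    """Return (b0, b1, sigma2_hat) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope: S_xy / S_xx
    b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)     # divides by n - 2 because two parameters are estimated
    return b0, b1, sigma2_hat

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, sigma2_hat = fit_slr(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(b0, b1, sigma2_hat)
print("sum of residuals:", sum(residuals))  # approximately 0 by the normal equations
```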
Problem 4 TSS, RSS and $R^2$ review
Consider simple linear regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ under standard linear model assumptions:
(a) What is the residual standard error, and how is it interpreted?
(b) What are the total sum of squares, regression sum of squares, and residual sum of squares?
(c) How is R-squared defined, and what does it represent?
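The quantities in this problem can be computed directly on a toy example. The sketch below uses hypothetical data and illustrative variable names (`tss`, `reg_ss`, `rss`, `r2`, `rse`), and takes the residual standard error to be $\sqrt{RSS/(n-2)}$ for simple regression.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                  # total sum of squares
reg_ss = sum((fi - y_bar) ** 2 for fi in fitted)          # regression sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # residual sum of squares
r2 = 1 - rss / tss                                        # share of variation explained
rse = math.sqrt(rss / (n - 2))                            # residual standard error

print("TSS =", tss, " RegSS + RSS =", reg_ss + rss)
print("R^2 =", r2, " RSE =", rse)
```

The first printed line illustrates the decomposition TSS = RegSS + RSS, which holds for least squares fits that include an intercept.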
Problem 5 (SP23 HW)
Consider simple linear regression where there is one response variable $y$ and an explanatory
variable $x$, and there are $n$ subjects with values $y_1, \dots, y_n$ and $x_1, \dots, x_n$.
(a) What are the estimates for $\alpha_0$ and $\alpha_1$ if we regress $x$ on $y$?
(b) Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the estimates from regressing $y$ on $x$. Intuition might suggest that
$\hat{\alpha}_1 = 1/\hat{\beta}_1$. Is this true?
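A quick numerical experiment can guide the answer to part (b). The sketch below fits both regressions on hypothetical data and compares $\hat{\alpha}_1$ with $1/\hat{\beta}_1$; the helper `slope` and the data are illustrative assumptions. It also prints the product $\hat{\alpha}_1 \hat{\beta}_1$ next to $r^2$, which suggests when the two slopes would be exact reciprocals.

```python
def slope(u, v):
    """Least squares slope from regressing v on u (with an intercept)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    return suv / suu

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

beta1_hat = slope(x, y)    # regress y on x
alpha1_hat = slope(y, x)   # regress x on y

# Squared sample correlation, computed from the same sums of squares.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r_squared = sxy ** 2 / (sxx * syy)

print("alpha1_hat =", alpha1_hat, " 1 / beta1_hat =", 1 / beta1_hat)
print("alpha1_hat * beta1_hat =", alpha1_hat * beta1_hat, " r^2 =", r_squared)
```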
Problem 6 Exercise 5.9
Show that in simple-regression analysis, the standardized slope coefficient $B$ is equal to
the correlation coefficient $r$. (In general, however, standardized slope coefficients are not
correlations and can be outside of the range $[-1, 1]$.)
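The claim can be checked numerically before proving it. The sketch below standardizes both variables with the sample standard deviation, refits the regression, and compares the slope with $r$. The data and helper names are hypothetical, and a numerical match on one data set is a sanity check rather than a proof.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def standardize(values):
    """Convert values to z-scores using the sample mean and standard deviation."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def slope(u, v):
    """Least squares slope from regressing v on u (with an intercept)."""
    u_bar, v_bar = mean(u), mean(v)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    return suv / suu

def correlation(u, v):
    u_bar, v_bar = mean(u), mean(v)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 1.0, 8.0, 6.0, 12.0]

b_raw = slope(x, y)
b_std = slope(standardize(x), standardize(y))
r = correlation(x, y)
print("raw slope =", b_raw, " standardized slope =", b_std, " r =", r)
```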
3 Multiple regression
Problem 7 MR Review
Consider multiple regression $\vec{y} = X\beta + \vec{\epsilon}$.
(a) What are the assumptions?
(b) Derive the least squares estimates of $\beta$.
(c) Show that $\hat{\beta}$ is unbiased. What assumptions are used?
(d) Derive $\mathrm{cov}(\hat{\beta})$. What assumptions are used?
(e) What is an unbiased estimator for $\sigma^2$?
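For part (b), a derivation can be checked against a direct numerical solution of the normal equations $X^T X \beta = X^T \vec{y}$. The sketch below uses only the standard library, so it includes a small Gaussian elimination routine; the helper names, the design matrix and the coefficient vector `true_beta` are all hypothetical. The response is generated without noise, so the solver should recover `true_beta` up to rounding.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of two lists-of-rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def ols(X, y):
    """Least squares coefficients from the normal equations (X assumed full rank)."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Hypothetical full-rank design: intercept column plus two predictors.
X = [[1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [1.0, 3.0, 0.0],
     [1.0, 4.0, 2.0]]
true_beta = [1.0, 2.0, -0.5]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in X]  # noise-free response

beta_hat = ols(X, y)
print(beta_hat)
```

Solving the linear system directly avoids forming $(X^T X)^{-1}$ explicitly, which is a common choice for numerical stability; the estimate is the same in exact arithmetic.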
Problem 8 Other concepts of MR
(a) What is adjusted R-squared? Why can $R^2$ only rise as predictors are added?
(b) How do correlated variables impact the regression coefficients?
(c) What are standardized coefficients, and how are they interpreted?
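For part (a), one common definition of adjusted R-squared is $1 - \frac{RSS/(n-k-1)}{TSS/(n-1)}$, where $k$ counts the predictors excluding the intercept. The sketch below assumes that definition and uses made-up $R^2$ values to show that a small gain in $R^2$ from an extra predictor can still lower the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up values: adding a third predictor raises R^2 only from 0.500 to 0.505.
n = 20
before = adjusted_r2(0.500, n, k=2)
after = adjusted_r2(0.505, n, k=3)
print("adjusted R^2 with 2 predictors:", before)
print("adjusted R^2 with 3 predictors:", after)
```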
Problem 9 True/False (Past midterm)
(a) $R^2$ is an effective model selection criterion for deciding the best size for a linear model.
(b) If I assume the data-generating process is $\vec{y} = X\beta + \vec{\epsilon}$ with full rank matrix $X$ treated
as fixed, then the following is true:
$\arg\min_{\beta} \|X\beta - \vec{y}\|_2^2 = (X^T X)^{-1} X^T \vec{y}$
regardless of the distribution of $\epsilon$.
(c) The R-squared summary output will always increase if I add more covariates to the
regression.
Problem 10 SP23 midterm
In many data analyses, $\vec{y}$ observations are collected from various sensors with different
measurement variabilities. Let’s say that I know the variability of each sensor such that I can
safely assume the following model: