SIMPLE LINEAR REGRESSION
Chapter 11
Dr. Supreeta Amin
ISE 135 – Fall 2016
San Jose State University
Department of Industrial and Systems Engineering

Introduction
Is there a relationship between the number of hours studied and your exam score?
What is Regression?
Regression
- Is the attempt to explain the variation in a dependent variable using the variation in independent variables
- Is thus an explanation of causation
- Is a technique concerned with predicting some variables by knowing others
Regression Models
- Relationship between one dependent variable and explanatory variable(s)
- Use an equation to set up the relationship
  - Numerical dependent (response) variable
  - 1 or more numerical or categorical independent (explanatory) variables
- Used mainly for prediction & estimation
Regression
Simple regression
- Considers the relation between a single explanatory variable and a response variable: X → Y

Multiple regression
- Involves more than one regressor variable: X1, X2, …, Xn → Y
Regression
- In regression, one variable is considered the independent (predictor) variable (X) and the other the dependent (outcome) variable (Y)
- Regression is the process of predicting variable Y using variable X
  - Uses a variable (x) to predict some outcome variable (y)
  - Tells you how values of y change as a function of changes in values of x
- If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction
Assumptions (or the fine print)
- Linear regression assumes that…
  - The relationship between X and Y is linear
  - Y is distributed normally at each value of X
  - The variance of Y at every value of X is the same (homogeneity of variances)
  - The observations are independent
Regression
- Calculates the "best-fit" line for a certain set of data
- The regression line makes the sum of the squares of the residuals smaller than for any other line
- Regression minimizes residuals
[Figure: scatterplot with fitted regression line; axis label: Wt (kg)]
Example
A family doctor wishes to examine the relationship between a patient's age and total cholesterol. He randomly selects 14 of his female patients and obtains the data presented in the table below. Find the least-squares regression equation and the coefficient of determination.

Age | Total Cholesterol | Age | Total Cholesterol
25  | 180               | 42  | 185
25  | 195               | 48  | 204
28  | 186               | 51  | 221
32  | 180               | 51  | 243
32  | 210               | 58  | 208
32  | 197               | 62  | 228
38  | 239               | 65  | 269
Regression Analysis

ANOVA
           | df | SS       | MS       | F        | Significance F
Regression | 1  | 4840.062 | 4840.062 | 13.05434 | 0.00355946
Residual   | 12 | 4449.152 | 370.7627 |          |
Total      | 13 | 9289.214 |          |          |

SUMMARY OUTPUT
Regression Statistics
Multiple R        | 0.721832
R Square          | 0.521041
Adjusted R Square | 0.481128
Standard Error    | 19.2552
Observations      | 14

             | Coefficients | Standard Error | t Stat    | P-value  | Lower 95%   | Upper 95%
Intercept    | 151.4989457  | 17.08383409    | 8.867971  | 1.29E-06 | 114.2764688 | 188.7214226
X Variable 1 | 1.399006383  | 0.387206113    | 3.6130793 | 0.003559 | 0.555356737 | 2.24265603

P-value = 0.0036 < 0.05 → the model is reasonable
Regression Plots
[Line fit plot: Cholesterol vs. Age, showing Y, Predicted Y, and Linear (Predicted Y)]
[Residual plot: Residuals vs. Age (years)]
Regression Equation
- The regression equation describes the regression line mathematically
  - Intercept (β0): the expected mean of the dependent variable (y) when the independent variable (x) is 0
  - Slope (β1): the change in y when x increases by one unit
    A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y
Linear Equations
Linear Regression Model
- Each pair of observations satisfies the relationship

  $y = \beta_0 + \beta_1 x + \varepsilon$

  where y is the dependent (response) variable, x is the independent (explanatory) variable, β0 is the population y-intercept, β1 is the population slope, and ε is the random error.
Estimates of Slope and Intercept
- Intercept $\hat{\beta}_0$:

  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

- Slope $\hat{\beta}_1$:

  $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \dfrac{S_{xy}}{S_{xx}}$

  where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example
Given:
  $\sum x_i = 589$,  $\sum y_i = 2945$
  $\bar{x} = 42.07$,  $\bar{y} = 210.35$
  $\sum x_i^2 = 27253$,  $\sum y_i^2 = 628791$
  $\sum x_i y_i = 127360$
  n = 14
Find the equation of the least-squares line for the age (x) / total cholesterol (y) data from the earlier table, using

  $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$  and  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
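Carrying the given sums through these formulas reproduces the coefficients that appear in the Excel output shown earlier:

$S_{xy} = 127360 - \dfrac{(589)(2945)}{14} \approx 3459.64, \qquad S_{xx} = 27253 - \dfrac{589^2}{14} \approx 2472.93$

$\hat{\beta}_1 = \dfrac{3459.64}{2472.93} \approx 1.399, \qquad \hat{\beta}_0 = 210.35 - (1.399)(42.07) \approx 151.50$

so the fitted line is $\hat{y} = 151.50 + 1.399\,x$.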
Simple Linear Regression
- The output of a simple regression is the coefficient $\hat{\beta}_1$ and the constant $\hat{\beta}_0$
- The equation is then:

  $y = \hat{\beta}_0 + \hat{\beta}_1 x + \varepsilon$

  where ε is the residual error
- $\hat{\beta}_1$ is the per-unit change in the dependent variable for each unit change in the independent variable; mathematically:

  $\hat{\beta}_1 = \dfrac{\Delta y}{\Delta x}$
Least Squares Method
- Least squares method
  - A procedure that minimizes the vertical deviations of plotted points surrounding a straight line
  - Able to construct a best-fitting straight line to the scatter diagram points and then formulate a regression equation in the form:

    $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
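To make the procedure concrete, here is a minimal Python sketch (assuming numpy is available; the array and variable names are illustrative, not from the slides) that computes the least-squares estimates for the age/cholesterol example:

```python
import numpy as np

# Age (x) and total cholesterol (y) from the example slide
x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

# Least-squares estimates: slope = Sxy / Sxx, intercept = ybar - slope * xbar
Sxy = np.sum(x * y) - x.sum() * y.sum() / len(x)
Sxx = np.sum(x**2) - x.sum()**2 / len(x)
b1 = Sxy / Sxx                   # ~1.399
b0 = y.mean() - b1 * x.mean()    # ~151.50
print(f"y-hat = {b0:.2f} + {b1:.3f} x")
```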
Simple Linear Regression
- The output of a regression is a function that predicts the dependent variable based upon values of the independent variables
- The function will make a prediction for each observed data point
- The observation is denoted by y and the prediction is denoted by $\hat{y}$
Simple Linear Regression
- For each observation, the variation can be described as:

  $y = \hat{y} + \varepsilon$

  Actual = Explained + Error
  (y is the measured observation; $\hat{y}$ is the prediction)
Sum of Squares Error (SSE)
- A least-squares regression selects the line with the lowest total sum of squared prediction errors
- This value is called the Sum of Squares of Error, or SSE
Sum of Squares Regression (SSR)
- The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean of y

Calculating Sums of Squares
The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically,

  SSR = $\sum (\hat{y}_i - \bar{y})^2$  (measure of explained variation)
  SSE = $\sum (y_i - \hat{y}_i)^2$  (measure of unexplained variation)
  SST = SSY = SSR + SSE = $\sum (y_i - \bar{y})^2$  (measure of total variation in y)
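To make the decomposition concrete, a short sketch (same hypothetical arrays as before, assuming numpy) that computes SSR, SSE, and SST for the example data and checks SST = SSR + SSE; the values should match the ANOVA table shown on the next slide:

```python
import numpy as np

x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

# Fitted values from the least-squares line
b1 = (np.sum(x * y) - x.sum() * y.sum() / len(x)) / (np.sum(x**2) - x.sum()**2 / len(x))
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSR = np.sum((y_hat - y.mean())**2)   # explained variation, ~4840.06
SSE = np.sum((y - y_hat)**2)          # unexplained variation, ~4449.15
SST = np.sum((y - y.mean())**2)       # total variation, ~9289.21
print(SSR, SSE, SST, np.isclose(SST, SSR + SSE))
```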
Example
Given: ȳ = 210.35, n = 14, and the age (x) / total cholesterol (y) data from the earlier table.
Regression
[Excel SUMMARY OUTPUT as shown earlier; the ANOVA table gives SSR = 4840.062, SSE = 4449.152, SST = 9289.214]
Standard Error of Regression
- A measure of the variability of the regression model's predictions
  - It can be used in a similar manner to standard deviation, allowing for prediction intervals
- ŷ ± 2 standard errors will provide approximately 95% accuracy, and 3 standard errors will provide a 99% confidence interval
- Standard Error is calculated by taking the square root of the average prediction error:

  $s = \sqrt{\dfrac{SSE}{n - k}}$

  where n is the number of observations in the sample and k is the total number of estimated coefficients in the model (here k = 2: the intercept and the slope)
Standard Error
- Shows the variability of the data around the regression line
- Also called
  - Root Mean Square Error (RMSE)
  - Residual Standard Error
  - Denoted by S in Minitab

  $s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{\sum \text{Residuals}^2}{n - 2}}$
Standard Error
- The lower the value of S, the better the model predicts the response
- If you compare different models, the model with the lowest S value indicates the best fit
- Estimating σ²
  - σ is the standard deviation of the response variable y for any given value of x
  - The estimate $s_e$ is called the Standard Error of the Estimate

  $\hat{\sigma}^2 = \dfrac{SSE}{n - 2}$
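A quick numeric check of this formula against the Excel output (SSE from the ANOVA table; a sketch, not the slides' own code):

```python
import math

SSE, n = 4449.152, 14          # SSE and sample size from the example's ANOVA table
sigma2_hat = SSE / (n - 2)     # ~370.76, the residual mean square (MS)
s_e = math.sqrt(sigma2_hat)    # ~19.2552, the "Standard Error" in the Excel output
print(sigma2_hat, s_e)
```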
Regression
[Excel SUMMARY OUTPUT as shown earlier]

Standard Error = $\sqrt{370.76265} = 19.2552$
Slope and Intercept Properties

Slope properties:

  $E(\hat{\beta}_1) = \beta_1, \qquad V(\hat{\beta}_1) = \dfrac{\sigma^2}{S_{xx}}, \qquad se(\hat{\beta}_1) = \sqrt{\dfrac{\hat{\sigma}^2}{S_{xx}}}$

Intercept properties:

  $E(\hat{\beta}_0) = \beta_0, \qquad V(\hat{\beta}_0) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right], \qquad se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}$
Regression
[Excel SUMMARY OUTPUT as shown earlier]

From the output: $E(\hat{\beta}_0)$ is estimated by $\hat{\beta}_0 = 151.4989$, with $se(\hat{\beta}_0) = 17.0838$
Regression
[Excel SUMMARY OUTPUT as shown earlier]

From the output: $E(\hat{\beta}_1)$ is estimated by $\hat{\beta}_1 = 1.3990$, with $se(\hat{\beta}_1) = 0.3872$
Inference on Slope and Intercept
- Test the claim that a linear relationship exists between the explanatory and the response variables
- Does the sample provide sufficient evidence to support the claim that a linear relationship exists between the two variables?
- If there is no linear relation between the response and explanatory variables
  - The slope of the true regression line will be zero
  - A slope of zero → x does not change our "guess" as to the value of y
Hypothesis Testing

Two-tailed:   H0: β1 = β1,0   vs.   H1: β1 ≠ β1,0
Left-tailed:  H0: β1 = β1,0   vs.   H1: β1 < β1,0
Right-tailed: H0: β1 = β1,0   vs.   H1: β1 > β1,0

Test statistic for the slope:

  $T_0 = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{s_e / \sqrt{S_{xx}}} = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$

Two-tailed test: reject the null hypothesis if $|t_0| > t_{\alpha/2,\,n-2}$

Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y:
- Fail to reject H0: β1 = 0
  - x has little value in explaining the variation in y, or
  - the true relationship between x and y is NOT linear
- Reject H0: β1 = 0
  - There is a straight-line relationship between x and y, or
  - although x has a linear effect on y, the relationship could be better estimated using a polynomial model
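A minimal sketch of the two-tailed slope test for the example data, assuming scipy is available (the variable names are mine; the numbers come from the Excel output):

```python
from scipy import stats

b1, se_b1, n = 1.399006, 0.387206, 14         # slope, its standard error, sample size
t0 = (b1 - 0) / se_b1                         # test H0: beta1 = 0  ->  ~3.613
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)   # two-tailed p-value ~0.00356
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # t_{0.025,12} ~ 2.179
print(t0, p_value, abs(t0) > t_crit)          # True -> reject H0: the slope is nonzero
```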
Regression
[Excel SUMMARY OUTPUT as shown earlier]

t-stat = 1.399 / 0.3872 = 3.613
Confidence Interval about Slope and Intercept
- Confidence interval = Point Estimate ± Margin of Error
- A (1 − α)·100% confidence interval for the slope of the true regression line, β1, is given by

  $\hat{\beta}_1 \pm t_{\alpha/2,\,n-2} \cdot \dfrac{s_e}{\sqrt{S_{xx}}}$

- A (1 − α)·100% confidence interval for the intercept of the true regression line, β0, is given by

  $\hat{\beta}_0 \pm t_{\alpha/2,\,n-2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$
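Plugging the Excel output into these formulas (a sketch assuming scipy; equivalently, estimate ± t × standard error, straight from the coefficient table):

```python
from scipy import stats

n = 14
t_crit = stats.t.ppf(0.975, df=n - 2)              # ~2.179 for a 95% interval

b1, se_b1 = 1.399006, 0.387206                     # slope and its standard error
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)    # ~(0.555, 2.243)

b0, se_b0 = 151.498946, 17.083834                  # intercept and its standard error
print(b0 - t_crit * se_b0, b0 + t_crit * se_b0)    # ~(114.28, 188.72)
```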
Regression
[Excel SUMMARY OUTPUT as shown earlier]

UCL (intercept) = 151.49 + 17.08 × 2.179 = 188.72
LCL (intercept) = 151.49 − 17.08 × 2.179 = 114.27
Correlation Coefficient (R) and Coefficient of Determination (R²)
- R: Correlation Coefficient
  - Magnitude of the relationship between the dependent variable and the best linear combination of the predictor variables
- R²: Coefficient of Determination
  - The proportion of variation in Y accounted for by the set of independent variables (X's)
Coefficient of Determination
- The proportion of total variation (SST) that is explained by the regression (SSR) is known as the coefficient of determination
  - Often referred to as R²

  $R^2 = \dfrac{SSR}{SST} = \dfrac{SSR}{SSR + SSE} = 1 - \dfrac{SSE}{SST}$

- Value of R²
  - Can range between 0 and 1
  - The higher its value, the more accurate the regression model
  - Often referred to as a percentage
Coefficient of Determination
- Adjusted R² (or $\bar{R}^2$)

  $R^2_{adjusted} = 1 - \dfrac{n - 1}{n - k - 1}\,(1 - R^2)$

  - Adjusted R-squared is a modified version of R-squared, adjusted for the sample size n and the number of explanatory variables k
  - Compares the explanatory power of regression models that contain different numbers of predictors
- The adjusted R-squared
  - Increases only if a new term improves the model more than would be expected by chance
  - Decreases when a predictor improves the model by less than expected by chance
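A quick check of both formulas against the example's ANOVA table (a sketch; the input values come from the slides):

```python
SSR, SSE = 4840.062, 4449.152                    # from the ANOVA table
SST = SSR + SSE
r2 = SSR / SST                                   # ~0.521041 ("R Square")
n, k = 14, 1                                     # observations, number of predictors
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)    # ~0.481128 ("Adjusted R Square")
print(r2, adj_r2)
```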
Regression
[Excel SUMMARY OUTPUT as shown earlier]

R² = 4840.062441 / 9289.214286 = 0.52104
52.10% of the variation in the response variable is explained by the least-squares regression model.
Regression
[Excel SUMMARY OUTPUT as shown earlier]

Adjusted R² = $1 - \dfrac{n-1}{n-k-1}(1 - R^2)$, with n = 14, k = 1:

Adjusted R² = $1 - \dfrac{14-1}{14-1-1}(1 - 0.521041) = 0.481128$
Correlation and Regression
- Correlation (Multiple R) describes the strength of a linear relationship between two variables
  - Correlation is not causation
- Linear means "straight line"
- Regression tells us how to draw the straight line described by the correlation
Scatter Plots
- The scatterplot consists of
  - Ordered pairs (x, y) of data points in a rectangular coordinate system
  - Where x comes from a different variable than y
- Helps to understand the association between two variables
Scatter Plots
EXAMPLE
The following are the maximum concentrations of CO and O3 for 2000 in 15 cities in the U.S. Draw a scatterplot of CO vs. O3.

City          | CO (ppm) | O3 (ppm)
Atlanta       | 4.1      | 0.023
Boston        | 3.4      | 0.029
Chicago       | 8.3      | 0.032
Dallas        | 3.5      | 0.014
Denver        | 2.1      | 0.016
Detroit       | 4.4      | 0.024
Houston       | 4.2      | 0.021
Kansas City   | 1.8      | 0.017
Los Angeles   | 9.5      | 0.044
New York      | 9.3      | 0.038
Philadelphia  | 5.1      | 0.028
Pittsburgh    | 2.4      | 0.025
San Francisco | 1.7      | 0.02
Los Angeles   | 2.4      | 0.021
Washington    | 4.9      | 0.023
Scatter Plots
EXAMPLE (continued)
[Scatterplot of O3 (ppm) vs. CO (ppm) for the 15-city data above]
Scatter Plots
If the scatterplot shows a roughly elliptical cloud (instead of a curved, fan-shaped, or clustered cloud) with data points spread throughout the ellipse, then a conclusion of linear association is reasonable.
[Scatterplot of O3 vs. CO, as above]
Scatter Plots
If the ellipse
- Tilts upward to the right, the association is positive
- Tilts downward to the right, the association is negative
- Is thin and elongated, the association is strong
- Is closer to a circle or is horizontal, the association is weak
[Scatterplot of O3 vs. CO, as above]
Scatter Plots
Examples of different scatterplots:
- Linear association
- Non-linear association
[Scatterplot of O3 vs. CO, as above]
Scatter Plots Matrix
- A scatterplot matrix is a set of scatterplots
  - Coming from data collected under similar experimental conditions
- Constructing a scatterplot matrix is the first step in modeling associations between variables

EXAMPLE
The following are the maximum concentrations of air pollutants for 2000 in 15 cities in the U.S. Draw a scatterplot matrix.

City          | CO (ppm) | O3 (ppm) | PM10 (µg/m^3) | SO2 (ppm)
Atlanta       | 4.1      | 0.023    | 0.11          | 0.019
Boston        | 3.4      | 0.029    | 0.08          | 0.03
Chicago       | 8.3      | 0.032    | 0.08          | 0.075
Dallas        | 3.5      | 0.014    | 0.1           | 0.047
Denver        | 2.1      | 0.016    | 0.08          | 0.009
Detroit       | 4.4      | 0.024    | 0.08          | 0.043
Houston       | 4.2      | 0.021    | 0.12          | 0.031
Kansas City   | 1.8      | 0.017    | 0.09          | 0.039
Los Angeles   | 9.5      | 0.044    | 0.11          | 0.01
New York      | 9.3      | 0.038    | 0.09          | 0.046
Philadelphia  | 5.1      | 0.028    | 0.1           | 0.027
Pittsburgh    | 2.4      | 0.025    | 0.09          | 0.086
San Francisco | 1.7      | 0.02     | 0.05          | 0.007
Los Angeles   | 2.4      | 0.021    | 0.07          | 0.011
Washington    | 4.9      | 0.023    | 0.09          | 0.03
Scatter Plots
Example
[Scatterplot matrix of the maximum air-pollutant concentrations for 2000 in 15 U.S. cities]
What types of associations can you observe from the matrix?
Correlation Coefficient
- The Pearson correlation coefficient measures the strength of the linear association between two variables
- It is computed by the formula:

  $r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}}$

- Where

  $SS_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$, $\quad SS_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2$, $\quad SS_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

- Note that −1 ≤ r ≤ 1
  - A positive value of r implies that y increases as x increases
  - A negative value of r implies that y decreases as x increases
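Applied to the age/cholesterol example (a sketch assuming numpy; np.corrcoef is included only as a cross-check):

```python
import numpy as np

x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

SSxy = np.sum((x - x.mean()) * (y - y.mean()))
SSxx = np.sum((x - x.mean())**2)
SSyy = np.sum((y - y.mean())**2)
r = SSxy / np.sqrt(SSxx * SSyy)       # ~0.7218, the "Multiple R" in the output
print(r, np.corrcoef(x, y)[0, 1])     # both expressions should agree
```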
Regression
[Excel SUMMARY OUTPUT as shown earlier]

Multiple R (0.721832) is Pearson's correlation coefficient
Residuals and Residual Plots
- Most likely, a linear regression will not fit the data perfectly
- The residual (ε) for each data point is the distance from the data point to the regression line; it is the error in prediction
- To find the residual of a data point, take the observed y value and subtract the predicted $\hat{y}$ value (the y value from the linear regression): $\varepsilon = y - \hat{y}$
Simple Linear Regression
- Residual of a data point: $\varepsilon = y - \hat{y}$
  (y is the measured observation; $\hat{y}$ is the prediction)
- The sum of the residuals is equal to zero; that is, $\sum \varepsilon = 0$
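A short sketch (assuming numpy) that computes the residuals for the example's fitted line and confirms they sum to roughly zero (not exactly zero here, because the coefficients are rounded):

```python
import numpy as np

x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

y_hat = 151.499 + 1.399 * x    # fitted line from the example (rounded coefficients)
resid = y - y_hat              # residual = observed - predicted
print(resid.round(2))
print(resid.sum())             # ~0: least-squares residuals sum to zero
```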
Residuals and Residual Plots
- Residuals can be plotted on a scatterplot called a residual plot
  - The horizontal x-axis is the same x-value as the original graph
  - The vertical y-axis is now the residual

[Residual plot: Residuals vs. Age (years)]

Residuals and Residual Plots
- Residuals are normally distributed
  - Since y is normal for any value of x
Residuals and Residual Plots
- When a set of data has a linear pattern, its residual plot will have a random pattern
- If a set of data does not have a linear pattern, its residual plot will not be random, but rather will have a shape
How to Use Residual Plots?
- If the residual plot is RANDOM: use linear regression
- If the residual plot is NON-random: DO NOT use linear regression; consider some other type of regression
Lack of Fit
- A model exhibits lack of fit when it fails to adequately describe the functional relationship between the experimental factors and the response variable
- Lack of fit can occur if important terms, such as interactions or quadratic terms, are not included in the model
  - It can also occur if several unusually large residuals result from fitting the model
  - A formal lack-of-fit test is possible when the data contain replicates, i.e., multiple observations with identical x-values
Lack of Fit
- To determine whether the model accurately fits the data, compare the p-value of the lack-of-fit test to α
  - P-value < α: the model does not accurately fit the data
    To get a better model, you may need to add terms or transform your data
  - P-value > α: there is no evidence that the model does not fit the data
Example 1
A family doctor wishes to examine the relationship between a patient's age and total cholesterol (the same 14-patient data as the earlier example). Find the least-squares regression equation and the coefficient of determination.
Example 1
[Scatterplot of Total Cholesterol (y) vs Age (x)]
n = 14
c = 10 distinct x values (25, 28, 32, 38, 42, 48, 51, 58, 62, 65)
df lack of fit = c − 2 = 8
df pure error = n − c = 14 − 10 = 4
Lack-of-fit test statistic = 2.26
Example 2
For a particular variety of plant, researchers wanted to develop a formula for predicting the quantity of seeds (in grams) as a function of the density of plants. They conducted a study with four levels (5, 10, 15, 20) of the factor x, the number of plants per pot. Four replications were used for each level of x. The data are given below.

x | 5    | 5    | 5    | 5    | 10   | 10   | 10   | 10
y | 12.3 | 11.2 | 12.2 | 10.9 | 15.3 | 16.4 | 14.6 | 15.6

x | 15   | 15   | 15   | 15   | 20   | 20   | 20   | 20
y | 17.8 | 18.4 | 18.8 | 18.1 | 19.3 | 19.5 | 18.7 | 19.9
Example 2
n = 16
c = 4 distinct x values (5, 10, 15, 20)
df lack of fit = c − 2 = 2
df pure error = n − c = 16 − 4 = 12
P-value = 0.002

[Scatterplot of y vs x]
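A sketch of the lack-of-fit F test behind these numbers, assuming numpy and scipy are available: SSE is split into pure error (variation within replicate groups) and lack of fit, and the result should reproduce the p-value of about 0.002 reported above.

```python
import numpy as np
from scipy import stats

# Plant-density data from Example 2: four replicates at each of c = 4 levels
levels = [5, 10, 15, 20]
x = np.repeat(levels, 4)
y = np.array([12.3, 11.2, 12.2, 10.9, 15.3, 16.4, 14.6, 15.6,
              17.8, 18.4, 18.8, 18.1, 19.3, 19.5, 18.7, 19.9])
n, c = len(y), len(levels)

# Least-squares line and its SSE
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
SSE = np.sum((y - (b0 + b1 * x))**2)

# Pure-error SS: variation of replicates around their group means
SS_pe = sum(np.sum((y[x == lvl] - y[x == lvl].mean())**2) for lvl in levels)
SS_lof = SSE - SS_pe                         # lack-of-fit SS

F = (SS_lof / (c - 2)) / (SS_pe / (n - c))   # ~10.3 on (2, 12) df
p = stats.f.sf(F, c - 2, n - c)              # ~0.002 -> evidence of lack of fit
print(F, p)
```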
Example
For the plant-density data from Example 2, arrange the observations as n = 16 (x, y) pairs, where x = independent variable (plants per pot) and y = dependent variable (seed quantity in grams).
Example
H0: β1 = 0 (slope = 0: no linear relationship)
H1: β1 ≠ 0 (slope ≠ 0: linear relationship)
What conclusions do you draw?
se = 0.930265
Ch 11: In Class Assignment 1

x | y
1 | 1
2 | 2
3 | 3
4 | 5
5 | 4
Sum: ___ | ___

1) State the null and alternative hypothesis
2) What is the equation of the line?
3) Compute R²
Ch 11: In Class Assignment 2
The accompanying data represent the chemistry grades for a random sample of 12 freshmen at a certain college along with their scores on an intelligence test administered while they were still seniors in high school.

Student            | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12
Test Score, x      | 70 | 50 | 55 | 60 | 50 | 60 | 50 | 55 | 70 | 70 | 50 | 60
Chemistry Grade, y | 90 | 74 | 76 | 85 | 80 | 77 | 79 | 83 | 96 | 91 | 76 | 79
Ch 11: In Class Assignment 2
1) State the null and alternative hypothesis
2) State the equation of the line
3) Complete the table on the following slide
4) What conclusions do you draw? (α = 0.05)