Simple Linear Regression (Chapter 11)
Dr. Supreeta Amin, ISE 135, Fall 2016
San Jose State University, Department of Industrial and Systems Engineering

Introduction
Is there a relationship between the number of hours you study and your exam score?
What is Regression?
Regression:
- Attempts to explain the variation in a dependent variable using the variation in independent variables
- Is thus an explanation of causation
- Is a technique concerned with predicting some variables by knowing others

Regression Models:
- Describe the relationship between one dependent variable and one or more explanatory variables
- Use an equation to express the relationship:
  - a numerical dependent (response) variable
  - one or more numerical or categorical independent (explanatory) variables
- Are used mainly for prediction and estimation
Regression
Simple regression:
- Considers the relation between a single explanatory variable and a response variable (X → Y)

Multiple regression:
- Involves more than one regressor variable (X1, X2, ..., Xn → Y)

In regression:
- One variable is considered the independent (predictor) variable X, and the other the dependent (outcome) variable Y
- Regression is the process of predicting variable Y using variable X:
  - it uses a variable (x) to predict some outcome variable (y)
  - it tells you how values of y change as a function of changes in the values of x
- If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction
Assumptions (or the Fine Print)
Linear regression assumes that:
- the relationship between X and Y is linear
- Y is distributed normally at each value of X
- the variance of Y is the same at every value of X (homogeneity of variance)
- the observations are independent

Regression
- Calculates the "best-fit" line for a given set of data
- The regression line makes the sum of the squares of the residuals smaller than for any other line
- Regression minimizes residuals

[Figure: scatterplot of body weight, Wt (kg), with a fitted regression line]
Example
A family doctor wishes to examine the relationship between a patient's age and total cholesterol. He randomly selects 14 of his female patients and obtains the data presented in the table below. Find the least-squares regression equation and the coefficient of determination.

Age  Total Cholesterol    Age  Total Cholesterol
25   180                  42   185
25   195                  48   204
28   186                  51   221
32   180                  51   243
32   210                  58   208
32   197                  62   228
38   239                  65   269

Regression Analysis (Excel SUMMARY OUTPUT)

Regression Statistics
Multiple R          0.721832
R Square            0.521041
Adjusted R Square   0.481128
Standard Error      19.2552
Observations        14

ANOVA
            df   SS         MS         F          Significance F
Regression  1    4840.062   4840.062   13.05434   0.00355946
Residual    12   4449.152   370.7627
Total       13   9289.214

              Coefficients   Standard Error   t Stat      P-value    Lower 95%     Upper 95%
Intercept     151.4989457    17.08383409      8.867971    1.29E-06   114.2764688   188.7214226
X Variable 1  1.399006383    0.387206113      3.6130793   0.003559   0.555356737   2.24265603

P-value = 0.0036 < 0.05 → the model is reasonable
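The Excel output above can be cross-checked in Python; the minimal sketch below uses scipy.stats.linregress (the code and variable names are illustrative, not part of the slides' Excel/Minitab workflow):

```python
# Sketch: cross-check of the Excel regression output, assuming SciPy is available.
from scipy import stats

age  = [25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65]
chol = [180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269]

res = stats.linregress(age, chol)
print(f"intercept = {res.intercept:.4f}")  # ~151.4989
print(f"slope     = {res.slope:.4f}")      # ~1.3990
print(f"R^2       = {res.rvalue**2:.6f}")  # ~0.521041
print(f"p-value   = {res.pvalue:.6f}")     # ~0.003559 (slope t-test)
```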
Regression Plots
[Figure: line fit plot of Cholesterol vs. Age, showing Y, Predicted Y, and the fitted line]
[Figure: residual plot of residuals vs. Age (years)]

Regression Equation
- The regression equation describes the regression line mathematically
- Intercept ($\beta_0$): the expected mean of the dependent variable (y) when the independent variable (x) is 0
- Slope ($\beta_1$): the change in y when x increases by one unit
  - a slope of 2 means that every 1-unit change in X yields a 2-unit change in Y
Linear Equations: the Linear Regression Model
Each pair of observations satisfies the relationship

  $y = \beta_0 + \beta_1 x + \varepsilon$

where y is the dependent (response) variable, x is the independent (explanatory) variable, $\beta_1$ is the population slope, $\beta_0$ is the population y-intercept, and $\varepsilon$ is the random error.
Estimates of Slope and Intercept
- Intercept: $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
- Slope:

  $\hat\beta_1 = \dfrac{\sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)/n}{\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2/n} = \dfrac{S_{xy}}{S_{xx}}$

where $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$ and $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$.

Example
Given, for the age/cholesterol data above: $\sum x_i = 589$, $\sum y_i = 2945$, $\bar x = 42.07$, $\bar y = 210.35$, $\sum x_i^2 = 27253$, $\sum y_i^2 = 628791$, $\sum x_i y_i = 127360$, n = 14. Find the equation of the line.
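Plugging the given sums into these formulas can be done by hand or with a few lines of Python; a quick sketch (the variable names are illustrative):

```python
# Sketch: hand-check of the slope and intercept from the given sums.
n = 14
Sx, Sy = 589, 2945
sum_x2, sum_xy = 27253, 127360      # sum of x^2 and sum of x*y

Sxy = sum_xy - Sx * Sy / n          # S_xy = 127360 - 589*2945/14 ≈ 3459.64
Sxx = sum_x2 - Sx**2 / n            # S_xx = 27253 - 589^2/14    ≈ 2472.93

b1 = Sxy / Sxx                      # slope     ≈ 1.3990
b0 = Sy / n - b1 * Sx / n           # intercept ≈ 151.50
print(f"y-hat = {b0:.2f} + {b1:.4f} x")
```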
Simple Linear Regression
- The output of a simple regression is the coefficient $\hat\beta_1$ and the constant $\hat\beta_0$
- The equation is then $\hat y = \hat\beta_0 + \hat\beta_1 x + \varepsilon$, where $\varepsilon$ is the residual error
- $\hat\beta_1$ is the per-unit change in the dependent variable for each unit change in the independent variable; mathematically, $\hat\beta_1 = \dfrac{\Delta y}{\Delta x}$

Least Squares Method
- A procedure that minimizes the vertical deviations of the plotted points from a straight line
- It constructs the best-fitting straight line for the scatter-diagram points and then formulates a regression equation of the form $\hat y = \hat\beta_0 + \hat\beta_1 x$
Simple Linear Regression
- The output of a regression is a function that predicts the dependent variable based upon values of the independent variables
- The function makes a prediction for each observed data point
- The observation is denoted by y and the prediction by $\hat y$
Simple Linear Regression
For each observation, the variation can be described as

  $y = \hat y + \varepsilon$
  Actual = Explained + Error

[Figure: observed point y (measured), fitted-line prediction $\hat y$, and the gap between them]

Sum of Squares of Error (SSE)
- A least-squares regression selects the line with the lowest total sum of squared prediction errors
- This value is called the Sum of Squares of Error, or SSE
Sum of Squares Regression (SSR)
- The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean $\bar y$

Calculating the Sums of Squares
The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically:
- SSR = $\sum_i (\hat y_i - \bar y)^2$ (measure of explained variation)
- SSE = $\sum_i (y_i - \hat y_i)^2$ (measure of unexplained variation)
- SST = SSR + SSE = $\sum_i (y_i - \bar y)^2$ (measure of total variation in y)
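The decomposition SST = SSR + SSE can be verified numerically on the cholesterol data; a short numpy sketch (the fitting code is illustrative, not from the slides):

```python
# Sketch: verifying SST = SSR + SSE on the age/cholesterol data.
import numpy as np

x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSR = ((y_hat - y.mean())**2).sum()   # explained   ≈ 4840.06
SSE = ((y - y_hat)**2).sum()          # unexplained ≈ 4449.15
SST = ((y - y.mean())**2).sum()       # total       ≈ 9289.21
print(SSR, SSE, SST, np.isclose(SST, SSR + SSE))
```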
Example (continued)
Given $\bar y = 210.35$ and n = 14 for the age/cholesterol data, the ANOVA portion of the regression output gives:
- SSR = 4840.062 (df = 1)
- SSE = 4449.152 (df = 12)
- SST = 9289.214 (df = 13)
Standard Error of Regression
- A measure of the model's variability; it can be used much like a standard deviation, allowing for prediction intervals
- $\hat y$ ± 2 standard errors gives an approximate 95% interval, and ± 3 standard errors an approximate 99% interval
- The standard error is calculated by taking the square root of the average prediction error:

  $\text{Standard Error} = \sqrt{\dfrac{SSE}{n-k}}$

  where n is the number of observations in the sample and k is the total number of estimated coefficients in the model

Standard Error
- Shows the variability of the data around the regression line
- Also called the Root Mean Square Error (RMSE) or residual standard error; denoted by S in Minitab:

  $s = \sqrt{\dfrac{\sum_i (y_i - \hat y_i)^2}{n-2}} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum \text{residuals}^2}{n-2}}$
Standard Error
- The lower the value of S, the better the model predicts the response
- If you compare different models, the model with the lowest S value indicates the best fit
- Estimating $\sigma^2$, the variance of the response variable y for any given value of x:

  $\hat\sigma^2 = \dfrac{SSE}{n-2}$

  is an unbiased estimator of $\sigma^2$; its square root is called the standard error of the estimate ($s_e$)

From the regression output above: Standard Error = $\sqrt{370.76}$ = 19.2552
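That arithmetic is easy to confirm; a quick sketch using the SSE from the ANOVA table:

```python
# Sketch: hand-check of s_e = sqrt(SSE / (n - 2)) from the ANOVA table.
import math

SSE, n = 4449.151844, 14
mse = SSE / (n - 2)     # 370.76265 (the Residual MS in the ANOVA table)
s_e = math.sqrt(mse)    # 19.2552   (the "Standard Error" in the output)
print(f"MSE = {mse:.5f}, s_e = {s_e:.4f}")
```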
Slope and Intercept Properties
Slope:
- $E(\hat\beta_1) = \beta_1$
- $V(\hat\beta_1) = \dfrac{\sigma^2}{S_{xx}}$
- $se(\hat\beta_1) = \sqrt{\dfrac{\hat\sigma^2}{S_{xx}}}$

Intercept:
- $E(\hat\beta_0) = \beta_0$
- $V(\hat\beta_0) = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]$
- $se(\hat\beta_0) = \sqrt{\hat\sigma^2\left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]}$

From the regression output above: $\hat\beta_0 = 151.499$ with $se(\hat\beta_0) = 17.083$
From the regression output above: $\hat\beta_1 = 1.399$ with $se(\hat\beta_1) = 0.387$

Inference on Slope and Intercept
- Test the claim that a linear relationship exists between the explanatory and the response variables
- Does the sample provide sufficient evidence to support the claim that a linear relationship exists between the two variables?
- If there is no linear relation between the response and explanatory variables, the slope of the true regression line will be zero
  - a slope of zero → x does not change our "guess" as to the value of y
Hypothesis Testing
Two-tailed:    H0: β1 = β1,0  vs.  H1: β1 ≠ β1,0
Left-tailed:   H0: β1 = β1,0  vs.  H1: β1 < β1,0
Right-tailed:  H0: β1 = β1,0  vs.  H1: β1 > β1,0

Test statistic for the slope:

  $T_0 = \dfrac{\hat\beta_1 - \beta_{1,0}}{s_e/\sqrt{S_{xx}}} = \dfrac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)}$

Two-tailed test: reject the null hypothesis if $|t_0| > t_{\alpha/2,\,n-2}$.
- Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y
- Fail to reject H0: β1 = 0:
  - x has little value in explaining the variation in y, or
  - the true relationship between x and y is not linear
- Reject H0: β1 = 0:
  - there is a straight-line relationship between x and y
  - although x has a linear effect on y, the relationship might be better estimated with a polynomial model
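The slope test can be carried out with scipy's t distribution; a sketch using the coefficient and standard error from the output above (the code is illustrative):

```python
# Sketch: two-tailed t-test of H0: beta1 = 0 using values from the output.
from scipy import stats

b1, se_b1, n = 1.399006383, 0.387206113, 14
t0 = b1 / se_b1                               # ≈ 3.613
p = 2 * stats.t.sf(abs(t0), df=n - 2)         # two-tailed p ≈ 0.00356
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # t_{0.025,12} ≈ 2.179
print(f"t0 = {t0:.3f}, p = {p:.5f}, reject H0: {abs(t0) > t_crit}")
```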
From the regression output above: t-stat = 1.399 / 0.3872 = 3.613

Confidence Interval for the Slope and Intercept
- Confidence interval = point estimate ± margin of error
- A (1 − α)·100% confidence interval for the slope of the true regression line, β1, is given by

  $\hat\beta_1 \pm t_{\alpha/2,\,n-2} \cdot \dfrac{s_e}{\sqrt{S_{xx}}}$

- A (1 − α)·100% confidence interval for the intercept of the true regression line, β0, is given by

  $\hat\beta_0 \pm t_{\alpha/2,\,n-2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}}$
From the regression output above (with $t_{0.025,12}$ = 2.179):
- UCL (intercept) = 151.49 + 17.08 × 2.179 = 188.72
- LCL (intercept) = 151.49 − 17.08 × 2.179 = 114.27

Correlation Coefficient (R) and Coefficient of Determination (R²)
- R, the correlation coefficient: the magnitude of the relationship between the dependent variable and the best linear combination of the predictor variables
- R², the coefficient of determination: the proportion of variation in Y accounted for by the set of independent variables (X's)
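The same limits can be recomputed with the exact critical value; a brief sketch (values taken from the output above):

```python
# Sketch: reproducing the 95% confidence limits from the regression output.
from scipy import stats

n = 14
t_crit = stats.t.ppf(0.975, df=n - 2)             # t_{0.025,12} ≈ 2.179

b1, se_b1 = 1.399006383, 0.387206113
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # ≈ (0.5554, 2.2427)

b0, se_b0 = 151.4989457, 17.08383409
print(b0 - t_crit * se_b0, b0 + t_crit * se_b0)   # ≈ (114.28, 188.72)
```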
Coefficient of Determination
- The proportion of the total variation (SST) that is explained by the regression (SSR), often referred to as R²:

  $R^2 = \dfrac{SSR}{SST} = \dfrac{SSR}{SSR + SSE} = 1 - \dfrac{SSE}{SST}$

- The value of R² can range between 0 and 1; the higher its value, the more accurate the regression model. It is often reported as a percentage.

Adjusted R²
- Adjusted R² ($R^2_{adjusted}$) is a modified version of R²:

  $R^2_{adjusted} = 1 - \dfrac{n-1}{n-k-1}\,(1 - R^2)$

- It adjusts for the sample size n and the number of explanatory variables k, and compares the explanatory power of regression models that contain different numbers of predictors
- The adjusted R²:
  - increases only if a new term improves the model more than would be expected by chance
  - decreases when a predictor improves the model by less than expected by chance
From the regression output above:

  R² = 4840.062441 / 9289.214286 = 0.52104

That is, 52.1% of the variation in the response variable is explained by the least-squares regression model.

  Adjusted R² = $1 - \dfrac{n-1}{n-k-1}(1 - R^2)$, with n = 14 and k = 1:
  Adjusted R² = $1 - \dfrac{14-1}{14-1-1}(1 - 0.521041)$ = 0.481128
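Both numbers fall straight out of the ANOVA sums of squares; a quick check:

```python
# Sketch: hand-check of R^2 and adjusted R^2 from the ANOVA table.
SSR, SST = 4840.062441, 9289.214286
n, k = 14, 1

r2 = SSR / SST                                  # 0.521041
r2_adj = 1 - (n - 1) / (n - k - 1) * (1 - r2)   # 0.481128
print(f"R^2 = {r2:.6f}, adjusted R^2 = {r2_adj:.6f}")
```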
Correlation and Regression
- Correlation (Multiple R) describes the strength of a linear relationship between two variables
  - correlation is not causation
- Linear means "straight line"
- Regression tells us how to draw the straight line described by the correlation

Scatter Plots
- A scatterplot consists of ordered pairs (x, y) of data points in a rectangular coordinate system, where the x values come from a different variable than the y values
- It helps us understand the association between two variables
Scatter Plots: Example
The following are the maximum concentrations of CO and O3 for 2000 in 15 U.S. cities. Draw a scatterplot of CO vs. O3.

City           CO (ppm)  O3 (ppm)
Atlanta        4.1       0.023
Boston         3.4       0.029
Chicago        8.3       0.032
Dallas         3.5       0.014
Denver         2.1       0.016
Detroit        4.4       0.024
Houston        4.2       0.021
Kansas City    1.8       0.017
Los Angeles    9.5       0.044
New York       9.3       0.038
Philadelphia   5.1       0.028
Pittsburg      2.4       0.025
San Francisco  1.7       0.02
Los Angeles    2.4       0.021
Washington     4.9       0.023

[Figure: scatterplot of O3 (ppm) vs. CO (ppm)]
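One way to draw the plot is with matplotlib; a minimal sketch (the plotting choices are illustrative):

```python
# Sketch: O3 vs. CO scatterplot for the 15-city data, using matplotlib.
import matplotlib.pyplot as plt

co = [4.1, 3.4, 8.3, 3.5, 2.1, 4.4, 4.2, 1.8, 9.5, 9.3, 5.1, 2.4, 1.7, 2.4, 4.9]
o3 = [0.023, 0.029, 0.032, 0.014, 0.016, 0.024, 0.021, 0.017, 0.044,
      0.038, 0.028, 0.025, 0.02, 0.021, 0.023]

plt.scatter(co, o3)
plt.xlabel("CO (ppm)")
plt.ylabel("O3 (ppm)")
plt.title("O3 vs. CO, 15 U.S. cities (2000)")
plt.show()
```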
Scatter Plots
If the scatterplot shows a roughly elliptical cloud (instead of a curved, fan-shaped, or clustered cloud) with data points spread throughout the ellipse, then a conclusion of linear association is reasonable.

If the ellipse:
- tilts upward to the right, the association is positive
- tilts downward to the right, the association is negative
- is thin and elongated, the association is strong
- is closer to a circle or is horizontal, the association is weak

[Figure: O3 vs. CO scatterplot with an elliptical cloud around the points]
Scatter Plots
Examples of different scatterplots: linear association vs. non-linear association.

Scatterplot Matrix
- A scatterplot matrix is a set of scatterplots for data collected under similar experimental conditions
- Constructing a scatterplot matrix is the first step in modeling associations between variables

Example
The following are the maximum concentrations of air pollutants for 2000 in 15 U.S. cities. Draw a scatterplot matrix.

City           CO (ppm)  O3 (ppm)  PM10 (µg/m³)  SO2 (ppm)
Atlanta        4.1       0.023     0.11          0.019
Boston         3.4       0.029     0.08          0.03
Chicago        8.3       0.032     0.08          0.075
Dallas         3.5       0.014     0.1           0.047
Denver         2.1       0.016     0.08          0.009
Detroit        4.4       0.024     0.08          0.043
Houston        4.2       0.021     0.12          0.031
Kansas City    1.8       0.017     0.09          0.039
Los Angeles    9.5       0.044     0.11          0.01
New York       9.3       0.038     0.09          0.046
Philadelphia   5.1       0.028     0.1           0.027
Pittsburg      2.4       0.025     0.09          0.086
San Francisco  1.7       0.02      0.05          0.007
Los Angeles    2.4       0.021     0.07          0.011
Washington     4.9       0.023     0.09          0.03
Scatter Plots: Example
Scatterplot matrix for the maximum concentrations of air pollutants for 2000 in 15 U.S. cities. What types of associations can you observe from the matrix?

Correlation Coefficient
- Pearson's correlation coefficient measures the strength of the linear association between two variables
- It is computed by the formula

  $R = \dfrac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$

  where $SS_{XY} = \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$, $SS_{XX} = \sum_{i=1}^n (X_i - \bar X)^2$, and $SS_{YY} = \sum_{i=1}^n (Y_i - \bar Y)^2$

- Note that $-1 \le r \le 1$
  - a positive value of r implies that y increases as x increases
  - a negative value of r implies that y decreases as x increases
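The formula translates directly into code; a sketch applied to the age/cholesterol data, where the result should match the "Multiple R" in the regression output (the helper function is illustrative):

```python
# Sketch: Pearson's r via the SS_XY / sqrt(SS_XX * SS_YY) formula above.
import numpy as np

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    ss_xy = ((x - x.mean()) * (y - y.mean())).sum()
    ss_xx = ((x - x.mean())**2).sum()
    ss_yy = ((y - y.mean())**2).sum()
    return ss_xy / np.sqrt(ss_xx * ss_yy)

age  = [25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65]
chol = [180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269]
print(pearson_r(age, chol))   # ≈ 0.721832, the "Multiple R" in the output
```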
From the regression output above: Multiple R = 0.721832 is Pearson's correlation coefficient.

Residuals and Residual Plots
- Most likely a linear regression will not fit the data perfectly
- The residual (ε) for each data point is the vertical distance from the data point to the regression line; it is the error in prediction
- To find the residual of a data point, take the observed y value and subtract the predicted value (the y value from the linear regression): $\varepsilon = y - \hat y$
Simple Linear Regression
- The function makes a prediction for each observed data point; the observation is denoted by y and the prediction by $\hat y$
- The residual of a data point is $\varepsilon = y - \hat y$
- The sum of the residuals is equal to zero: $\sum \varepsilon = 0$
Residuals and Residual Plots
- Residuals can be plotted on a scatterplot called a residual plot
  - the horizontal x-axis uses the same x values as the original graph
  - the vertical y-axis is now the residual
- Residuals are normally distributed, since y is normal for any value of x

[Figure: residual plot of residuals vs. Age (years)]
Residuals and Residual Plots
- When a set of data has a linear pattern, its residual plot will have a random pattern
- If a set of data does not have a linear pattern, its residual plot will not be random, but rather will have a shape

How to Use Residual Plots
- If the residual plot is random: use linear regression
- If the residual plot is non-random: do not use linear regression; consider some other type of regression
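A residual plot like the one on the earlier slide can be produced as follows; a sketch that refits the line from the raw data (the code is illustrative):

```python
# Sketch: residuals vs. x for the age/cholesterol data, to check for randomness.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65])
y = np.array([180, 195, 186, 180, 210, 197, 239, 185, 204, 221, 243, 208, 228, 269])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)     # e = y - y_hat; these sum to ~0

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Age (years)")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()
```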
Lack of Fit
- A model exhibits lack of fit when it fails to adequately describe the functional relationship between the experimental factors and the response variable
- Lack of fit can occur if important terms, such as interactions or quadratic terms, are omitted from the model; it can also occur if several unusually large residuals result from fitting the model
- Lack of fit can be tested when the data contain replicates, i.e., multiple observations with identical x values

Testing Lack of Fit
- To determine whether the model accurately fits the data, compare the p-value of the lack-of-fit test to α:
  - p-value < α: the model does not accurately fit the data; to get a better model, you may need to add terms or transform your data
  - p-value > α: there is no evidence that the model does not fit the data
Example 1
The age/total-cholesterol data from earlier (n = 14): find the least-squares regression equation and the coefficient of determination, and check for lack of fit.

[Figure: scatterplot of Total Cholesterol (y) vs. Age (x)]

n = 14
c = 10 distinct x values (25, 28, 32, 38, 42, 48, 51, 58, 62, 65)
df lack of fit = c − 2 = 10 − 2 = 8
df pure error = n − c = 14 − 10 = 4
F = 2.26, so the p-value exceeds α = 0.05 → no evidence of lack of fit
Example 2
For a particular variety of plant, researchers wanted to develop a formula for predicting the quantity of seeds (in grams) as a function of the density of plants. They conducted a study with four levels (5, 10, 15, 20) of the factor x, the number of plants per pot. Four replications were used for each level of x. The data are given below.

x   5     5     5     5     10    10    10    10
y   12.3  11.2  12.2  10.9  15.3  16.4  14.6  15.6

x   15    15    15    15    20    20    20    20
y   17.8  18.4  18.8  18.1  19.3  19.5  18.7  19.9

n = 16
c = 4 distinct x values (5, 10, 15, 20)
df lack of fit = c − 2 = 2
df pure error = n − c = 16 − 4 = 12
P-value = 0.002

[Figure: scatterplot of y vs. x]
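The lack-of-fit F statistic can be computed by splitting SSE into pure error and lack of fit; the sketch below is illustrative grouping code built on the definitions above (df lack of fit = c − 2, df pure error = n − c; the slides report only the results) and should reproduce the p-value of about 0.002:

```python
# Sketch: pure-error lack-of-fit test for the Example 2 data.
import numpy as np
from scipy import stats

x = np.repeat([5, 10, 15, 20], 4)
y = np.array([12.3, 11.2, 12.2, 10.9, 15.3, 16.4, 14.6, 15.6,
              17.8, 18.4, 18.8, 18.1, 19.3, 19.5, 18.7, 19.9])

# Fit the least-squares line and get the total residual sum of squares.
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
sse = ((y - (b0 + b1 * x))**2).sum()

# Pure error: variation of replicates around their own group means.
levels = np.unique(x)
ss_pe = sum(((y[x == v] - y[x == v].mean())**2).sum() for v in levels)

n, c = len(y), len(levels)
ss_lof = sse - ss_pe
F = (ss_lof / (c - 2)) / (ss_pe / (n - c))   # ≈ 10.3
p = stats.f.sf(F, c - 2, n - c)              # ≈ 0.002 -> evidence of lack of fit
print(f"F = {F:.2f}, p = {p:.4f}")
```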
Example 2 (continued)
x = independent variable, y = dependent variable, n = 16. The same data arranged in long format:

x    y
5    12.3
5    11.2
5    12.2
5    10.9
10   15.3
10   16.4
10   14.6
10   15.6
15   17.8
15   18.4
15   18.8
15   18.1
20   19.3
20   19.5
20   18.7
20   19.9
Example 2 (continued)
H0: β1 = 0 (slope = 0, no linear relationship)
H1: β1 ≠ 0 (slope ≠ 0, linear relationship)
What conclusions do you draw? ($s_e$ = 0.930265)

Ch 11: In-Class Assignment 1

x    y
1    1
2    2
3    3
4    5
5    4
Sum:

1) State the null and alternative hypotheses
2) What is the equation of the line?
3) Compute R²
Ch 11: In-Class Assignment 2
The accompanying data represent the chemistry grades for a random sample of 12 freshmen at a certain college, along with their scores on an intelligence test administered while they were still seniors in high school.

Student             1   2   3   4   5   6   7   8   9   10  11  12
Test Score, x       70  50  55  60  50  60  50  55  70  70  50  60
Chemistry Grade, y  90  74  76  85  80  77  79  83  96  91  76  79

1) State the null and alternative hypotheses
2) State the equation of the line
3) Complete the table on the following slide
4) What conclusions do you draw? (α = 0.05)