Tutorial Wk 8 Simple Linear Reg

.pdf

School

Royal Melbourne Institute of Technology *


Course

1607

Subject

Economics

Date

May 29, 2024

Type

pdf

Pages

12


File: Tutorial Wk 8 Simple Linear Reg

ECON1607 Tutorial: Simple Linear Regression

Questions for in-class tutorial: The questions covered in the tutorial relate to the material covered in Chapter 12 of Berenson et al.

Problems from Berenson et al.: 12.3 (p. 420; different numbers in 2010) and 12.11 (p. 427)

[2 problems involving interpreting Excel output]

1. It seems logical that the more bank accounts there are, the more Automated Teller Machine (ATM) withdrawals there will be. The Reserve Bank of Australia (RBA) has performed a simple regression analysis to predict the number of ATM withdrawals from the number of bank accounts. The Excel output is given below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.504
R Square             0.254
Adjusted R Square    0.179
Standard Error       2889.685
Observations         12

ANOVA
              df    SS             MS             F              Significance F
Regression     1    28403158.08    28403158.08    3.401461041    0.094926942
Residual      10    83502817.59    8350281.759
Total         11    111905975.7

                                  Coefficients    Standard Error    t Stat    P-value    Lower 95%      Upper 95%
Intercept (b0)                    -47109.624      63582.514         -0.741    0.476      -188780.292    94561.045
Number of accounts ('000) (b1)    4.357           2.362             1.844     0.095      -0.907         9.620

a. Write down the estimated regression equation.
b. Interpret the slope and intercept coefficients.
c. What is the value of r²? Interpret the result. What is the value of adjusted r²? Interpret the result.
d. Is "Number of accounts" a significant predictor (use the 0.05 significance level)?
e. Predict the number of ATM withdrawals if the number of accounts is 27,700,000.
[Problem involving interpreting Excel output]

What are we trying to predict? y = ATM withdrawals
What are we using to predict y? x = number of accounts

1a) Write down the estimated regression equation. (p. 414, Slide 10)

General form: $\hat{Y}_i = b_0 + b_1 X_i$

$\hat{y} = -47109.624 + 4.357x$, where $x$ is the number of accounts ('000).

1b) Interpret the slope and intercept coefficients.

- The intercept term is not meaningful here. When the number of accounts is zero, the predicted number of ATM withdrawals is -47,109.624. Obviously, this is not possible. The intercept captures things other than x that affect y. Also, x = 0 is not in the range of observed values, so we do not focus on the intercept. (Slide 18: interpolate, do not extrapolate.)
- In statistics we are more interested in the slope than the intercept. For the variable number of accounts, the slope means that when the number of accounts increases by 1 ('000), that is, by 1,000 accounts, the predicted number of ATM withdrawals increases by 4.357.

1c) What is the value of r²? Interpret the result.

On the Excel output, R Square = r², so r² = 0.254. It means that 25.4% of the variation in the number of ATM withdrawals can be explained by variability in the number of bank accounts. In Finance, 25% is considered to be a good r².

r² is the coefficient of determination (p. 424, Slide 21). It is the proportion of the total variation in the dependent variable y that is explained by variation in the independent variable x. In other words, how well can x explain y? It lies between zero and one; the closer to one, the better the model is at prediction. This is related to the correlation coefficient concept.

What is the value of adjusted r²? Interpret the result.

Adjusted r² = 0.179. That is, 17.9% of the variation in the number of ATM withdrawals can be explained by variability in the number of bank accounts, taking into account the number of independent variables and the sample size.
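As a quick check on the r² and adjusted r² figures, both can be recomputed from the ANOVA table in the Question 1 output. The sketch below is only an illustration: the variable names are made up for this example, and the adjusted r² line uses the standard textbook formula 1 - (1 - r²)(n - 1)/(n - k - 1), which is assumed here rather than quoted from the tutorial.

```python
# Values copied from the ANOVA table in the Question 1 Excel output.
ss_regression = 28403158.08   # regression sum of squares (SSR)
ss_total = 111905975.7        # total sum of squares (SST)
n = 12                        # number of observations
k = 1                         # number of independent variables

# r^2 is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Standard adjusted r^2 formula (an assumption; the tutorial does not state it).
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"r^2 = {r_squared:.3f}")
print(f"adjusted r^2 = {adj_r_squared:.3f}")
```

The two printed values should agree with the R Square and Adjusted R Square rows of the Excel output to three decimal places.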
1d) Is "Number of accounts" a significant predictor (use the 0.05 significance level)?

ALTERNATIVE METHOD

The p-value can be used instead of carrying out the critical-value hypothesis test: if the p-value < α, reject H0. Here α = 0.05, so if the p-value < 0.05, reject H0. (Find and mark the rejection areas on a sketch of the distribution.)

The p-value on the Excel output is 0.095. Since 0.095 > 0.05, do not reject H0. There is insufficient evidence to conclude that there is a linear relationship between ATM withdrawals and number of accounts. So number of accounts is not a significant predictor at the 0.05 level of significance.

HASCCC = Hypothesis, Alpha, Statistical test, Critical values, Calculate, Conclusion

The hypothesis test checks whether the population slope β1 differs from 0. If the slope equals 0, there is no linear relationship between x and y. If we reject the null hypothesis, we conclude there is evidence of a linear relationship. (p. 437)

H0: β1 = 0
H1: β1 ≠ 0

Level of significance: α = 0.05, so α/2 = 0.025.

We always use the t test when we test for the slope, with n - 2 degrees of freedom.

Critical values: t crit (10, 0.025) = ±2.2281. Use n - 2 = 12 - 2 = 10 df and the 0.025 column.

Decision rule: reject H0 if t calc > 2.2281 or t calc < -2.2281.

Testing a hypothesis for a population slope using the t test (Slide 45, p. 438):

$t_{calc} = \frac{b_1 - \beta_1}{S_{b_1}}$, where $b_1$ is the sample slope and $S_{b_1}$ is its standard error.

t calc = (4.357 - 0) / 2.362 = 1.844. This is given in the t Stat column; you can find all of this on the Excel output.

Since 1.844 < 2.2281, do not reject H0. There is insufficient evidence to conclude that there is a linear relationship between ATM withdrawals and number of accounts. So number of accounts is not a significant predictor at the 0.05 level of significance.

[Diagram: t distribution with rejection regions below -2.2281 and above 2.2281; t calc = 1.844 falls in the do-not-reject region.]

1d) What is the standard error of the estimate, $S_{YX}$? What does this value tell us?

The standard error of the estimate is $S_{YX}$ = 2889.685. It measures variation about the prediction line and is measured in the same units as the dependent (outcome) variable y, which is the number of ATM withdrawals. (p. 427, Slide 25) By itself $S_{YX}$ is meaningless. It must be compared to the Y values.
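The slope test and the standard error of the estimate for Question 1 can be retraced in a few lines. This is a minimal sketch, not part of the original tutorial: the variable names are illustrative, the critical value is hard-coded from the t table figure quoted in the answer rather than computed, and the standard error uses the usual relation of the square root of SSE / (n - 2), which is assumed here.

```python
import math

# Figures copied from the Question 1 Excel output.
b1 = 4.357            # sample slope
se_b1 = 2.362         # standard error of the slope
n = 12                # observations
sse = 83502817.59     # residual sum of squares from the ANOVA table

# Critical value for 10 df and alpha/2 = 0.025, as quoted in the tutorial.
t_crit = 2.2281

# Test statistic for H0: beta1 = 0. Using the rounded inputs gives about
# 1.845, while Excel reports 1.844 from its unrounded values.
t_calc = (b1 - 0) / se_b1

# Two-tailed decision rule: reject H0 only if |t_calc| exceeds t_crit.
reject_h0 = abs(t_calc) > t_crit

# Standard error of the estimate from SSE and n - 2 degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))

print(f"t_calc = {t_calc:.4f}, reject H0: {reject_h0}")
print(f"s_yx = {s_yx:.3f}")
```

The decision matches the worked answer: the statistic sits inside the ±2.2281 band, so H0 is not rejected.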
The standard error of the estimate is another measure of how good the model is. The smaller it is in comparison to the Y values, the better. In this case the Y values are not shown, so there is nothing to compare it to.

1e) Predict the number of ATM withdrawals if the number of accounts is 27,700,000.

$\hat{y}$ = -47,109.624 + 4.357 × 27,700 = 73,579.28

Remember x is in ('000), so 27,700,000 accounts is entered as x = 27,700.

2) Port Kembla Golf Club wishes to predict the number of golfers per weekend based upon weather conditions. The number of golfers and the temperature are measured over a sample of weekends covering all seasons.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.380016868
R Square             0.14441282
Adjusted R Square    0.078598421
Standard Error       20.33164649
Observations         15

ANOVA
              df    SS             MS             F              Significance F
Regression     1    907.0472938    907.0472938    2.194243557    0.162354813
Residual      13    5373.88604     413.3758492
Total         14    6280.933333

               Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept      64.18876657     19.45167504       3.299909465    0.005748721    22.16597751     106.2115556
Temperature    1.304603453     0.880716449       1.48129793     0.162354813    -0.598068759    3.207275664

a. State the estimated simple linear regression equation.
b. Interpret the meaning of the intercept term, b0, and the slope, b1, in this problem.
c. Construct a 95% confidence interval estimate of the population slope, β1.
d. At the 0.05 level of significance, determine whether there is a significant relationship between the number of golfers and weather conditions. Use both the critical value and p-value approaches.
e. Predict the mean number of golfers if the temperature is 25 degrees Celsius.
f. Interpret the coefficient of determination, r².

What are we trying to predict? y = number of golfers
What are we using to predict y? x = weather conditions (temperature)

a. State the estimated simple linear regression equation. (p. 414, Slide 10)

General form: $\hat{Y}_i = b_0 + b_1 X_i$

$\hat{y} = 64.189 + 1.305x$, where $x$ is the temperature.

b. Interpret the meaning of the intercept term, b0, and the slope, b1, in this problem.

The intercept term is 64.189, which means that when the X variable, temperature, equals 0, the predicted value of the y variable, number of golfers, is 64.189. The slope means that when the temperature increases by 1 degree, the predicted number of golfers increases by 1.305.

c. Construct a 95% confidence interval estimate of the population slope, β1.

Confidence interval estimate of the slope (Slide 51, p. 440):

$b_1 \pm t_{n-2} S_{b_1}$

1.3046 ± (2.1604 × 0.8807), so -0.598 ≤ β1 ≤ 3.207. This is on the Excel output (Lower 95% and Upper 95% columns).

d. At the 0.05 level of significance, determine whether there is a significant relationship between the number of golfers and weather conditions. Use both the critical value and p-value approaches.

HASCCC = Hypothesis, Alpha, Statistical test, Critical values, Calculate, Conclusion

The hypothesis test checks whether the population slope β1 differs from 0. If the slope equals 0, there is no linear relationship between x and y. If we reject the null hypothesis, we conclude there is evidence of a linear relationship. (p. 437)

H0: β1 = 0
H1: β1 ≠ 0

Level of significance: α = 0.05, so α/2 = 0.025.

We always use the t test when we test for the slope, with n - 2 degrees of freedom.

Critical values: t crit (13, 0.025) = ±2.1604. Use n - 2 = 15 - 2 = 13 df and the 0.025 column.

Decision rule: reject H0 if t calc > 2.1604 or t calc < -2.1604.

Testing a hypothesis for a population slope using the t test (Slide 45, p. 438):

$t_{calc} = \frac{b_1 - \beta_1}{S_{b_1}}$, where $b_1$ is the sample slope and $S_{b_1}$ is its standard error.

t calc = (1.305 - 0) / 0.881 = 1.481. This is given in the t Stat column; you can find all of this on the Excel output.

Since 1.481 < 2.1604, do not reject H0. There is insufficient evidence to conclude that there is a linear relationship between golfers and temperature.

[Diagram: t distribution showing the do-not-reject and reject regions.]
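The confidence interval in part c and the p-value printed on the Excel output for the temperature slope can be cross-checked with the standard library alone. The sketch below rests on several assumptions: the function names `t_pdf` and `two_tailed_p` are hypothetical helpers written for this example, the critical value 2.1604 is taken from the tutorial's t table figure, and the p-value is approximated by integrating the Student t density with Simpson's rule, which is not necessarily how Excel computes it.

```python
import math

# Figures copied from the Question 2 Excel output.
b1 = 1.304603453      # sample slope for temperature
se_b1 = 0.880716449   # standard error of the slope
n = 15                # observations
df = n - 2            # degrees of freedom for the slope test
t_crit = 2.1604       # t table value for 13 df, alpha/2 = 0.025 (from the tutorial)

# 95% confidence interval for the population slope: b1 +/- t * S_b1.
margin = t_crit * se_b1
ci_lower, ci_upper = b1 - margin, b1 + margin

# Test statistic for H0: beta1 = 0.
t_calc = (b1 - 0) / se_b1


def t_pdf(x, dof):
    """Density of the Student t distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)


def two_tailed_p(t_stat, dof, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3          # probability between 0 and |t_stat|
    return 2 * (0.5 - area)       # both tails beyond |t_stat|


p_value = two_tailed_p(t_calc, df)

print(f"95% CI for slope: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"t_calc = {t_calc:.3f}, p-value = {p_value:.3f}")
```

Because the interval contains 0 and the p-value is above 0.05, both checks point to the same conclusion as the critical value test: do not reject H0.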