Lab-HW-2
University of Houston, Downtown
Course: 5310 (Statistics)
Date: Feb 20, 2024
Uploaded by rosecaraglia
Lab and HW 2
Group ###
July 20, 2020

PROBLEM 1: For the purity data in Problem 2.7, test whether there is a significant linear relationship. Provide details for every step of your test and interpret your results.

Below are the results from Lab-HW 1.

```r
setwd("C:/Users/rosec/Documents/UHD Files/WorkingDirectory")

purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)

# purity = y = dependent/response variable
# hydro  = x = independent variable/regressor
x <- hydro
y <- purity

fit <- lm(purity ~ hydro)
fit
## 
## Call:
## lm(formula = purity ~ hydro)
## 
## Coefficients:
## (Intercept)        hydro  
##       77.86        11.80
# slope = 11.80; intercept = 77.86

n <- length(hydro)
n
## [1] 20

Sxx <- sum(hydro^2) - sum(hydro)^2 / n
Sxy <- sum(hydro * purity) - sum(hydro) * sum(purity) / n

# Least-squares estimates
B1H <- Sxy / Sxx                         # 11.80, matching the lm() slope
B0H <- mean(purity) - mean(hydro) * B1H  # 77.86, matching the lm() intercept
```

We can test whether there is a significant linear relationship using the hypothesis tests below.

Test 1: H0: B1 = 0 vs. Ha: B1 ≠ 0
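This slope test can also be cross-checked with a confidence interval: a 95% CI for B1 that excludes 0 is equivalent to rejecting H0: B1 = 0 at alpha = 0.05. A minimal sketch using R's built-in confint() (the data from above are re-entered so the snippet runs on its own):

```r
purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)

fit <- lm(purity ~ hydro)

# 95% confidence interval for the slope; if 0 lies outside it,
# H0: B1 = 0 is rejected at the 5% level.
ci <- confint(fit, level = 0.95)["hydro", ]
ci
```

Both endpoints come out positive here, so the interval excludes 0, agreeing with the t-test computed next.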
```r
# MSres = residual mean square
res   <- y - (B0H + x * B1H)
SSres <- sum(res^2)
MSres <- SSres / (n - 2)

seB1H   <- sqrt(MSres / Sxx)
tstat1  <- (B1H - 0) / seB1H
pvalue1 <- 2 * pt(-abs(tstat1), df = n - 2)
pvalue1
## [1] 0.003291122
```

From the above test, the p-value (0.003291122) is less than alpha (0.05), so we reject H0. At the 5% significance level there is enough evidence to conclude that there is a linear relationship between the response and the regressor.

Test 2: H0: B0 = 0 vs. Ha: B0 ≠ 0

```r
seB0H  <- sqrt(MSres * (1 / n + mean(x)^2 / Sxx))
tstat0 <- (B0H - 0) / seB0H

# two-sided p-value approach
pvalue0 <- 2 * pt(-abs(tstat0), df = n - 2)
pvalue0
## [1] 3.537382e-13
```

From the above test, the p-value (3.537382e-13) is less than alpha (0.05), so we reject H0. At the 5% significance level there is enough evidence of a significant linear relationship between x and y: hydro (the regressor) is a good predictor of purity (the response).

PROBLEM 2: Produce an ANOVA table. Interpret.

```r
# ANOVA decomposition
yH  <- B0H + B1H * x
SSr <- sum((yH - mean(y))^2)
MSr <- SSr / 1
SSt <- SSr + SSres
F0  <- MSr / MSres
F0
## [1] 11.4658

# p-value
pf(F0, df1 = 1, df2 = n - 2, lower.tail = FALSE)
## [1] 0.003291122
```

From the above test, the p-value (0.003291122) is less than alpha (0.05). At the 5% significance level there is enough evidence to reject H0: there is a significant linear relationship between x and y, and hydro is a good predictor of purity.

We can also produce the ANOVA table using anova().

```r
anova(lm(purity ~ hydro))
## Analysis of Variance Table
## 
## Response: purity
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## hydro      1 148.31 148.313  11.466 0.003291 **
## Residuals 18 232.83  12.935                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

PROBLEM 3: Are the residuals normal? Produce plot(s) and interpret.

The residuals of the simple linear regression model are the differences between the observed values of the dependent variable y and the fitted values yH.

```r
summary(fit)
## 
## Call:
## lm(formula = purity ~ hydro)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6724 -3.2113 -0.0626  2.5783  7.3037 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   77.863      4.199  18.544 3.54e-13 ***
## hydro         11.801      3.485   3.386  0.00329 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.597 on 18 degrees of freedom
## Multiple R-squared:  0.3891, Adjusted R-squared:  0.3552 
## F-statistic: 11.47 on 1 and 18 DF,  p-value: 0.003291

res <- residuals(fit)  # equivalently: resid(fit)
res
##           1           2           3           4           5           6 
## -2.99033292 -1.11242546 -4.45875448 -4.62242546  2.79767736 -1.74426095 
##           7           8           9          10          11          12 
## -4.67242546  3.72982131  0.87124552 -0.04033292  1.63721468  3.26512214 
##          13          14          15          16          17          18 
##  2.50512214 -0.08487786 -0.73472363 -4.12446658  5.21767736  7.30369793 
##          19          20 
## -3.87426095  1.13170821
```

The residual standard error is 3.597 on 18 degrees of freedom. We now plot the residuals against the variable x = % hydrocarbon.

```r
plot(x, res, ylab = "Residuals", xlab = "% hydrocarbon", main = "Residual plot")
abline(0, 0)
```
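As a sanity check before interpreting the plots, the hand-computed least-squares estimates and residuals from Problem 1 should reproduce what lm() returns. A minimal sketch (the data are re-entered so the snippet runs on its own):

```r
purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)
n <- length(hydro)

# Closed-form least-squares estimates, as in Problem 1
Sxx <- sum(hydro^2) - sum(hydro)^2 / n
Sxy <- sum(hydro * purity) - sum(hydro) * sum(purity) / n
B1H <- Sxy / Sxx
B0H <- mean(purity) - mean(hydro) * B1H

fit <- lm(purity ~ hydro)

# Both comparisons agree up to floating-point tolerance
all.equal(unname(coef(fit)), c(B0H, B1H))
## [1] TRUE
all.equal(unname(residuals(fit)), purity - (B0H + B1H * hydro))
## [1] TRUE
```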
[Figure: "Residual plot", residuals vs. % hydrocarbon]

The plot shows no particular trend: the residuals are spread fairly evenly above and below the zero line. Since the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for these data.

A density plot of the residuals:

```r
plot(density(res))
```
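An alternative sketch of the same check is a histogram of the residuals with a normal curve overlaid (data re-entered so the snippet is self-contained; because the model includes an intercept, the residuals average to zero):

```r
purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)
res <- residuals(lm(purity ~ hydro))

# Histogram on the density scale, with a normal curve using the
# residuals' own mean (which is 0) and standard deviation.
hist(res, freq = FALSE, main = "Histogram of residuals", xlab = "Residuals")
curve(dnorm(x, mean = mean(res), sd = sd(res)), add = TRUE)
```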
[Figure: density.default(x = res), N = 20, Bandwidth = 1.731]

The curve is roughly bell-shaped, so the residuals appear to follow a normal distribution.

A normal quantile (Q-Q) plot is also a good check of normality:

```r
qqnorm(res)
```
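qqnorm() is often paired with qqline(), which adds a reference line through the first and third quartile pairs and makes departures from normality easier to judge. A small self-contained sketch (data re-entered):

```r
purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)
res <- residuals(lm(purity ~ hydro))

qq <- qqnorm(res)  # points: sample quantiles vs. theoretical normal quantiles
qqline(res)        # reference line through the 1st and 3rd quartile pairs
```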
[Figure: Normal Q-Q Plot, sample quantiles vs. theoretical quantiles]

In the Q-Q plot the residuals lie close to a straight line, which supports the assumption that the residuals are normally distributed.
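The graphical checks can be complemented by a formal test; one option is the Shapiro-Wilk test via R's built-in shapiro.test() (a sketch, with the data re-entered so it runs on its own). H0 is that the residuals are normally distributed, so a large p-value means we do not reject normality:

```r
purity <- c(86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
            96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95, 96.85, 85.2, 90.56)
hydro <- c(1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
           1.46, 1.55, 1.55, 1.55, 1.4, 1.15, 1.01, 0.99, 0.95, 0.98)
res <- residuals(lm(purity ~ hydro))

# Shapiro-Wilk test of H0: the residuals are normally distributed
sw <- shapiro.test(res)
sw$p.value
```

Given how the plots look, we would expect a p-value well above 0.05 here, i.e. no evidence against normality.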