Homework #5,6,7

Name: Petriashin Timofei id: 201902011

load("/Users/nika/Downloads/AmusementPark.RData")

HW#5

Question #1
Run the multiple regression using the following independent variables: weekend, num.child, distance, rides, games, wait, and clean.

# Check whether distance is skewed.
hist(AmusementPark$distance)

[Figure: Histogram of AmusementPark$distance; x-axis: AmusementPark$distance, y-axis: Frequency]

# The distance variable is skewed, so we create a new column with its log transformation.
AmusementPark$logdistance = log(AmusementPark$distance)
head(AmusementPark)

##   weekend num.child  distance rides games wait clean overall logdistance
## 1     yes         0 114.64826    87    73   60    89      47    4.741869
## 2     yes         2  27.01410    87    78   76    87      65    3.296359
## 3      no         1  63.30098    85    80   70    88      61    4.147901
## 4     yes         0  25.90993    88    72   66    89      37    3.254626
## 5      no         4  54.71831    84    87   74    87      68    4.002198
## 6      no         5  22.67934    81    79   48    79      27    3.121454

hist(AmusementPark$logdistance)

[Figure: Histogram of AmusementPark$logdistance; x-axis: AmusementPark$logdistance, y-axis: Frequency]

out1 = lm(formula = overall ~ weekend + num.child + logdistance + rides + games + wait + clean, data = AmusementPark)
summary(out1)

##
## Call:
## lm(formula = overall ~ weekend + num.child + logdistance + rides +
##     games + wait + clean, data = AmusementPark)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -24.0445  -6.3945   0.1814   6.6075  26.8350
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -142.05801    7.29448 -19.475  < 2e-16 ***
## weekendyes    -0.72862    0.81633  -0.893   0.3725
## num.child      3.60719    0.27174  13.274  < 2e-16 ***
## logdistance    1.04196    0.41413   2.516   0.0122 *
## rides          0.61876    0.12198   5.073 5.57e-07 ***
## games          0.13813    0.05915   2.335   0.0199 *
## wait           0.56225    0.04094  13.734  < 2e-16 ***
## clean          0.92165    0.13705   6.725 4.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.066 on 492 degrees of freedom
## Multiple R-squared:  0.6786, Adjusted R-squared:  0.674
## F-statistic: 148.4 on 7 and 492 DF,  p-value: < 2.2e-16

# Estimated regression line:
# overall = -142.05801 - 0.72862*weekendyes + 3.60719*num.child + 1.04196*logdistance + 0.61876*rides + 0.13813*games + 0.56225*wait + 0.92165*clean

Question #2
Run the multiple regression using the following independent variables: rides, games, wait, and clean.

out2 = lm(formula = overall ~ rides + games + wait + clean, data = AmusementPark)
summary(out2)

##
## Call:
## lm(formula = overall ~ rides + games + wait + clean, data = AmusementPark)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.944  -6.841   1.072   7.167  28.618
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -131.40919    8.33377 -15.768  < 2e-16 ***
## rides          0.52908    0.14207   3.724 0.000219 ***
## games          0.15334    0.06908   2.220 0.026903 *
## wait           0.55333    0.04781  11.573  < 2e-16 ***
## clean          0.98421    0.15987   6.156 1.54e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.59 on 495 degrees of freedom
## Multiple R-squared:  0.5586, Adjusted R-squared:  0.5551
## F-statistic: 156.6 on 4 and 495 DF,  p-value: < 2.2e-16

# Estimated regression line:
# overall = -131.40919 + 0.52908*rides + 0.15334*games + 0.55333*wait + 0.98421*clean

Question #3
Compare the estimated coefficients of rides, games, wait, and clean between the two regressions above.

anova(out1, out2)
## Analysis of Variance Table
##
## Model 1: overall ~ weekend + num.child + logdistance + rides + games +
##     wait + clean
## Model 2: overall ~ rides + games + wait + clean
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)
## 1    492 40436
## 2    495 55532 -3    -15096 61.225 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question #4
Why are they not very different? (with vs. without weekend, num.child, and distance)

# The estimated coefficients of rides, games, wait, and clean change little between the two models (for example, rides 0.619 vs. 0.529 and clean 0.922 vs. 0.984). This suggests that weekend, num.child, and distance are only weakly correlated with rides, games, wait, and clean, so omitting them barely shifts those estimates. The omitted variables still matter for the fit: the F-test above rejects the smaller model (F = 61.225, p < 2.2e-16), and R-squared falls from 0.6786 to 0.5586.

HW#6

Question #1
Run a regression model that allows you to check whether the effects are different for the customers who live closer vs. further to the park.
# To this end, use the median value of distance to define a new variable live with two values: "close" and "far".

# Find the median
median(AmusementPark$logdistance)

## [1] 2.945439

# Note: the cutoff used below (2.45439) is not the median printed above (2.945439),
# so this is not an exact median split. The output that follows was produced with 2.45439.
NewDistance = rep("Nothing", dim(AmusementPark)[1])
NewDistance[AmusementPark$logdistance < 2.45439] = "Close"
NewDistance[AmusementPark$logdistance >= 2.45439] = "Far"
# Convert the column into a factor
NewDistance = as.factor(NewDistance)
# Fit the regression model
out3 = lm(formula = overall ~ as.factor(NewDistance), data = AmusementPark)
summary(out3)

##
## Call:
## lm(formula = overall ~ as.factor(NewDistance), data = AmusementPark)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -45.768 -10.834  -0.768  10.232  48.232
##
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 50.034      1.309  38.213   <2e-16 ***
## as.factor(NewDistance)Far    1.734      1.558   1.113    0.266
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.87 on 498 degrees of freedom
## Multiple R-squared:  0.002479, Adjusted R-squared:  0.0004763
## F-statistic: 1.238 on 1 and 498 DF,  p-value: 0.2664

Question #2
The p-value of the Far coefficient (0.266) is above the 0.05 significance level, so we cannot reject the null hypothesis: the mean of overall does not differ significantly between the Close and Far groups.

HW#7

Question #1
Run a regression model that allows you to check whether the effects are different for the customers who have a child or not.
# To this end, use a newly defined variable child with the values: "No" and "Yes".

NewChild = rep("Nothing", dim(AmusementPark)[1])
NewChild[AmusementPark$num.child == 0] = "No"
NewChild[AmusementPark$num.child > 0] = "Yes"
# Convert the column into a factor
NewChild = as.factor(NewChild)
# Fit the regression model
out4 = lm(formula = overall ~ as.factor(NewChild), data = AmusementPark)
summary(out4)

##
## Call:
## lm(formula = overall ~ as.factor(NewChild), data = AmusementPark)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -48.845  -8.845  -0.845   9.344  44.155
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)              40.656      1.162   34.99   <2e-16 ***
## as.factor(NewChild)Yes   15.190      1.391   10.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.28 on 498 degrees of freedom
## Multiple R-squared:  0.1933, Adjusted R-squared:  0.1917
## F-statistic: 119.3 on 1 and 498 DF,  p-value: < 2.2e-16
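
The two group models above (out3 and out4) compare only the mean of overall between two groups. To check whether the effects of rides, games, wait, and clean differ by group, one common option is to interact the group factor with those predictors. The following is a minimal sketch on synthetic data, not the homework's method: the data frame `park`, every simulated value in it, and the names `fit_live` and `fit_base` are assumptions made for illustration and do not come from AmusementPark.RData.

```r
# Synthetic stand-in for the AmusementPark data (all values are made up).
set.seed(1)
n <- 500
park <- data.frame(
  rides     = round(rnorm(n, 85, 5)),
  games     = round(rnorm(n, 78, 8)),
  wait      = round(rnorm(n, 70, 10)),
  clean     = round(rnorm(n, 88, 5)),
  distance  = rlnorm(n, meanlog = 3, sdlog = 1),
  num.child = sample(0:5, n, replace = TRUE)
)
park$overall <- with(
  park,
  -140 + 0.6 * rides + 0.15 * games + 0.55 * wait + 0.9 * clean + rnorm(n, sd = 9)
)

# Median split computed from the data, so the cutoff cannot be mistyped.
park$live  <- factor(ifelse(park$distance < median(park$distance), "close", "far"))
park$child <- factor(ifelse(park$num.child > 0, "Yes", "No"))

# Main effects plus one interaction per predictor: each interaction
# coefficient is the change in that predictor's slope for the "far" group.
fit_live <- lm(overall ~ (rides + games + wait + clean) * live, data = park)
summary(fit_live)

# Joint F-test of the four interaction terms against a model with a
# group shift only (no slope differences).
fit_base <- lm(overall ~ rides + games + wait + clean + live, data = park)
anova(fit_base, fit_live)
```

Replacing `live` with `child` in both formulas gives the analogous check for customers with and without children.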
Question #2
Discuss the regression results.

The p-value of the child coefficient (< 2e-16) is below the 0.05 significance level, so there is a statistically significant relationship between overall and having children. On average, customers with at least one child score about 15.2 points higher on overall than customers without children. The model explains about 19% of the variance in overall (R-squared 0.1933).
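
Because the only predictor in out4 is a two-level factor, its coefficients can be read as group means. A short worked calculation using the numbers from the summary(out4) output:

```latex
\begin{align*}
\widehat{\text{overall}}_{\text{No}}  &= 40.656 \\
\widehat{\text{overall}}_{\text{Yes}} &= 40.656 + 15.190 = 55.846 \\
t &= \frac{15.190}{1.391} \approx 10.92
\end{align*}
```

The intercept is the fitted mean for the baseline level "No", and the slope is the difference between the two group means, which the t value tests against zero.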