Assignment #4 - Solutions
School
University of Waterloo
Course
371
Subject
Statistics
Date
Apr 3, 2024
STAT 371 S23 Assignment #4 (Submission deadline: 11:59 pm Mon. Jul. 31)
Solutions ( /50)

The traffic time series contains the number of vehicles passing a certain intersection in a city from 7:00am to 8:00am over 608 consecutive days.

1)
[4] Create a time series plot of the daily traffic over this period. Comment on what you see.

Daily traffic appears to be increasing (somewhat linearly) over time. The variance also appears to be increasing over time.

2)
[3] Suppose you were to fit a linear trend to this time series. Based on the plot in 1), what model assumption would be violated?

The assumption of constant variance. As noted in the previous question, the variance appears to be increasing over time.

3)
[4] One way to stabilize the variance is through an appropriate transformation. Create a time series plot of the log transformed series. Has this transformation helped to stabilize the variance?

Yes, the log transformation appears to have made the variance more nearly constant. Note, however, that the trend now appears to be non-linear.
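The effect described in 3) can be illustrated on simulated data. The sketch below is not based on the assignment's traffic file: `traffic_sim`, `level`, and the chosen parameter values are all assumptions made for illustration. It builds a series whose noise is multiplicative, then compares the spread in the first and second halves on the raw and log scales.

```r
# Synthetic stand-in for the traffic counts (NOT the assignment data):
# an increasing level with multiplicative noise, so spread grows with the level.
set.seed(371)
day <- 1:608
level <- exp(2.5 + 0.003 * day)
traffic_sim <- level * exp(rnorm(length(day), mean = 0, sd = 0.2))

# Spread around the known level, early versus late in the series.
early <- day <= 304
sd_raw <- c(early = sd((traffic_sim - level)[early]),
            late  = sd((traffic_sim - level)[!early]))
sd_log <- c(early = sd((log(traffic_sim) - log(level))[early]),
            late  = sd((log(traffic_sim) - log(level))[!early]))

print(round(sd_raw, 2))  # late spread is noticeably larger on the raw scale
print(round(sd_log, 2))  # spreads are similar on the log scale

# Side-by-side time series plots, in the spirit of 1) and 3).
op <- par(mfrow = c(1, 2))
plot(day, traffic_sim, type = "l", main = "Raw scale")
plot(day, log(traffic_sim), type = "l", main = "Log scale")
par(op)
```

With the real data, the same two `plot()` calls applied to `Traffic` and `log(Traffic)` would presumably give the plots asked for in 1) and 3).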
Use the transformed series for all remaining questions
4)
[5] Fit a linear regression model with a quadratic term to account for the observed (non-linear) trend. Do not fit any other variables just yet. Comment on the fit of the model.

# time index (day 1 to day 608) and its square
t <- 1:608
tsq <- t^2
# quadratic trend fitted to the log-transformed series
traffic.trend.lm <- lm(log(Traffic) ~ t + tsq)
summary(traffic.trend.lm)

Call:
lm(formula = log(Traffic) ~ t + tsq)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7139 -0.1446  0.0521  0.1496  0.6269

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.550e+00  2.524e-02 101.058  < 2e-16
t            3.613e-03  1.914e-04  18.879  < 2e-16
tsq         -2.113e-06  3.043e-07  -6.945  9.8e-12
---
Residual standard error: 0.2067 on 605 degrees of freedom
Multiple R-squared: 0.7999, Adjusted R-squared: 0.7993
F-statistic: 1209 on 2 and 605 DF, p-value: < 2.2e-16

As expected, based on the p-values, there is a significant non-linear trend in daily traffic. Approximately 80% of the variation in log daily traffic is explained by this trend.

5)
[4] Create a correlogram (i.e. plot of the acf) of the residuals. Ignoring, of course, the acf at lag 0, what lags are associated with the largest (absolute) autocorrelation? Why might this be expected?

Lags 7, 14, and 21 appear to be associated with the largest autocorrelation. This is not surprising, since we know that days of the week are associated with different traffic patterns, which would thus yield large values of r_k for k = 7, 14, 21, …
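The reasoning in 5) can be checked numerically rather than by eye. The following is a minimal sketch on simulated data, not the assignment data: the weekend indicator, the size of the weekend dip, and the noise level are hypothetical. It fits the trend-only model and ranks the lags by the absolute value of r_k.

```r
# Synthetic stand-in (NOT the assignment data): quadratic trend on the log
# scale plus a weekend dip that repeats every 7 days.
set.seed(371)
t <- 1:608
tsq <- t^2
weekend <- (t %% 7) %in% c(6, 0)   # hypothetical: days 6 and 7 of each week
log_traffic <- 2.5 + 0.0036 * t - 2e-06 * tsq - 0.3 * weekend +
  rnorm(length(t), sd = 0.13)

# Trend-only fit, as in 4), which leaves the weekly pattern in the residuals.
trend_fit <- lm(log_traffic ~ t + tsq)

# Sample autocorrelations r_k for k = 0, ..., 28, without drawing the plot.
r <- acf(resid(trend_fit), lag.max = 28, plot = FALSE)$acf[, 1, 1]
lags <- 0:28

# Rank lags by |r_k|, dropping lag 0 (which is always 1).
top <- lags[-1][order(abs(r[-1]), decreasing = TRUE)][1:3]
print(top)
```

In this simulation the top-ranked lags are multiples of 7, which mirrors the pattern reported for the real residuals. Calling `acf(resid(trend_fit))` without `plot = FALSE` draws the correlogram itself.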
6)
[5] Add the Day variate to the model to address the large autocorrelations observed in the correlogram. Note that the default order for factor levels in R is alpha-numeric. To maintain the appropriate order of days of the week in the output, reorder the days using the following command before fitting the model:

Day <- factor(Day,levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))

Comment on the fit of the model and compare to that of the model fit in 4).

> Day <- factor(Day,levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
> traffic.trend.day.lm<-lm(log(Traffic)~t+tsq+Day)
> summary(traffic.trend.day.lm)

              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.616e+00  2.113e-02 123.824  < 2e-16
t            3.626e-03  1.245e-04  29.116  < 2e-16
tsq         -2.141e-06  1.980e-07 -10.811  < 2e-16
DayTue       7.232e-02  2.039e-02   3.546 0.000422
DayWed       6.241e-02  2.039e-02   3.060 0.002312
DayThu       2.781e-02  2.039e-02   1.364 0.173187
DayFri      -1.044e-02  2.039e-02  -0.512 0.608889
DaySat      -2.746e-01  2.045e-02 -13.427  < 2e-16
DaySun      -3.449e-01  2.039e-02 -16.911  < 2e-16

Residual standard error: 0.1345 on 599 degrees of freedom
Multiple R-squared: 0.9161, Adjusted R-squared: 0.915
F-statistic: 818 on 8 and 599 DF, p-value: < 2.2e-16

Not surprisingly, including the day of the week has substantially improved the fit of the model (adjusted R-squared value of .915 compared to .7993).

7)
[3] What are the days of the week associated with the lowest traffic? Does this make sense, intuitively?

The days associated with the lowest traffic volume (after accounting for the trend) are Saturday and Sunday. Yes, this makes intuitive sense since we expect there to be less commuting traffic on weekends.

8)
[4] Plot an acf of the residuals for the model in 6). How has the autocorrelation changed?

As expected, the previously large autocorrelations at lags 7, 14, and 21 have been largely accounted for by including the day of the week in the model. There still remains significant positive autocorrelation for the first 15 or so lags.
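Because the response in 6) is log(Traffic), the Day coefficients are easier to interpret after exponentiating. The sketch below copies the estimates from the summary output in 6) and converts each to an approximate percentage change relative to Monday (the reference level). The variable names `day_coef` and `pct_change` are introduced here for illustration only.

```r
# Day coefficients copied from the summary output in 6) (reference level: Mon).
day_coef <- c(Tue = 7.232e-02, Wed = 6.241e-02, Thu = 2.781e-02,
              Fri = -1.044e-02, Sat = -2.746e-01, Sun = -3.449e-01)

# With log(Traffic) as the response, exp(coefficient) is the estimated
# multiplicative effect relative to Monday, holding the trend fixed.
pct_change <- 100 * (exp(day_coef) - 1)
print(round(pct_change, 1))

# The two days with the lowest estimated traffic.
print(names(sort(day_coef))[1:2])
```

On this reading, Saturday and Sunday traffic is estimated to be roughly 24% and 29% below Monday's, respectively, which agrees with the answer to 7).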
9)
[5] After accounting for trend and day, the time of year and season may be associated with traffic patterns. Account for this seasonal effect by adding the month to your model (as with the Day variate, be sure to put the months in the appropriate order. For consistency, use January as the reference month). Comment on the fit of the model and compare to that of the model fit in 6).

> Month<-factor(Month,levels =c("Jan","Feb","Mar", …, "Oct","Nov","Dec"))
> traffic.trend.day.month.lm=lm(log(Traffic)~t+tsq+Day+Month)
> summary(traffic.trend.day.month.lm)

Call:
lm(formula = log(Traffic) ~ t + tsq + Day + Month)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.587e+00  2.576e-02 100.428  < 2e-16
t            3.677e-03  1.421e-04  25.885  < 2e-16
tsq         -2.251e-06  2.273e-07  -9.906  < 2e-16
DayMon       8.000e-03  1.946e-02   0.411 0.681093
DaySat      -2.643e-01  1.950e-02 -13.552  < 2e-16
DaySun      -3.355e-01  1.945e-02 -17.252  < 2e-16
DayThu       3.743e-02  1.945e-02   1.925 0.054703
DayTue       8.187e-02  1.945e-02   4.209 2.96e-05
DayWed       7.171e-02  1.945e-02   3.687 0.000248
MonthFeb     7.800e-02  2.355e-02   3.312 0.000982
MonthMar     6.091e-02  2.312e-02   2.634 0.008652
MonthApr     1.447e-02  2.340e-02   0.618 0.536559
MonthMay     3.189e-02  2.336e-02   1.365 0.172826
MonthJun    -5.752e-03  2.381e-02  -0.242 0.809198
MonthJul    -7.237e-02  2.921e-02  -2.478 0.013499
MonthAug    -4.750e-02  2.934e-02  -1.619 0.106004
MonthSep     5.294e-02  2.969e-02   1.783 0.075036
MonthOct     7.364e-02  2.935e-02   2.509 0.012369
MonthNov     4.594e-02  2.340e-02   1.963 0.050069
MonthDec    -4.321e-02  2.307e-02  -1.873 0.061599

Residual standard error: 0.1282 on 588 degrees of freedom
Multiple R-squared: 0.9252, Adjusted R-squared: 0.9228
F-statistic: 382.9 on 19 and 588 DF, p-value: < 2.2e-16

Adding the month appears to improve the fit of the model, as it yields a higher adjusted R-squared value (.9228 vs .915) and a lower residual standard error (0.1282 vs 0.1345).
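One way to make the comparison between the models in 6) and 9) more formal is a partial F-test for the 11 added Month coefficients. This goes beyond what the solution itself reports, so treat it as a supplementary sketch. It reconstructs the residual sums of squares from the printed residual standard errors and degrees of freedom, so the result is only as accurate as that rounded output; the variable names are illustrative.

```r
# Quantities copied from the two summary outputs (rounded as printed).
rse_day   <- 0.1345; df_day   <- 599   # model in 6): t + tsq + Day
rse_month <- 0.1282; df_month <- 588   # model in 9): t + tsq + Day + Month

# Residual sum of squares = (residual standard error)^2 * residual df.
rss_day   <- rse_day^2 * df_day
rss_month <- rse_month^2 * df_month

# Partial F statistic for the added Month coefficients.
q <- df_day - df_month
f_stat <- ((rss_day - rss_month) / q) / (rss_month / df_month)
p_value <- pf(f_stat, df1 = q, df2 = df_month, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_value)
```

The statistic comes out at roughly 6.5 on 11 and 588 degrees of freedom, with a very small p-value, which is consistent with the conclusion above. With the fitted model objects available, `anova(traffic.trend.day.lm, traffic.trend.day.month.lm)` should give the same comparison without the rounding.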
10)
[4] After accounting for the trend and for the days of the week, which three months are associated with the lowest estimated mean traffic? Provide a reasonable explanation for this.

After accounting for trend and day of the week, July, August, and December are associated with the lowest estimated mean traffic volumes, which is likely due in part to less commuter traffic in the summer months and during the Christmas holiday season.

11)
[4] Plot a correlogram of the residuals for the model in 9) and compare to the correlogram in 8). Has the inclusion of the month helped to address the autocorrelation in the residuals?

Yes, including the month has helped to account for the remaining autocorrelation in the time series, as the autocorrelation in the first 10 or 15 lags has been much reduced.

12)
[2] Comment on the assumption of independent errors with reference to the plot in 11). (Note that at this point we would consider fitting an appropriate ARIMA model to the residual series. ARIMA models are beyond the scope of this course, but are discussed in detail in STAT 443 – Forecasting)

Because significant autocorrelation remains in the residuals at the first few lags, the assumption of independent errors does not appear to be met.

13)
[3] Can you think of any other variates related to traffic that one might wish to include in the model to provide a better fit? Identify at least one such variate and provide a brief explanation.

One possible variate: weather, as a categorical variate. It seems reasonable to assume that days with adverse conditions (ice, snow, strong winds, heavy rain, etc.) would be associated with lower traffic volumes.
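The independence check described in 12) amounts to asking which residual autocorrelations fall outside the approximate 95% limits of ±1.96/sqrt(n) drawn on a correlogram. The sketch below shows that check on simulated residuals, not the assignment's: `res_sim` is a hypothetical series in which each value depends on the previous one with an assumed coefficient of 0.6.

```r
# Synthetic stand-in for the residual series (NOT the assignment residuals):
# each value carries over 0.6 of the previous one (hypothetical coefficient).
set.seed(371)
n <- 608
e <- rnorm(n, sd = 0.13)
res_sim <- numeric(n)
res_sim[1] <- e[1]
for (i in 2:n) {
  res_sim[i] <- 0.6 * res_sim[i - 1] + e[i]
}

# Sample autocorrelations for lags 1..20 and the approximate 95% limits
# +/- 1.96 / sqrt(n) that apply when the errors are independent.
r <- acf(res_sim, lag.max = 20, plot = FALSE)$acf[-1, 1, 1]
limit <- 1.96 / sqrt(n)

outside <- which(abs(r) > limit)   # lags whose r_k falls outside the limits
print(round(limit, 3))
print(outside)
```

For a residual series of length 608 the limits are about ±0.08, so several early lags falling outside them, as in the plot for 11), points to dependent errors.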