Lab6
pdf
keyboard_arrow_up
School
University of Southern California *
*We aren’t endorsed by this school
Course
522
Subject
Industrial Engineering
Date
Jan 9, 2024
Type
Pages
10
Uploaded by DoctorLemur3557
DSO 522: Applied Time Series for Forecasting
Box-Jenkins aka ARMA Models
AAA Washington Case
In 1993, AAA Washington was one of the two regional automobile clubs affiliated with the
American Automobile Association (AAA or Triple A) operating in Washington State.
Club research had consistently shown that the emergency road service benefit was the
primary reason that people join AAA. Providing emergency road service was also the club’s
single largest operating expense. It was projected that delivering emergency road service
would cost $9.5 million, 37% of the club’s annual operating budget, in the next fiscal year.
Michael DeCoria objective is to find a way to predict emergency road service call volume
for future years.The data on emergency road service call volume is given in
AAA.csv
.
You have recommended Michael to model these patterns using ARIMA models. He offered
you a contract as a consultant to help him apply Box-Jenkins models.
Task:
Study the calls volume auto-correlations and build a Box-Jenkins model for the AAA call
volume.
Your Action Plan:
(1.) Visualize your data;
(2.) Find the ACF and PACF; Propose several models based on your analysis of ACF and
PACF.
(3.) Choose the model that is the most accurate, robust, and the simplest.
(4.) Test the assumptions
(5.) Provide recommendations to Mr. DeCoria.
(1.) Visualize your data
Use the monthly data of the calls volume,
Calls
, is recorded from May, 1988 till December,
1992. Produce a time plot of the data, label the axis nicely. Comment on the patterns
present in the time series. Include your script.
Comment on the scatterplot.
data
=
read.csv
(
'AAA.csv'
,
header =
T)
data.ts
=
ts
(data[,
3
],
start =
c
(
1988
,
5
),
frequency =
12
)
plot
(data.ts,
main=
'AAA monthly call volume'
,
ylab=
"Call volume"
)
The plot shows level, noise, and seasonality. Thus, we have two choices. First, de-
seasonalize first and fit the model. Second, don’t de-season first. But look into ACF and
PACF to see how to pick the number.Need to pay attention here, when we fit the model in
the first senerio, don’t process based on the de-season data but still using the original one
to do the fair comparison with the latter one. Which means the only difference will be the
number and the order in the ().
(2.) Split the time series into training and validation sets. Leave the last 12 months for the
testing set. Use the training set to get ACF and PACF plots. Based on these charts, comment
on the patterns present in the calls volume. Suggest
at least 2
Box-Jenkings models. Provide
your explanation next to each proposed model. (Don’t feel obligated to use the models
above.) Include the script, plots, and your comments.
library
(forecast)
## Registered S3 method overwritten by 'quantmod':
##
method
from
##
as.zoo.data.frame zoo
train.ts
=
window
(data.ts,
end =
c
(
1991
,
12
))
valid.ts
=
window
(data.ts,
start =
c
(
1992
,
1
))
autoplot
(train.ts)
par
(
mfrow=
c
(
2
,
2
))
Acf
(train.ts,
36
)
Pacf
(train.ts,
36
)
par
(
mfrow=
c
(
1
,
1
))
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
#The training set shows seasonal part tailed off wave pattern in ACF. PACF
shows the first lag cut off (after one long bar no more behind) --> SAR(1).
*(1,0,0)
#Then we look at the non-seasonal part, which is between the seasonal
repeating bar.PACF cut off (drops after the first bar), in ACF can be tail
off
--> AR(1) (1,0,0)*
#or cut off --> ARMA(1,1) or AR(1) + MA(1)
dtrain
=
diff
(train.ts,
12
)
par
(
mfrow=
c
(
2
,
2
))
Acf
(dtrain,
36
,
main =
""
)
Pacf
(dtrain,
36
,
main =
""
)
par
(
mfrow=
c
(
1
,
1
))
#no season no trend
#no-seasonal part, not sure which one better. Can try different combination
(1,0,1)*(0,1,0); (1,0,0)*(0,1,0); (0,0,1)*(0,1,0)
(3.) Fit the suggested 2 models and access their accuracy, robustness, assumptions. Visual
analysis techniques suffice. Include your script, comments. Report all supporting material.
m1
=
arima
(train.ts,
order =
c
(
0
,
0
,
0
),
seasonal =
list
(
order=
c
(
0
,
1
,
0
),
period
=
12
))
summary
(m1)
##
## Call:
## arima(x = train.ts, order = c(0, 0, 0), seasonal = list(order = c(0, 1,
0),
##
period = 12))
##
##
## sigma^2 estimated as 2251271:
log likelihood = -279.44,
aic = 560.88
##
## Training set error measures:
##
ME
RMSE
MAE
MPE
MAPE
MASE
ACF1
## Training set -400.8296 1279.619 829.2158 -2.062086 4.053236 0.7086329
0.3618717
m1.p
=
forecast
(m1,
h=
12
)
accuracy
(m1.p, valid.ts)
##
ME
RMSE
MAE
MPE
MAPE
MASE
## Training set -400.8296 1279.619 829.2158 -2.062086 4.053236 0.7326442
## Test set
664.5000 1295.846 975.1667
2.906971 4.384444 0.8615974
##
ACF1 Theil's U
## Training set
0.3618717
NA
## Test set
-0.1267855
0.968655
checkresiduals
(m1)
##
##
Ljung-Box test
##
## data:
Residuals from ARIMA(0,0,0)(0,1,0)[12]
## Q* = 12.281, df = 9, p-value = 0.1979
##
## Model df: 0.
Total lags used: 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
m2
=
arima
(train.ts,
order =
c
(
1
,
0
,
0
),
seasonal =
list
(
order=
c
(
1
,
0
,
0
),
period
=
12
))
summary
(m2)
##
## Call:
## arima(x = train.ts, order = c(1, 0, 0), seasonal = list(order = c(1, 0,
0),
##
period = 12))
##
## Coefficients:
##
ar1
sar1
intercept
##
0.5002
0.5238
21360.1596
## s.e.
0.1302
0.1371
582.9503
##
## sigma^2 estimated as 1390592:
log likelihood = -375.7,
aic = 759.39
##
## Training set error measures:
##
ME
RMSE
MAE
MPE
MAPE
MASE
## Training set -66.93572 1179.233 931.6528 -0.6402688 4.43517 0.7961737
##
ACF1
## Training set 0.07932037
m2.p
=
forecast
(m2,
h=
12
)
accuracy
(m2.p, valid.ts)
##
ME
RMSE
MAE
MPE
MAPE
MASE
## Training set -66.93572 1179.233 931.6528 -0.6402688 4.435170 0.8231512
## Test set
436.35692 1326.482 815.2446
1.6439259 3.572109 0.7203001
##
ACF1 Theil's U
## Training set 0.07932037
NA
## Test set
0.30183881
1.00346
checkresiduals
(m2)
##
##
Ljung-Box test
##
## data:
Residuals from ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
## Q* = 4.6145, df = 7, p-value = 0.7069
##
## Model df: 2.
Total lags used: 9
Compared model 1 and model 2, we can see model 2 has a lower MAPE on testing set,
although RMSE is slightly higher. When we look into the residuals, model 2’s residuals are
more normally distributed, and no significance shows in ACF. The overall spread is pretty
random. In a word, assumptions are met better than model 1. Thus, model 2 is a better
model.
Propose the best model for the calls volume forecasting. Report its estimated accuracy,
comment on its robustness.
(4.) Use the best model to produce the forecast for 1993. Visualize your results.
m
=
arima
(data.ts,
order =
c
(
1
,
0
,
0
),
seasonal =
list
(
order=
c
(
1
,
0
,
0
),
period =
12
))
m.pred
=
forecast
(m,
h=
12
)
m.pred
##
Point Forecast
Lo 80
Hi 80
Lo 95
Hi 95
## Jan 1993
24055.18 22557.70 25552.67 21764.97 26345.39
## Feb 1993
22448.91 20752.38 24145.43 19854.29 25043.52
## Mar 1993
21970.61 20221.78 23719.45 19296.00 24645.23
## Apr 1993
20663.88 18900.49 22427.26 17967.01 23360.74
## May 1993
20756.67 18989.18 22524.16 18053.52 23459.81
## Jun 1993
20977.08 19208.43 22745.74 18272.17 23682.00
## Jul 1993
21216.07 19447.08 22985.05 18510.64 23921.49
## Aug 1993
21372.77 19603.70 23141.85 18667.21 24078.34
## Sep 1993
20889.55 19120.45 22658.65 18183.95 23595.16
## Oct 1993
21863.37 20094.26 23632.48 19157.75 24568.99
## Nov 1993
22506.40 20737.29 24275.51 19800.78 25212.02
## Dec 1993
23998.34 22229.23 25767.45 21292.72 26703.96
autoplot
(m.pred)
(5.) Write a short 1-2 paragraph report for Mr. DeCoria summarizing your findings. Provide
the best model he should use to forecast the calls volume and the estimated error rate he
should expect. Include your forecast for the calls volume for January and February 1993.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
The best model to predict would be SARIMA(1,0,0)*(1,0,0)12. The prediction error should
be expected around 3.57%. Based on this model, the average January and February 1993
prediction would be 24055.18 and 22448.91. To better incorporate the budget planning,
we should also consider the range between 21764.97 to 26345.39 for January, and
between 19854.29 to 25043.52 for February.