Lab6

pdf

School

University of Southern California *

*We aren’t endorsed by this school

Course

522

Subject

Industrial Engineering

Date

Jan 9, 2024

Type

pdf

Pages

10

Uploaded by DoctorLemur3557

Report
DSO 522: Applied Time Series for Forecasting Box-Jenkins aka ARMA Models AAA Washington Case In 1993, AAA Washington was one of the two regional automobile clubs affiliated with the American Automobile Association (AAA or Triple A) operating in Washington State. Club research had consistently shown that the emergency road service benefit was the primary reason that people join AAA. Providing emergency road service was also the club’s single largest operating expense. It was projected that delivering emergency road service would cost $9.5 million, 37% of the club’s annual operating budget, in the next fiscal year. Michael DeCoria objective is to find a way to predict emergency road service call volume for future years.The data on emergency road service call volume is given in AAA.csv . You have recommended Michael to model these patterns using ARIMA models. He offered you a contract as a consultant to help him apply Box-Jenkins models. Task: Study the calls volume auto-correlations and build a Box-Jenkins model for the AAA call volume. Your Action Plan: (1.) Visualize your data; (2.) Find the ACF and PACF; Propose several models based on your analysis of ACF and PACF. (3.) Choose the model that is the most accurate, robust, and the simplest. (4.) Test the assumptions (5.) Provide recommendations to Mr. DeCoria.
(1.) Visualize your data Use the monthly data of the calls volume, Calls , is recorded from May, 1988 till December, 1992. Produce a time plot of the data, label the axis nicely. Comment on the patterns present in the time series. Include your script. Comment on the scatterplot. data = read.csv ( 'AAA.csv' , header = T) data.ts = ts (data[, 3 ], start = c ( 1988 , 5 ), frequency = 12 ) plot (data.ts, main= 'AAA monthly call volume' , ylab= "Call volume" ) The plot shows level, noise, and seasonality. Thus, we have two choices. First, de- seasonalize first and fit the model. Second, don’t de-season first. But look into ACF and PACF to see how to pick the number.Need to pay attention here, when we fit the model in the first senerio, don’t process based on the de-season data but still using the original one to do the fair comparison with the latter one. Which means the only difference will be the number and the order in the (). (2.) Split the time series into training and validation sets. Leave the last 12 months for the testing set. Use the training set to get ACF and PACF plots. Based on these charts, comment on the patterns present in the calls volume. Suggest at least 2 Box-Jenkings models. Provide
your explanation next to each proposed model. (Don’t feel obligated to use the models above.) Include the script, plots, and your comments. library (forecast) ## Registered S3 method overwritten by 'quantmod': ## method from ## as.zoo.data.frame zoo train.ts = window (data.ts, end = c ( 1991 , 12 )) valid.ts = window (data.ts, start = c ( 1992 , 1 )) autoplot (train.ts) par ( mfrow= c ( 2 , 2 )) Acf (train.ts, 36 ) Pacf (train.ts, 36 ) par ( mfrow= c ( 1 , 1 ))
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
#The training set shows seasonal part tailed off wave pattern in ACF. PACF shows the first lag cut off (after one long bar no more behind) --> SAR(1). *(1,0,0) #Then we look at the non-seasonal part, which is between the seasonal repeating bar.PACF cut off (drops after the first bar), in ACF can be tail off --> AR(1) (1,0,0)* #or cut off --> ARMA(1,1) or AR(1) + MA(1) dtrain = diff (train.ts, 12 ) par ( mfrow= c ( 2 , 2 )) Acf (dtrain, 36 , main = "" ) Pacf (dtrain, 36 , main = "" ) par ( mfrow= c ( 1 , 1 ))
#no season no trend #no-seasonal part, not sure which one better. Can try different combination (1,0,1)*(0,1,0); (1,0,0)*(0,1,0); (0,0,1)*(0,1,0) (3.) Fit the suggested 2 models and access their accuracy, robustness, assumptions. Visual analysis techniques suffice. Include your script, comments. Report all supporting material. m1 = arima (train.ts, order = c ( 0 , 0 , 0 ), seasonal = list ( order= c ( 0 , 1 , 0 ), period = 12 )) summary (m1) ## ## Call: ## arima(x = train.ts, order = c(0, 0, 0), seasonal = list(order = c(0, 1, 0), ## period = 12)) ## ## ## sigma^2 estimated as 2251271: log likelihood = -279.44, aic = 560.88 ## ## Training set error measures: ## ME RMSE MAE MPE MAPE MASE
ACF1 ## Training set -400.8296 1279.619 829.2158 -2.062086 4.053236 0.7086329 0.3618717 m1.p = forecast (m1, h= 12 ) accuracy (m1.p, valid.ts) ## ME RMSE MAE MPE MAPE MASE ## Training set -400.8296 1279.619 829.2158 -2.062086 4.053236 0.7326442 ## Test set 664.5000 1295.846 975.1667 2.906971 4.384444 0.8615974 ## ACF1 Theil's U ## Training set 0.3618717 NA ## Test set -0.1267855 0.968655 checkresiduals (m1) ## ## Ljung-Box test ## ## data: Residuals from ARIMA(0,0,0)(0,1,0)[12] ## Q* = 12.281, df = 9, p-value = 0.1979 ## ## Model df: 0. Total lags used: 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
m2 = arima (train.ts, order = c ( 1 , 0 , 0 ), seasonal = list ( order= c ( 1 , 0 , 0 ), period = 12 )) summary (m2) ## ## Call: ## arima(x = train.ts, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), ## period = 12)) ## ## Coefficients: ## ar1 sar1 intercept ## 0.5002 0.5238 21360.1596 ## s.e. 0.1302 0.1371 582.9503 ## ## sigma^2 estimated as 1390592: log likelihood = -375.7, aic = 759.39 ## ## Training set error measures: ## ME RMSE MAE MPE MAPE MASE ## Training set -66.93572 1179.233 931.6528 -0.6402688 4.43517 0.7961737 ## ACF1 ## Training set 0.07932037 m2.p = forecast (m2, h= 12 ) accuracy (m2.p, valid.ts) ## ME RMSE MAE MPE MAPE MASE ## Training set -66.93572 1179.233 931.6528 -0.6402688 4.435170 0.8231512 ## Test set 436.35692 1326.482 815.2446 1.6439259 3.572109 0.7203001 ## ACF1 Theil's U ## Training set 0.07932037 NA ## Test set 0.30183881 1.00346 checkresiduals (m2)
## ## Ljung-Box test ## ## data: Residuals from ARIMA(1,0,0)(1,0,0)[12] with non-zero mean ## Q* = 4.6145, df = 7, p-value = 0.7069 ## ## Model df: 2. Total lags used: 9 Compared model 1 and model 2, we can see model 2 has a lower MAPE on testing set, although RMSE is slightly higher. When we look into the residuals, model 2’s residuals are more normally distributed, and no significance shows in ACF. The overall spread is pretty random. In a word, assumptions are met better than model 1. Thus, model 2 is a better model. Propose the best model for the calls volume forecasting. Report its estimated accuracy, comment on its robustness. (4.) Use the best model to produce the forecast for 1993. Visualize your results. m = arima (data.ts, order = c ( 1 , 0 , 0 ), seasonal = list ( order= c ( 1 , 0 , 0 ), period = 12 )) m.pred = forecast (m, h= 12 )
m.pred ## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 ## Jan 1993 24055.18 22557.70 25552.67 21764.97 26345.39 ## Feb 1993 22448.91 20752.38 24145.43 19854.29 25043.52 ## Mar 1993 21970.61 20221.78 23719.45 19296.00 24645.23 ## Apr 1993 20663.88 18900.49 22427.26 17967.01 23360.74 ## May 1993 20756.67 18989.18 22524.16 18053.52 23459.81 ## Jun 1993 20977.08 19208.43 22745.74 18272.17 23682.00 ## Jul 1993 21216.07 19447.08 22985.05 18510.64 23921.49 ## Aug 1993 21372.77 19603.70 23141.85 18667.21 24078.34 ## Sep 1993 20889.55 19120.45 22658.65 18183.95 23595.16 ## Oct 1993 21863.37 20094.26 23632.48 19157.75 24568.99 ## Nov 1993 22506.40 20737.29 24275.51 19800.78 25212.02 ## Dec 1993 23998.34 22229.23 25767.45 21292.72 26703.96 autoplot (m.pred) (5.) Write a short 1-2 paragraph report for Mr. DeCoria summarizing your findings. Provide the best model he should use to forecast the calls volume and the estimated error rate he should expect. Include your forecast for the calls volume for January and February 1993.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
The best model to predict would be SARIMA(1,0,0)*(1,0,0)12. The prediction error should be expected around 3.57%. Based on this model, the average January and February 1993 prediction would be 24055.18 and 22448.91. To better incorporate the budget planning, we should also consider the range between 21764.97 to 26345.39 for January, and between 19854.29 to 25043.52 for February.