
UNIVERSITY OF BRITISH COLUMBIA
Department of Statistics
Stat 443: Time Series and Forecasting
Assignment 3

The assignment is due on Thursday, March 21 at 9:00pm.

• Submit your assignment online on canvas.ubc.ca in PDF format through Gradescope.
• This assignment should be completed in RStudio and written up using R Markdown. Display all R code used to perform your data analysis.
• Please make sure your submission is clear and neat. The student is responsible for the submitted file being in good order (i.e., not corrupted).
• Late submission penalty: 1% per hour or fraction of an hour.

1. The file NINO34.csv contains the monthly El Niño 3.4 index from 1870 to 2023. The El Niño 3.4 index represents the average equatorial sea surface temperature (in degrees Celsius) from around the international dateline to the coast of South America. It is often used to define El Niño and La Niña.

   (a) Perform exploratory data analysis.
       i. Import the data into R and create a time-series object for the El Niño 3.4 index. Break the time-series object into a training set and a test set. You can use the function window() on a ts object to split the data. Let the training set run from January 1870 to December 2021, and let the test set start in January 2022 and end in November 2023.
       ii. Plot the training data as well as its acf and pacf. Comment on what you observe. Does the series have a trend? Seasonal variation? Does the time series appear stationary?

   (b) Forecast sea surface temperature for 2022 and 2023 using the Box-Jenkins method and the data from 1870-2021.
       i. Remove any seasonal variation and trend from the training data, if there is any, using the stl function in R. Plot the filtered data set, as well as its acf and pacf.
       ii. Use the standard graphical tools to select the two best candidate models from the family of pure AR and MA processes for the filtered data set, justifying your selections. Fit both models and report their parameter estimates.
       iii. Compare the AIC values of your two models. Which would you pick based on AIC?
       iv. Use the tsdiag function in R to plot diagnostics for your two models. What do you observe?
       v. Predict the sea surface temperature for Jan 2022 through Nov 2023 using both candidate models. Calculate the mean squared prediction error (MSPE) for both models. Which method performs better?

       vi. On a single plot, display the test set of sea surface temperature, the predictions for both models, and their approximate 95% prediction intervals.

   (c) Forecast sea surface temperature for 2022 and 2023 using the Holt-Winters method and the data from 1870-2021.
       i. Use the HoltWinters function in R to fit an appropriate model to the training data. Use this model to predict sea surface temperature from Jan 2022 through Nov 2023. Calculate the mean squared prediction error. How does it compare to the Box-Jenkins models above?
       ii. On a single plot, display the test set of sea surface temperature, the predictions from your preferred Box-Jenkins model, the predictions from your Holt-Winters model, and relevant 95% prediction intervals. Which method performs better?

2. In this question you will predict the time series of monthly average values of the usual hours worked across all industries in Canada for the period from January 1987 until December 2023, which you explored in Assignment 1, using the file usual hours worked ca.csv. You will use the Box-Jenkins method and the Holt-Winters method.

   Part I. Data Preparation

   (a) Read in the data and create a time-series object for the mean monthly working hours. Create training and test datasets:
       i. The training dataset should include all observations up to and including December 2020;
       ii. The test dataset should include all observations from January 2021 to December 2023.
       Plot the training data.

   Part II. Box-Jenkins Method

   In this part, you will select and fit a SARIMA(p, d, q) × (P, D, Q)_s model and make forecasts using the fitted model.

   (a) Difference the training set time series at lag 1. Plot the new time series and its correlogram, and comment on what you observe.
   (b) Apply seasonal differencing to remove seasonal variation. Plot the resulting differenced time series along with its sample acf and pacf. Comment on what you observe.
   (c) Based on the results of Part II (a) and (b), specify the values of d, D, and s.
   (d) Based on the plots in Part II (b), suggest possible values of p, P, q, and Q, justifying your choices.
   (e) Now use Akaike's Information Criterion (AIC) to select the model based on the training dataset in Part I. Fix the values of p and P at your suggestions from Part II (d), and consider q = 0, 1, ..., 5 and Q = 0, 1, ..., 5. Select the values of q and Q according to the AIC values. Fit the model you choose and print the values of the estimated parameters along with the AIC value for the model.

   (f) Perform the model diagnostics for the model in (e) and comment on the goodness of fit for your chosen model.
   (g) Predict the mean monthly working hours for the period from January 2021 to December 2023 based on the model you fit in Part II (e). In one figure, plot the test dataset along with your forecasts and corresponding 95% prediction intervals. (Remember to include a legend for your plot and proper labels for the axes.) Comment on the performance of your forecasting procedure.

   Part III. Holt-Winters Method

   (a) Use the command HoltWinters() to apply Holt-Winters filtering to the training dataset in Part I. Print the values of the estimated parameters.
   (b) Predict the mean monthly working hours for the period from January 2021 to December 2023 based on the Holt-Winters filtering. In one figure, plot the test dataset along with your forecasts and corresponding 95% prediction intervals. (Remember to include a legend for your plot and proper labels for the axes.) Comment on the performance of your forecasting procedure.
   (c) Compare your predictions in Part II (g) and Part III (b) using the mean squared prediction error (MSPE). Which method do you recommend and why?
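As an illustration of the splitting and filtering steps in Question 1 (a) and (b) i, here is a minimal sketch using only base R. It runs on a simulated monthly series rather than NINO34.csv, and the dates, variable names, and the choice s.window = "periodic" are all assumptions made for the example.

```r
# Simulated monthly series standing in for the real index (an assumption;
# the assignment's CSV file is not read here).
set.seed(443)
n_months <- 12 * 20
seasonal <- rep(sin(2 * pi * (1:12) / 12), times = 20)
noise    <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_months, sd = 0.3))
y <- ts(26 + seasonal + noise, start = c(2000, 1), frequency = 12)

# Split by calendar time with window(); both end points are inclusive.
train <- window(y, end = c(2017, 12))
test  <- window(y, start = c(2018, 1))

# Decompose the training series with stl() and keep the remainder,
# i.e. the series with the seasonal and trend components removed.
fit_stl   <- stl(train, s.window = "periodic")
remainder <- fit_stl$time.series[, "remainder"]

# Sample acf and pacf of the filtered series (computed without plotting).
acf_vals  <- acf(remainder, lag.max = 24, plot = FALSE)
pacf_vals <- pacf(remainder, lag.max = 24, plot = FALSE)
```

Calling plot(remainder), acf(remainder), and pacf(remainder) without plot = FALSE draws the corresponding figures.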
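The model comparison in Question 1 (b) iii to vi can be sketched along the following lines. The series is simulated and already stationary, and the orders AR(2) and MA(2) are placeholders rather than recommended choices. The sketch forecasts the filtered series only; adding any removed trend and seasonal components back is not shown.

```r
# Simulated stationary series standing in for the filtered data (an assumption).
set.seed(1)
x <- ts(as.numeric(arima.sim(model = list(ar = c(0.6, -0.3)), n = 300)),
        start = c(1995, 1), frequency = 12)
train <- window(x, end = c(2018, 12))
test  <- window(x, start = c(2019, 1))

# Two candidate models; the orders are hypothetical placeholders.
fit_ar <- arima(train, order = c(2, 0, 0))
fit_ma <- arima(train, order = c(0, 0, 2))
aic_values <- c(AR2 = AIC(fit_ar), MA2 = AIC(fit_ma))

# Forecast over the test horizon; predict() returns point forecasts
# ($pred) and their standard errors ($se).
h     <- length(test)
fc_ar <- predict(fit_ar, n.ahead = h)
fc_ma <- predict(fit_ma, n.ahead = h)

# Mean squared prediction error on the held-out test set.
mspe <- function(actual, predicted) mean((actual - predicted)^2)
mspe_values <- c(AR2 = mspe(test, fc_ar$pred), MA2 = mspe(test, fc_ma$pred))

# Approximate 95% prediction interval: forecast +/- 1.96 * standard error.
lower_ar <- fc_ar$pred - 1.96 * fc_ar$se
upper_ar <- fc_ar$pred + 1.96 * fc_ar$se
```

The diagnostic plots for a fitted model come from tsdiag(fit_ar), and the interval for the second model follows the same pattern with fc_ma.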
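The AIC search described in Question 2, Part II (e) might be organised as a loop over a grid of (q, Q) values, as in the sketch below. The data are simulated, the fixed values p = 1, P = 0, d = 1, D = 1, s = 12 are hypothetical, and the grid is cut down to 0:2 to keep the example fast; the assignment asks for 0 to 5.

```r
# Simulated monthly series with a drifting level and a seasonal pattern
# (an assumption; the assignment's data file is not read here).
set.seed(2024)
n <- 12 * 15
level <- cumsum(rnorm(n, sd = 0.1))
y <- ts(35 + level + rep(cos(2 * pi * (1:12) / 12), 15) + rnorm(n, sd = 0.2),
        start = c(2005, 1), frequency = 12)

# Hypothetical fixed orders; in practice these come from the acf/pacf plots.
p <- 1; P <- 0; d <- 1; D <- 1; s <- 12

# Every (q, Q) combination on a reduced grid.
grid <- expand.grid(q = 0:2, Q = 0:2)
grid$aic <- NA_real_
fits <- vector("list", nrow(grid))

for (i in seq_len(nrow(grid))) {
  # Some combinations may fail to fit; record those as NA and move on.
  fit <- tryCatch(
    arima(y, order = c(p, d, grid$q[i]),
          seasonal = list(order = c(P, D, grid$Q[i]), period = s)),
    error = function(e) NULL
  )
  if (!is.null(fit)) {
    fits[[i]]   <- fit
    grid$aic[i] <- AIC(fit)
  }
}

# Model with the smallest AIC among the fits that succeeded.
best_index <- which.min(grid$aic)
best       <- grid[best_index, ]
best_fit   <- fits[[best_index]]
```

Printing best shows the selected q, Q, and AIC, and coef(best_fit) gives the estimated parameters of the chosen model.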
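For the Holt-Winters steps in Question 1 (c) and Question 2, Part III, a minimal sketch is given below. It uses a simulated series and the default additive seasonal model; whether additive or multiplicative seasonality is appropriate for the real data is left open.

```r
# Simulated monthly series with trend and seasonality (an assumption).
set.seed(7)
n <- 12 * 12
y <- ts(30 + 0.02 * (1:n) + rep(2 * sin(2 * pi * (1:12) / 12), 12) +
          rnorm(n, sd = 0.3),
        start = c(2010, 1), frequency = 12)
train <- window(y, end = c(2019, 12))
test  <- window(y, start = c(2020, 1))

# Fit Holt-Winters; the smoothing parameters are estimated by default.
hw_fit <- HoltWinters(train)
params <- c(alpha = unname(hw_fit$alpha),
            beta  = unname(hw_fit$beta),
            gamma = unname(hw_fit$gamma))

# Forecasts with 95% prediction bounds; the result has columns
# "fit", "upr" and "lwr".
hw_fc <- predict(hw_fit, n.ahead = length(test),
                 prediction.interval = TRUE, level = 0.95)

# Mean squared prediction error on the test set.
hw_mspe <- mean((test - hw_fc[, "fit"])^2)
```

The test data, forecasts, and bounds can then be drawn on one figure with ts.plot() and labelled with legend(). The same MSPE calculation applied to the Box-Jenkins forecasts gives a like-for-like comparison.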
