UNIVERSITY OF BRITISH COLUMBIA
Department of Statistics
Stat 443: Time Series and Forecasting
Assignment 1: Exploratory Data Analysis

The assignment is due on Thursday, February 1 at 9:00pm.

• Submit your assignment online on canvas.ubc.ca in pdf format under the module “Assignments”.
• This assignment should be completed in RStudio and written up using R Markdown. Display all the R code used to perform your data analysis.
• Please make sure your submission is clear and neat. It is the student’s responsibility to ensure that the submitted file is in good order (i.e., not corrupted).
• Remember to properly label all your plots and have them clearly displayed.
• Late submission penalty: 1% per hour or fraction of an hour. (In the event of technical issues with submission, you can email your assignment to the instructor to get a time stamp, but submit on canvas as soon as it becomes possible to make it available for grading.)

1. The file usual hours worked ca.csv contains monthly average values of the usual hours worked across all industries in Canada for the period from January 1987 until December 2023 (data source: Statistics Canada, DOI: https://doi.org/10.25318/1410003401-eng).

(a) Read in the data and create a time-series object. Plot the series and comment on any features of the data that you observe. In particular, address the following points:
• Does the series have a trend?
• Is there seasonal variation, and if so, would an additive or multiplicative model be suitable? Explain your reasoning.
• Is the series stationary? Justify your answer by referring to the definition of a weakly stationary stochastic process.

(b) Create training and test datasets. The training dataset should include all observations up to and including December 2021; this dataset will be used to fit ("train") the model. The test dataset should include all observations from January 2022 to December 2023; this dataset will be used to assess forecast accuracy. You can use the command window() on a ts object to split the data. Using a suitable decomposition model and the loess method (R function stl()), decompose the training series into trend, seasonal, and error components. Plot the resulting decomposition.

(c) Fit a linear model to the trend component (you can use R function lm()).
• Write down the fitted model for the trend component.
• Does the linear model provide evidence of a trend at the 95% confidence level? Without doing any further analysis, would you use this trend component to make predictions? Justify your answer using the linear model results and the trend component plot.
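The window(), stl(), and lm() calls mentioned in parts (b) and (c) can be sketched as follows. This is only an illustration on a synthetic monthly series: the values, the seed, the object names, and the choice s.window = "periodic" are all assumptions, not part of the assignment data or a prescribed solution.

```r
# Synthetic monthly series standing in for the real data (values are made up).
set.seed(443)
n_months <- 12 * 37                      # January 1987 to December 2023
month_idx <- seq_len(n_months)
y <- 36 - 0.002 * month_idx +            # slow hypothetical downward drift
  0.5 * sin(2 * pi * month_idx / 12) +   # hypothetical seasonal pattern
  rnorm(n_months, sd = 0.2)              # noise
hours <- ts(y, start = c(1987, 1), frequency = 12)

# Split with window(): training through December 2021, test from January 2022.
train <- window(hours, end = c(2021, 12))
test  <- window(hours, start = c(2022, 1))

# Loess decomposition; s.window = "periodic" assumes a fixed seasonal pattern.
fit_stl <- stl(train, s.window = "periodic")
plot(fit_stl)

# Regress the extracted trend component on time.
trend <- fit_stl$time.series[, "trend"]
trend_lm <- lm(trend ~ time(trend))
summary(trend_lm)
```

With real data, whether an additive decomposition (as stl() assumes) is appropriate depends on the answer to part (a); a multiplicative pattern would typically call for a log transform first.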

(d) Predict the monthly average values of the usual hours worked in Canada for the period from January 2022 to December 2023 using your seasonal decomposition model.
• Plot your predictions along with the actual observed values (on the same plot). Make sure to include a legend for your plot.
• Comment on the performance of your prediction method, explaining why the method did or did not work well for this data.
• How could the prediction method be improved?
• As a statistician, what other information would you like to add to your forecasts in addition to the point forecasts you produced above?

2. The file NY Temperature Data.csv contains daily maximum temperature measurements from 1990-01-01 until 2024-01-02 (source: NOAA website, https://www.ncei.noaa.gov/pub/data/ghcn/daily).

In this question, we introduce the zoo package, which is useful when working with time series of irregular frequencies or when aggregating high-frequency data into a lower frequency (e.g., aggregating daily data into monthly means or maxima).

(a) Read the data into R and create an R object called dat for the data.

(b) Create zoo objects for daily Max Temperature. Create monthly maxima time series. Plot the monthly maximum temperature series and comment on any features you observe. Instructions for working with zoo objects are given below:
• Load the zoo library using the command library("zoo"). If you do not have this package installed, type install.packages("zoo");
• Use the command zoo(x, as.Date(dat$Date)) to create zoo object x;
• Create monthly maxima from the daily data by using the command aggregate(x, as.yearmon, FUN=max).

(c) Fit a suitable seasonal decomposition model to the monthly data using the moving average smoothing (R function decompose) and plot the estimates of the trend, seasonal and error components. Note that the moving average smoothing decomposition function, decompose, will not work on a zoo object, but the loess decomposition function, stl, will work. To convert a zoo object into a ts object, you can use the zooreg function. For example, let x be the monthly temperature series.
• For the zoo object x, use the command x.ts = ts(zooreg(x), start=c(1990, 1), end=c(2024, 1), frequency=12).
• You can use the functions decompose() and window() on object x.ts.

(d) Plot the correlogram for the deseasonalized series of monthly temperature maxima using the seasonal decomposition model you fit in part (c). Comment on the serial dependence of this series.
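The zoo workflow described in Question 2 (daily zoo object, monthly maxima, conversion to ts, decomposition, correlogram) might look roughly like the sketch below. It runs on synthetic daily temperatures rather than the NOAA file; the column name TMAX, the seed, and all object names are hypothetical. The conversion step here uses ts(coredata(...)) as one simple alternative to the ts(zooreg(x), ...) command given in the instructions, and the additive form of decompose() is an assumption.

```r
library(zoo)

# Synthetic daily maxima standing in for the NOAA file (values are made up).
set.seed(443)
dates <- seq(as.Date("1990-01-01"), as.Date("2024-01-02"), by = "day")
doy <- as.numeric(format(dates, "%j"))
tmax <- 15 - 12 * cos(2 * pi * doy / 365.25) + rnorm(length(dates), sd = 4)
dat <- data.frame(Date = format(dates), TMAX = tmax)  # TMAX is a hypothetical column name

# Daily zoo object indexed by date, then monthly maxima via as.yearmon.
daily <- zoo(dat$TMAX, as.Date(dat$Date))
monthly <- aggregate(daily, as.yearmon, FUN = max)

# Convert to a regular monthly ts so that decompose() can be applied.
monthly_ts <- ts(coredata(monthly), start = c(1990, 1), frequency = 12)

# Moving-average decomposition (additive by default).
dec <- decompose(monthly_ts)
plot(dec)

# Remove the estimated seasonal component, then plot the correlogram.
deseason <- monthly_ts - dec$seasonal
acf(deseason, main = "Correlogram of deseasonalized monthly maxima")
```

Subtracting dec$seasonal is the additive way to deseasonalize; under a multiplicative model the series would be divided by the seasonal component instead.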

3. In this question you will explore the sampling distribution of the sample autocorrelation coefficient for a white noise process through a simulation study. Recall that, for a time series of length $n$ from a white noise process, the sample autocorrelation coefficient at lag $h$ approximately follows a normal distribution with mean $-1/n$ and variance $1/n$:

$$r_h \sim N(-1/n,\ 1/n)$$

for large values of $n$. To confirm this theoretical fact, conduct the following simulation study for lags $h = 1$ and $h = 2$:

(i) Simulate a time series of length $n = 2000$ from a white noise process $\{Z_t\}_{t \in \mathbb{Z}}$ with $Z_t \sim N(0, 1)$ (function rnorm()).
(ii) Evaluate $r_h$, the sample autocorrelation coefficient at lag $h$, for $h = 1$ and $h = 2$. Store these values.
(iii) Repeat steps (i) and (ii) $m = 8000$ times; i.e., generate 8000 time series of length $n$ and for each of them compute $r_1$ and $r_2$ (you can use a for loop). You should now have two vectors of length $m$ with estimates $r_1$ and $r_2$.

To summarize the results of the simulation study, present the following information:
• Compute the mean and variance of the $r_1$ and $r_2$ values from your simulation study.
• In two separate figures, plot the two histograms for the sample of $r_1$ and $r_2$ values from the simulation study (function hist()), add the smoothed version of the histogram (function density()) and the theoretical asymptotic normal density (function dnorm()). Make sure your plots are well-presented, including a suitable title, axes labels, curves of different type or colour, and a legend.
• Comment on whether there is agreement between the empirical estimates of the bias, variance and sampling density of the estimator of the autocorrelation at lag $h$ and their theoretical approximation.
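Steps (i) to (iii) can be sketched along the following lines. This is a minimal illustration, not a model answer: the seed, the object names, the number of histogram breaks, and the plotting choices are assumptions, and only the lag-1 figure is shown.

```r
set.seed(443)          # arbitrary seed, for reproducibility only
n <- 2000              # series length
m <- 8000              # number of replications
r1 <- numeric(m)
r2 <- numeric(m)

for (i in seq_len(m)) {
  z <- rnorm(n)                                 # Gaussian white noise
  r <- acf(z, lag.max = 2, plot = FALSE)$acf    # autocorrelations at lags 0, 1, 2
  r1[i] <- r[2]                                 # lag 1
  r2[i] <- r[3]                                 # lag 2
}

# Empirical moments next to the asymptotic values -1/n and 1/n.
c(mean_r1 = mean(r1), mean_r2 = mean(r2), theory_mean = -1 / n)
c(var_r1 = var(r1), var_r2 = var(r2), theory_var = 1 / n)

# Histogram with kernel density estimate and theoretical normal density (lag 1).
hist(r1, breaks = 50, freq = FALSE,
     main = "Sampling distribution of r1", xlab = "r1")
lines(density(r1), col = "blue", lwd = 2)
curve(dnorm(x, mean = -1 / n, sd = sqrt(1 / n)),
      add = TRUE, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("Kernel density", "N(-1/n, 1/n)"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```

Note that dnorm() takes a standard deviation, so the theoretical variance 1/n enters as sqrt(1 / n).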
