UNIVERSITY OF BRITISH COLUMBIA
Department of Statistics
Stat 443: Time Series and Forecasting
Assignment 1: Exploratory Data Analysis
The assignment is due on Thursday, February 1 at 9:00pm.
• Submit your assignment online on canvas.ubc.ca in PDF format under the module "Assignments".
• This assignment should be completed in RStudio and written up using R Markdown. Display all the R code used to perform your data analysis.
• Please make sure your submission is clear and neat. It is the student's responsibility to ensure that the submitted file is in good order (i.e., not corrupted).
• Remember to properly label all your plots and display them clearly.
• Late submission penalty: 1% per hour or fraction of an hour. (In the event of technical issues with submission, you can email your assignment to the instructor to get a time stamp, but submit it on Canvas as soon as possible so that it is available for grading.)
1. The file usual_hours_worked_ca.csv contains monthly average values of the usual hours worked across all industries in Canada for the period from January 1987 until December 2023 (data source: Statistics Canada, DOI: https://doi.org/10.25318/1410003401-eng).
(a) Read in the data and create a time-series object. Plot the series and comment on any features of the data that you observe. In particular, address the following points:
• Does the series have a trend?
• Is there seasonal variation, and if so, would an additive or a multiplicative model be suitable? Explain your reasoning.
• Is the series stationary? Justify your answer by referring to the definition of a weakly stationary stochastic process.
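A minimal sketch of the ts construction in part (a), assuming the monthly averages have already been read from the CSV into a numeric vector. Because the column names of the real file are not given here, the code generates a synthetic stand-in series (made-up values, not Statistics Canada data); only the ts() call and the plot are the point.

```r
# Synthetic stand-in for the monthly averages (NOT the Statistics Canada data).
# With the real file you would read the CSV and pass its value column instead.
set.seed(443)
n_months <- 12 * (2023 - 1987 + 1)  # Jan 1987 to Dec 2023: 444 months
idx <- seq_len(n_months)
values <- 38 - 0.002 * idx +           # slow synthetic drift
  0.8 * sin(2 * pi * idx / 12) +       # synthetic annual cycle
  rnorm(n_months, sd = 0.2)            # synthetic noise

# Monthly time series: frequency 12, starting January 1987
hours_ts <- ts(values, start = c(1987, 1), frequency = 12)

plot(hours_ts,
     main = "Usual hours worked (synthetic stand-in)",
     xlab = "Year", ylab = "Average usual hours")
```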
(b) Create training and test datasets. The training dataset should include all observations up to and including December 2021; this dataset will be used to fit ("train") the model. The test dataset should include all observations from January 2022 to December 2023; this dataset will be used to assess forecast accuracy. You can use the command window() on a ts object to split the data.
Using a suitable decomposition model and the loess method (R function stl()), decompose the training series into trend, seasonal, and error components. Plot the resulting decomposition.
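One way to sketch the split and the loess decomposition, again on a synthetic stand-in series. window() with end = c(2021, 12) and start = c(2022, 1) produces the two sets. Note that stl() always fits an additive decomposition; if a multiplicative model is judged more suitable, a common workaround is to decompose log(train) instead. The choice s.window = "periodic" is an assumption for illustration, not something the assignment prescribes.

```r
# Synthetic stand-in series (NOT the real data), Jan 1987 to Dec 2023
set.seed(443)
idx <- 1:444
values <- 38 + 0.8 * sin(2 * pi * idx / 12) + rnorm(444, sd = 0.2)
hours_ts <- ts(values, start = c(1987, 1), frequency = 12)

# Training set: all observations up to and including December 2021
train <- window(hours_ts, end = c(2021, 12))
# Test set: January 2022 to December 2023
test <- window(hours_ts, start = c(2022, 1), end = c(2023, 12))

# Loess (STL) decomposition of the training series; stl() is additive.
# s.window = "periodic" keeps the seasonal pattern identical each year.
train_stl <- stl(train, s.window = "periodic")
plot(train_stl, main = "STL decomposition of the training series")
```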
(c) Fit a linear model to the trend component (you can use the R function lm()).
• Write down the fitted model for the trend component.
• Does the linear model provide evidence of a trend at the 95% confidence level? Without doing any further analysis, would you use this trend component to make predictions? Justify your answer using the linear model results and the trend component plot.
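A sketch of the trend regression, assuming an stl() fit like the one in part (b) on a synthetic series. The trend component is regressed on time in decimal years, so the slope is the change per year. One caution when reading the output: the loess trend is a smooth curve whose values are strongly autocorrelated, so the independence assumption behind the lm() standard errors and p-values does not hold, and they will tend to overstate the evidence for a trend.

```r
# Synthetic stand-in series and STL fit, as in part (b)
set.seed(443)
values <- 38 + 0.8 * sin(2 * pi * (1:444) / 12) + rnorm(444, sd = 0.2)
hours_ts <- ts(values, start = c(1987, 1), frequency = 12)
train <- window(hours_ts, end = c(2021, 12))
train_stl <- stl(train, s.window = "periodic")

# Regress the estimated trend on time measured in decimal years
trend_vals <- as.numeric(train_stl$time.series[, "trend"])
t_years <- as.numeric(time(train))
trend_fit <- lm(trend_vals ~ t_years)

summary(trend_fit)                # coefficients, standard errors, p-values
confint(trend_fit, level = 0.95)  # 95% confidence intervals
```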
(d) Predict the monthly average values of the usual hours worked in Canada for the period from January 2022 to December 2023 using your seasonal decomposition model.
• Plot your predictions along with the actual observed values on the same plot. Make sure to include a legend.
• Comment on the performance of your prediction method, explaining why the method did or did not work well for this data.
• How could the prediction method be improved?
• As a statistician, what other information would you add to your forecasts in addition to the point forecasts you produced above?
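A sketch of one possible forecasting scheme, assuming an additive model: extrapolate the fitted linear trend into 2022 and 2023 and add the last estimated year of seasonal effects. The data are synthetic, and this is only one reasonable way to turn the decomposition into forecasts, not the required method.

```r
# Synthetic stand-in series, split and STL fit as in parts (b) and (c)
set.seed(443)
values <- 38 + 0.8 * sin(2 * pi * (1:444) / 12) + rnorm(444, sd = 0.2)
hours_ts <- ts(values, start = c(1987, 1), frequency = 12)
train <- window(hours_ts, end = c(2021, 12))
test  <- window(hours_ts, start = c(2022, 1), end = c(2023, 12))
train_stl <- stl(train, s.window = "periodic")
trend_vals <- as.numeric(train_stl$time.series[, "trend"])
t_years <- as.numeric(time(train))
trend_fit <- lm(trend_vals ~ t_years)

# Extrapolate the linear trend over the test period
t_future <- as.numeric(time(test))
trend_pred <- predict(trend_fit, newdata = data.frame(t_years = t_future))

# Reuse the last estimated year of seasonal effects (Jan..Dec 2021) twice
seas_last_year <- tail(as.numeric(train_stl$time.series[, "seasonal"]), 12)
seas_pred <- rep(seas_last_year, 2)

# Additive forecast = trend + seasonal
fc <- ts(trend_pred + seas_pred, start = c(2022, 1), frequency = 12)

ts.plot(test, fc, col = c("black", "red"), lty = c(1, 2),
        main = "Test period: observed vs forecast",
        xlab = "Year", ylab = "Average usual hours")
legend("topleft", legend = c("Observed", "Forecast"),
       col = c("black", "red"), lty = c(1, 2))
```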
2. The file NY_Temperature_Data.csv contains daily maximum temperature measurements from 1990-01-01 until 2024-01-02 (source: NOAA website, https://www.ncei.noaa.gov/pub/data/ghcn/daily).
In this question, we introduce the zoo package, which is useful when working with time series of irregular frequencies or when aggregating high-frequency data into a lower frequency (e.g., aggregating daily data into monthly means or maxima).
(a) Read the data into R and create an R object called dat for the data.
(b) Create a zoo object for the daily maximum temperature. Create a time series of monthly maxima. Plot the monthly maximum temperature series and comment on any features you observe.
Instructions for working with zoo objects are given below:
• Load the zoo library using the command library('zoo'). If you do not have this package installed, type install.packages('zoo');
• Use the command zoo(x, as.Date(dat$Date)), where x holds the daily maximum temperatures, to create a zoo object indexed by date;
• Create monthly maxima from the daily data by using the command aggregate(x, as.yearmon, FUN=max).
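The zoo aggregation above can be cross-checked with a base-R equivalent. The sketch below swaps zoo for tapply() grouped on a "YYYY-MM" key, on synthetic daily data (not the NOAA measurements). The na.rm = TRUE choice is an assumption about how missing days should be handled. Note that with data running to 2024-01-02, the final month contains only two daily observations, so its maximum is not comparable to the others.

```r
# Synthetic daily maxima (NOT the NOAA data) over the same date range
set.seed(1)
dates <- seq(as.Date("1990-01-01"), as.Date("2024-01-02"), by = "day")
doy <- as.numeric(format(dates, "%j"))  # day of year, 1..366
tmax <- 15 + 12 * sin(2 * pi * (doy - 110) / 365.25) +
  rnorm(length(dates), sd = 4)

# Monthly maxima: group on a "YYYY-MM" key and take the max of each group.
# This mirrors aggregate(x, as.yearmon, FUN = max) for a zoo object x.
month_key <- format(dates, "%Y-%m")
monthly_max <- tapply(tmax, month_key, max, na.rm = TRUE)

# tapply() sorts the keys, and "YYYY-MM" strings sort chronologically
max_ts <- ts(as.numeric(monthly_max), start = c(1990, 1), frequency = 12)
plot(max_ts, main = "Monthly maximum temperature (synthetic)",
     xlab = "Year", ylab = "Monthly maximum")
```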
(c) Fit a suitable seasonal decomposition model to the monthly data using moving average smoothing (R function decompose) and plot the estimates of the trend, seasonal, and error components.
Note that the moving average smoothing decomposition function, decompose, will not work on a zoo object, but the loess decomposition function, stl, will. To convert a zoo object into a ts object, you can use the zooreg function. For example, let x be the monthly temperature series.
• For the zoo object x, use the command x.ts = ts(zooreg(x), start=c(1990, 1), end=c(2024, 1), frequency=12).
• You can use the functions decompose() and window() on the object x.ts.
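A sketch of the moving-average decomposition on a synthetic monthly series with the same span as x.ts. Trimming to December 2023 with window() is an assumption made here to drop the two-day January 2024 month; decompose() with type = "additive" is likewise an illustrative choice.

```r
# Synthetic monthly maxima (NOT the NOAA data), Jan 1990 to Jan 2024
set.seed(1)
n_obs <- 409
month <- (seq_len(n_obs) - 1) %% 12 + 1
y <- 30 + 8 * sin(2 * pi * (month - 4) / 12) + rnorm(n_obs, sd = 1.5)
x.ts <- ts(y, start = c(1990, 1), end = c(2024, 1), frequency = 12)

# Assumption: drop January 2024, which covers only two days of data
x_trim <- window(x.ts, end = c(2023, 12))

# Moving-average decomposition (centred 2x12 MA for the trend)
dec <- decompose(x_trim, type = "additive")
plot(dec)
```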
(d) Plot the correlogram for the deseasonalized series of monthly temperature maxima using the seasonal decomposition model you fit in part (c). Comment on the serial dependence of this series.
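A sketch of the correlogram, assuming an additive decompose() fit: the deseasonalized series is the original minus the estimated seasonal component. Because the input is a ts object with frequency 12, acf() reports lags in years, so lag 1.0 on the plot corresponds to 12 months. The data are synthetic.

```r
# Synthetic monthly maxima (NOT the NOAA data), Jan 1990 to Dec 2023
set.seed(1)
month <- (seq_len(408) - 1) %% 12 + 1
y <- 30 + 8 * sin(2 * pi * (month - 4) / 12) + rnorm(408, sd = 1.5)
x.ts <- ts(y, start = c(1990, 1), frequency = 12)
dec <- decompose(x.ts, type = "additive")

# Additive model: deseasonalize by subtracting the seasonal component
deseason <- x.ts - dec$seasonal

# Correlogram up to 36 months (lags shown in years: 1.0 = 12 months)
acf_out <- acf(deseason, lag.max = 36,
               main = "Correlogram of deseasonalized monthly maxima")
```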
3. In this question you will explore the sampling distribution of the sample autocorrelation coefficient for a white noise process through a simulation study. Recall that, for a time series of length $n$ from a white noise process, the sample autocorrelation coefficient at lag $h$ approximately follows a normal distribution with mean $-1/n$ and variance $1/n$:

$$ r_h \sim N\left(-\frac{1}{n}, \frac{1}{n}\right) \quad \text{for large values of } n. $$

To confirm this theoretical fact, conduct the following simulation study for lags $h = 1$ and $h = 2$:
(i) Simulate a time series of length $n = 2000$ from a white noise process $\{Z_t\}_{t \in \mathbb{Z}}$ with $Z_t \sim N(0, 1)$ (function rnorm()).
(ii) Evaluate $r_h$, the sample autocorrelation coefficient at lag $h$, for $h = 1$ and $h = 2$. Store these values.
(iii) Repeat steps (i) and (ii) $m = 8000$ times; i.e., generate 8000 time series of length $n$ and for each of them compute $r_1$ and $r_2$ (you can use a for loop). You should now have two vectors of length $m$ with estimates $r_1$ and $r_2$.
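Steps (i) to (iii) could be sketched as follows. acf(..., plot = FALSE) returns the sample autocorrelations starting at lag 0, so $r_1$ and $r_2$ are its second and third elements. The seed is arbitrary.

```r
set.seed(443)
n <- 2000   # length of each simulated series
m <- 8000   # number of replications

r1 <- numeric(m)
r2 <- numeric(m)
for (i in seq_len(m)) {
  z <- rnorm(n)  # Gaussian white noise, Z_t ~ N(0, 1)
  # acf() returns lags 0, 1, 2; element 1 is lag 0 (always equal to 1)
  r <- acf(z, lag.max = 2, plot = FALSE)$acf
  r1[i] <- r[2]
  r2[i] <- r[3]
}
```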
To summarize the results of the simulation study, present the following information:
• Compute the mean and variance of the $r_1$ and $r_2$ values from your simulation study.
• In two separate figures, plot the histograms of the sampled $r_1$ and $r_2$ values from the simulation study (function hist()), and add the smoothed version of each histogram (function density()) and the theoretical asymptotic normal density (function dnorm()). Make sure your plots are well presented, including a suitable title, axis labels, curves of different types or colours, and a legend.
• Comment on whether the empirical estimates of the bias, variance, and sampling density of the estimator of the autocorrelation at lag $h$ agree with their theoretical approximations.
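A sketch of the summaries and figures, regenerating the simulated values with replicate() instead of an explicit for loop (an equivalent alternative). The helper plot_lag() is hypothetical, introduced only for this example. Note that dnorm() takes a standard deviation, so the variance $1/n$ enters as sd = sqrt(1/n).

```r
set.seed(443)
n <- 2000
m <- 8000
# Each replicate returns c(r_1, r_2); the result is a 2 x m matrix
sims <- replicate(m, acf(rnorm(n), lag.max = 2, plot = FALSE)$acf[2:3])
r1 <- sims[1, ]
r2 <- sims[2, ]

# Empirical mean and variance next to the asymptotic values
summary_tab <- rbind(
  r1 = c(mean = mean(r1), var = var(r1)),
  r2 = c(mean = mean(r2), var = var(r2)),
  theory = c(mean = -1 / n, var = 1 / n)
)
print(summary_tab)

# Hypothetical helper: histogram + kernel density + N(-1/n, 1/n) density
plot_lag <- function(r, h) {
  dens <- density(r)
  xs <- seq(min(r), max(r), length.out = 200)
  theo <- dnorm(xs, mean = -1 / n, sd = sqrt(1 / n))  # sd, not variance
  hist_obj <- hist(r, breaks = 50, plot = FALSE)
  y_max <- max(hist_obj$density, dens$y, theo)  # keep all curves in view
  hist(r, breaks = 50, freq = FALSE, col = "grey90", ylim = c(0, y_max),
       main = paste0("Simulated sampling distribution of r_", h),
       xlab = paste0("r_", h), ylab = "Density")
  lines(dens, col = "blue", lwd = 2, lty = 1)
  lines(xs, theo, col = "red", lwd = 2, lty = 2)
  legend("topright", legend = c("Kernel density", "N(-1/n, 1/n)"),
         col = c("blue", "red"), lwd = 2, lty = c(1, 2))
}
plot_lag(r1, 1)
plot_lag(r2, 2)
```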