UNIVERSITY OF BRITISH COLUMBIA
Department of Statistics
Stat 443: Time Series and Forecasting
Assignment 3
The assignment is due on Thursday, March 21 at 9:00pm.
• Submit your assignment online on canvas.ubc.ca in PDF format through Gradescope.
• This assignment should be completed in RStudio and written up using R Markdown. Display all the R code used to perform your data analysis.
• Please make sure your submission is clear and neat. The student is responsible for the submitted file being in good order (i.e., not corrupted).
• Late submission penalty: 1% per hour or fraction of an hour.
1. The file NINO34.csv contains the monthly El Niño 3.4 index from 1870 to 2023. The El Niño 3.4 index represents the average equatorial sea surface temperature (in degrees Celsius) from around the international dateline to the coast of South America. It is often used to define El Niño and La Niña.
(a) Perform exploratory data analysis.
i. Import the data into R and create a time-series object for the El Niño 3.4 index. Break the time series object into a training and test set. You can use the function window() on a ts object to split the data. Let the training set be from January 1870 to December 2021, and let the test set start in January 2022 and end in November 2023.
ii. Plot the training data as well as its acf and pacf. Comment on what you observe. Does the series have a trend? Seasonal variation? Does the time series appear stationary?
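The split described in (a) i. can be sketched as follows. This is only an illustration under stated assumptions: the series `y` is simulated noise standing in for the index, because the column layout of NINO34.csv is not given in the assignment, and the names `y`, `train`, and `test` are chosen here for the example.

```r
# Placeholder monthly series covering Jan 1870 .. Nov 2023.
# ASSUMPTION: simulated values; the real ones would be read from NINO34.csv.
set.seed(1)
n_months <- (2023 - 1870) * 12 + 11   # 153 full years plus 11 months of 2023
y <- ts(rnorm(n_months), start = c(1870, 1), frequency = 12)

# window() subsets a ts object by (year, month) pairs.
train <- window(y, end = c(2021, 12))                      # Jan 1870 .. Dec 2021
test  <- window(y, start = c(2022, 1), end = c(2023, 11))  # Jan 2022 .. Nov 2023

length(train)  # 1824 months (152 years)
length(test)   # 23 months
```

With the real series in place, `plot(train)`, `acf(train)`, and `pacf(train)` would produce the three plots requested in (a) ii.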
(b) Forecast sea surface temperature for 2022 and 2023 using the Box-Jenkins method and the data from 1870-2021.
i. Remove any seasonal variation and trend from the training data, if there is any, using the stl function in R. Plot the filtered data set, as well as its acf and pacf.
ii. Use the standard graphical tools to select the two best candidate models from the family of pure AR and MA processes for the filtered data set, justifying your selections. Fit both models and report their parameter estimates.
iii. Compare the AIC values of your two models. Which would you pick based on AIC?
iv. Use the tsdiag function in R to plot diagnostics for your two models. What do you observe?
v. Predict the sea surface temperature for Jan 2022 through Nov 2023 using both candidate models. Calculate the mean squared prediction error (MSPE) for both models. Which method performs better?
vi. On a single plot, display the test set of sea surface temperature, the predictions for both models, and their approximate 95% prediction intervals.
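The forecasting steps in (b) v. and vi. (predict, compute the MSPE, form approximate 95% intervals) might look like the sketch below. It is not a solution to the question: the data are a simulated AR(1) series rather than the filtered El Niño series, the AR(1) order is picked only for illustration, and any trend or seasonal component removed with stl would still have to be added back to the forecasts.

```r
# ASSUMPTION: simulated stationary series standing in for the filtered data.
set.seed(443)
x <- arima.sim(model = list(ar = 0.7), n = 223)
train <- window(x, end = 200)     # first 200 observations
test  <- window(x, start = 201)   # last 23 observations

# Fit a pure AR model and forecast over the test period.
fit <- arima(train, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = length(test))   # list with $pred and $se

# Mean squared prediction error over the test set.
mspe <- mean((test - fc$pred)^2)

# Approximate 95% prediction limits based on normal quantiles.
lower <- fc$pred - qnorm(0.975) * fc$se
upper <- fc$pred + qnorm(0.975) * fc$se
```

The vectors `fc$pred`, `lower`, and `upper` share the time index of `test`, so they can be drawn on one plot with `plot()` followed by `lines()`.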
(c) Forecast sea surface temperature for 2022 and 2023 using the Holt-Winters method and the data from 1870-2021.
i. Use the HoltWinters function in R to fit an appropriate model to the training data. Use this model to predict sea surface temperature from Jan 2022 through Nov 2023. Calculate the mean squared prediction error. How does it compare to the Box-Jenkins models above?
ii. On a single plot, display the test set of sea surface temperature, the predictions from your preferred Box-Jenkins model, the predictions from your Holt-Winters model, and relevant 95% prediction intervals. Which method performs better?
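A minimal sketch of the Holt-Winters workflow in (c), assuming R's built-in monthly `co2` dataset as a stand-in for the El Niño series. The default call fits additive trend and seasonal components; whether those components are appropriate for the actual data is exactly what the question asks you to decide.

```r
# ASSUMPTION: built-in co2 series (Jan 1959 .. Dec 1997) used as placeholder data.
train <- window(co2, end = c(1995, 12))
test  <- window(co2, start = c(1996, 1))   # 24 held-out months

# Default HoltWinters(): level, trend and additive seasonal component,
# with smoothing parameters chosen by minimising squared one-step errors.
hw <- HoltWinters(train)

# Forecasts with 95% prediction limits; columns are named fit, upr, lwr.
fc <- predict(hw, n.ahead = length(test),
              prediction.interval = TRUE, level = 0.95)

mspe <- mean((test - fc[, "fit"])^2)
c(alpha = hw$alpha, beta = hw$beta, gamma = hw$gamma)
```

Setting `beta = FALSE` or `gamma = FALSE` in the call drops the trend or seasonal component respectively, which is how a simpler exponential smoothing model would be fitted.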
2. In this question you will predict the time series of monthly average values of the usual hours worked across all industries in Canada for the period from January 1987 until December 2023, which you explored in Assignment 1, using the file usual_hours_worked_ca.csv. You will use the Box-Jenkins method and Holt-Winters method.
Part I. Data Preparation
(a) Read in the data and create a time-series object for the mean monthly working hours.
Create training and test datasets:
i. The training dataset should include all observations up to and including December
2020;
ii. The test dataset should include all observations from January 2021 to December
2023.
Plot the training data.
Part II. Box-Jenkins Method
In this part, you will select and fit a SARIMA(p, d, q) × (P, D, Q)_s model and make forecasts using the fitted model.
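For reference, one common way to write the SARIMA(p, d, q) × (P, D, Q)_s model uses the backshift operator B. The symbols X_t (the observed series) and Z_t (white noise), and the names of the four polynomials, are notation assumed here. The course notes may use a different sign convention for the moving-average terms.

```latex
\begin{aligned}
\phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,X_t &= \theta(B)\,\Theta(B^{s})\,Z_t, \\
\phi(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p}, \\
\Phi(B^{s}) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}, \\
\theta(B) &= 1 + \theta_1 B + \dots + \theta_q B^{q}, \\
\Theta(B^{s}) &= 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}.
\end{aligned}
```

Here d counts the lag-1 differences, D counts the seasonal differences at lag s, and the remaining orders are the degrees of the four polynomials.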
(a) Difference the training set time series at lag 1. Plot the new time series and its correlogram, and comment on what you observe.
(b) Apply seasonal differencing to remove seasonal variation. Plot the resulting differenced
time series along with its sample acf and pacf. Comment on what you observe.
(c) Based on the results of Part II (a) and (b), specify the values of d, D, and s.
(d) Based on the plots in Part II (b), suggest possible values of p, P, q, and Q, justifying your choices.
(e) Now use Akaike's Information Criterion (AIC) to select the model based on the training dataset in Part I. Fix the values of p and P as your suggestions in Part II (d), and consider q = 0, 1, ..., 5 and Q = 0, 1, ..., 5. Select the values of q and Q according to the AIC values. Fit the model you choose and print the values of the estimated parameters along with the AIC value for the model.
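The AIC search in (e) can be organised as a grid over q and Q. The sketch below is a hedged illustration only: it uses the built-in `AirPassengers` series as placeholder data, the fixed orders `p`, `P`, `d`, `D`, `s` are arbitrary example values, and the grid is kept smaller than the 0 to 5 range in the assignment so that it runs quickly.

```r
# ASSUMPTION: placeholder data and illustrative orders, not the assignment's answer.
y <- log(AirPassengers)
p <- 0; P <- 0; d <- 1; D <- 1; s <- 12
max_q <- 2; max_Q <- 1   # the assignment's grid would use 5 and 5

grid <- expand.grid(q = 0:max_q, Q = 0:max_Q)

# Fit each candidate; record NA if the optimiser fails for that combination.
grid$aic <- mapply(function(q, Q) {
  fit <- tryCatch(
    arima(y, order = c(p, d, q),
          seasonal = list(order = c(P, D, Q), period = s)),
    error = function(e) NULL
  )
  if (is.null(fit)) NA_real_ else AIC(fit)
}, grid$q, grid$Q)

# Row with the smallest AIC (which.min skips NA values).
best <- grid[which.min(grid$aic), ]
best
```

Refitting with the selected `best$q` and `best$Q` and printing the fitted object shows the coefficient estimates together with the AIC.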
(f) Perform the model diagnostics for the model in (e) and comment on the goodness of fit
for your chosen model.
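One numerical check that often accompanies the diagnostic plots in (f) is a Ljung-Box test on the residuals. The sketch assumes the built-in `AirPassengers` series and an arbitrary seasonal model with two estimated coefficients; neither is taken from the assignment.

```r
# ASSUMPTION: placeholder data and model orders chosen only for illustration.
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

res <- residuals(fit)

# Ljung-Box test up to lag 24; fitdf is the number of estimated ARMA coefficients.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
lb$p.value   # a large p-value gives no evidence of leftover autocorrelation
```

Calling `tsdiag(fit)` on the same object draws the standardized residuals, their acf, and Ljung-Box p-values across lags.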
(g) Predict the mean monthly working hours for the period from January 2021 to December 2023 based on the model you fit in Part II (e). In one figure, plot the test dataset along with your forecasts and corresponding 95% prediction intervals. (Remember to include a legend for your plot and proper labels for the axes.) Comment on the performance of your forecasting procedure.
Part III. Holt-Winters method
(a) Use the command HoltWinters() to fit the Holt-Winters filtering based on the training dataset in Part I. Print the values of the estimated parameters.
(b) Predict the mean monthly working hours for the period from January 2021 to December 2023 based on the Holt-Winters filtering. In one figure, plot the test dataset along with your forecasts and corresponding 95% prediction intervals. (Remember to include a legend for your plot and proper labels for the axes.) Comment on the performance of your forecasting procedure.
(c) Compare your predictions in Part II (g) and Part III (b) using the mean squared prediction error (MSPE). Which method do you recommend and why?
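For the comparison in (c), the MSPE is usually taken to be the average squared difference between the held-out observations and the forecasts. The notation below is assumed for illustration: n is the last training time point, h is the number of test observations (36 months for January 2021 to December 2023), x_{n+k} is an observed test value, and \hat{x}_{n+k} is its forecast made from the training data.

```latex
\mathrm{MSPE} = \frac{1}{h} \sum_{k=1}^{h} \left( x_{n+k} - \hat{x}_{n+k} \right)^{2}
```

Because both methods are scored on the same 36 test values, the two MSPE figures are directly comparable, and the smaller one indicates the more accurate forecasts over this period.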