HW2_q3

pdf

School

University of Texas *

*We aren’t endorsed by this school

Course

123A

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

1

Uploaded by hs.jeong

Report
Question 3 a. Consider the task of modeling this data using a SARIMA model. Based on your knowledge of monthly variation in temperature, what value would be most appropriate for the seasonal lag term, ? Answer: I believe that 12 would be the most appropriate for the seasonal lag term for monthly data. b.Using the seasonal lag selection in the previous subquestion, fit the model to the full aMDT time series for all combinations of and in {0, 1} except the four cases where , and (Hint: this means you should be checking 60 di ff erent combinations). In answering this question, you should fit the various models to the full data set (do not split it into a training/test split) and assume that (where is the constant term). Identify which of these models best fits the data and report the AICc value for this model and the estimated values of the unknown parameters. for (i in 1 :nrow(filtered_exp)){ x <- filtered_exp[i,] fit_s <- sarima(mean_dallas_temp$AvgTemp, x$Var1, x$Var2, x$Var3, x$Var4, x$Var5, x$Var6, 12 , no.constant = TRUE , details = FALSE ) count <- count + 1 print(paste(count, ' SARIMA(' , x$Var1, ',' ,x$Var2, ',' ,x$Var3, ') × (' ,x$Var4, ',' ,x$Var5, ',' ,x$Var6, '). The AICc value for this model is ' , format(fit_s$AICc, digits = 5 ))) } ## [1] "1 SARIMA( 0 , 0 , 0 ) × ( 0 , 0 , 0 ). The AICc value for this model is 11.587" ## [1] "2 SARIMA( 1 , 0 , 0 ) × ( 0 , 0 , 0 ). The AICc value for this model is 7.1067" ## [1] "3 SARIMA( 0 , 1 , 0 ) × ( 0 , 0 , 0 ). The AICc value for this model is 7.0854" ## [1] "4 SARIMA( 1 , 1 , 0 ) × ( 0 , 0 , 0 ). The AICc value for this model is 6.7098" ## [1] "5 SARIMA( 0 , 0 , 1 ) × ( 0 , 0 , 0 ). The AICc value for this model is 10.279" ## [1] "6 SARIMA( 1 , 0 , 1 ) × ( 0 , 0 , 0 ). The AICc value for this model is 6.8669" ## [1] "7 SARIMA( 0 , 1 , 1 ) × ( 0 , 0 , 0 ). The AICc value for this model is 6.8463" ## [1] "8 SARIMA( 1 , 1 , 1 ) × ( 0 , 0 , 0 ). The AICc value for this model is 6.7168" ## [1] "9 SARIMA( 0 , 0 , 0 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.2195" ## [1] "10 SARIMA( 1 , 0 , 0 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.1011" ## [1] "11 SARIMA( 0 , 1 , 0 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.2212" ## [1] "12 SARIMA( 1 , 1 , 0 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.1112" ## [1] "13 SARIMA( 0 , 0 , 1 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.1415" ## [1] "14 SARIMA( 1 , 0 , 1 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.0441" ## [1] "15 SARIMA( 0 , 1 , 1 ) × ( 1 , 0 , 0 ). The AICc value for this model is 6.0352" ## [1] "16 SARIMA( 1 , 1 , 1 ) × ( 1 , 0 , 0 ). The AICc value for this model is 5.921" ## [1] "17 SARIMA( 0 , 0 , 0 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.9442" ## [1] "18 SARIMA( 1 , 0 , 0 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.8606" ## [1] "19 SARIMA( 0 , 1 , 0 ) × ( 0 , 1 , 0 ). The AICc value for this model is 6.285" ## [1] "20 SARIMA( 1 , 1 , 0 ) × ( 0 , 1 , 0 ). The AICc value for this model is 6.084" ## [1] "21 SARIMA( 0 , 0 , 1 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.8838" ## [1] "22 SARIMA( 1 , 0 , 1 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.856" ## [1] "23 SARIMA( 0 , 1 , 1 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.946" ## [1] "24 SARIMA( 1 , 1 , 1 ) × ( 0 , 1 , 0 ). The AICc value for this model is 5.8933" ## [1] "25 SARIMA( 0 , 0 , 0 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.7658" ## [1] "26 SARIMA( 1 , 0 , 0 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.6604" ## [1] "27 SARIMA( 0 , 1 , 0 ) × ( 1 , 1 , 0 ). The AICc value for this model is 6.0617" ## [1] "28 SARIMA( 1 , 1 , 0 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.8792" ## [1] "29 SARIMA( 0 , 0 , 1 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.6881" ## [1] "30 SARIMA( 1 , 0 , 1 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.6517" ## [1] "31 SARIMA( 0 , 1 , 1 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.7278" ## [1] "32 SARIMA( 1 , 1 , 1 ) × ( 1 , 1 , 0 ). The AICc value for this model is 5.6957" ## [1] "33 SARIMA( 0 , 0 , 0 ) × ( 0 , 0 , 1 ). The AICc value for this model is 10.424" ## [1] "34 SARIMA( 1 , 0 , 0 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.6887" ## [1] "35 SARIMA( 0 , 1 , 0 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.6669" ## [1] "36 SARIMA( 1 , 1 , 0 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.5761" ## [1] "37 SARIMA( 0 , 0 , 1 ) × ( 0 , 0 , 1 ). The AICc value for this model is 9.3334" ## [1] "38 SARIMA( 1 , 0 , 1 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.6419" ## [1] "39 SARIMA( 0 , 1 , 1 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.6211" ## [1] "40 SARIMA( 1 , 1 , 1 ) × ( 0 , 0 , 1 ). The AICc value for this model is 6.5716" ## [1] "41 SARIMA( 0 , 1 , 0 ) × ( 1 , 0 , 1 ). The AICc value for this model is 5.8134" ## [1] "42 SARIMA( 1 , 1 , 0 ) × ( 1 , 0 , 1 ). The AICc value for this model is 5.6746" ## [1] "43 SARIMA( 0 , 1 , 1 ) × ( 1 , 0 , 1 ). The AICc value for this model is 5.5505" ## [1] "44 SARIMA( 1 , 1 , 1 ) × ( 1 , 0 , 1 ). The AICc value for this model is 5.4701" ## [1] "45 SARIMA( 0 , 0 , 0 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.4594" ## [1] "46 SARIMA( 1 , 0 , 0 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.3524" ## [1] "47 SARIMA( 0 , 1 , 0 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.7534" ## [1] "48 SARIMA( 1 , 1 , 0 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.5774" ## [1] "49 SARIMA( 0 , 0 , 1 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.3776" ## [1] "50 SARIMA( 1 , 0 , 1 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.3425" ## [1] "51 SARIMA( 0 , 1 , 1 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.4118" ## [1] "52 SARIMA( 1 , 1 , 1 ) × ( 0 , 1 , 1 ). The AICc value for this model is 5.3864" ## [1] "53 SARIMA( 0 , 0 , 0 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.4671" ## [1] "54 SARIMA( 1 , 0 , 0 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.3609" ## [1] "55 SARIMA( 0 , 1 , 0 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.7617" ## [1] "56 SARIMA( 1 , 1 , 0 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.586" ## [1] "57 SARIMA( 0 , 0 , 1 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.386" ## [1] "58 SARIMA( 1 , 0 , 1 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.351" ## [1] "59 SARIMA( 0 , 1 , 1 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.4191" ## [1] "60 SARIMA( 1 , 1 , 1 ) × ( 1 , 1 , 1 ). The AICc value for this model is 5.395" #Best fit model fit_best <- sarima(mean_dallas_temp$AvgTemp, 1 , 0 , 1 , 0 , 1 , 1 , 12 , no.constant = TRUE , details = FALSE ) print(paste( 'The best model fits the data is SARIMA(1,0,1) × (0,1,1). The AICc value for the model is ' , format(f it_best$AICc, digits = 5 ))) ## [1] "The best model fits the data is SARIMA(1,0,1) × (0,1,1). The AICc value for the model is 5.3425" fit_best ## $fit ## ## Call: ## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), ## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, ## REPORT = 1, reltol = tol)) ## ## Coefficients: ## ar1 ma1 sma1 ## 0.7183 -0.4535 -1.0000 ## s.e. 0.1351 0.1764 0.0589 ## ## sigma^2 estimated as 10.16: log likelihood = -637.05, aic = 1282.09 ## ## $degrees_of_freedom ## [1] 237 ## ## $ttable ## Estimate SE t.value p.value ## ar1 0.7183 0.1351 5.3158 0.0000 ## ma1 -0.4535 0.1764 -2.5709 0.0108 ## sma1 -1.0000 0.0589 -16.9895 0.0000 ## ## $AIC ## [1] 5.342055 ## ## $AICc ## [1] 5.342478 ## ## $BIC ## [1] 5.400065 Answer Let’s generate SARIMA models to the full aMDT time series and display the AICs values for the models except the four cases. In total, there are 60 models generated with the AICc values. SARIMA(1, 0, 1) × (0, 1, 1) model has the lowest the AICc value that is 5.3425. The estimated values of parameters are the below. ar1( ) : 0.7183 ma1( ) : -0.4535 seasonal ma1( ): -1.0000 c. Consider the task of forecasting the aMDT twelve months in advance. For the last five years of data (2016-2020), predict the value of aMDT using all of the data up until one year prior to the prediction (i.e. predict the aMDT for January 2016 using all of the data up to and including January 2015, then add in the observed aMDT for February 2015 and predict a MDT for February 2016, etc.). Use the values of and as determined to be best in part b, but update your coe cients at every time step using the new data. Create a plot of the one-year-in-advance predictions and 95% confidence bands superimposed on a time series plot of the observed aMDT values from January 2010 to December 2020. Report the one-year-in-advance prediction of aMDT for January 2018, along with the upper and lower bounds of the prediction interval. (Hint: Making one-year-in-advance predictions with newly added data at each time step may require a for loop) predit_fun <- function (data, p, d, q, P, D, Q, S){ n <- nrow(data) n_train <- nrow(filter(data, Month <= '2015-01-01' )) data_p <- data %>% mutate( pred = NA , se = NA , lower = NA , upper = NA ) #for(i in n_train:(n-12)) { for (i in n_train:(n- 12 )) { x_train <- data_p[ 1 :i, 2 ] forecast_3p <- sarima.for(x_train, n.ahead = 12 , p = p, d = d, q = q, P = P, D = D, Q = Q, S = S, no.constant = TRUE , plot = FALSE ) pred <- forecast_3p$pred[ 12 ] se <- forecast_3p$se[ 12 ] data_p$pred[i+ 12 ] <- pred data_p$se[i+ 12 ] <- se data_p$lower[i+ 12 ] <- pred - se data_p$upper[i+ 12 ] <- pred + se } return (data_p) } predict_data <- predit_fun(mean_dallas_temp, 1 , 0 , 1 , 0 , 1 , 1 , 12 ) pt_data <- predict_data %>% filter(predict_data$Month >= '2010-01-01' ) pt_data$Month <- as.Date(pt_data$Month) pt <- ggplot(pt_data, aes(x = Month)) + geom_line(aes(y=AvgTemp, color= 'Average Temperature' )) + geom_line(aes(y=pred, color= 'Predicted Temperature' )) + geom_ribbon(aes(y=pred, ymin = pred - 1.96 *se, ymax = pred + 1.96 *se),alpha = .2 ) + geom_vline(xintercept = as.Date( '2016-01-01' )) + ylab( "Temperature" ) + theme(legend.position= "top" , legend.title=element_blank()) pt pred_jan2018 <- filter(predict_data, predict_data$Month == '2018-01-01' ) print(paste( 'The one-year-in-advance prediction of aMDT for January 2018 is' , format(pred_jan2018$AvgTemp, digits = 5 ), 'with the upper bound' , format(pred_jan2018$upper, digits = 5 ), 'and the lower bound' , format(pred_jan2018$ low, digits = 5 ))) ## [1] "The one-year-in-advance prediction of aMDT for January 2018 is 55.613 with the upper bound 61.597 and the lower bound 54.516" d. Now consider an alternative model for the aMDT data that does not have a seasonal component. Report the AICc value for an ARIMA(3, 1, 1) model fit to the full a MDT data set. Refit the model to make one-year-in-advance predctions of aMDT for the last five years of the observation window (2016-2020) as you did in the previous subquestion. Plot your predictions and 95% confidence bounds, along with the true observed values shown. Set your x-axis to span January 2010 to December 2020. Additionally, report your one-year- in-advance prediction for the aMDT for January 2018, along with your upper and lower bounds of your prediction interval. Does the fitted model produce predictions that capture seasonal behavior? How do the predictions from the ARIMA(3,1,1) model that does not include a specific seasonal component compare to the predictions from the model fitted in part c? predit_fun_d <- function (data, p, d, q){ n <- nrow(data) n_train <- nrow(filter(data, Month <= '2015-01-01' )) data_p <- data %>% mutate( pred = NA , se = NA , lower = NA , upper = NA ) for (i in n_train:(n- 12 )) { x_train <- data_p[ 1 :i, 2 ] forecast_3p <- sarima.for(x_train, n.ahead = 12 , p=p, d=d, q=q, no.constant = TRUE , details = FALSE , plot = F ALSE ) pred <- forecast_3p$pred[ 12 ] se <- forecast_3p$se[ 12 ] data_p$pred[i+ 12 ] <- pred data_p$se[i+ 12 ] <- se data_p$lower[i+ 12 ] <- pred - se data_p$upper[i+ 12 ] <- pred + se } return (data_p) } predict_data_d <- predit_fun_d(mean_dallas_temp, 3 , 1 , 1 ) pt_data_d <- predict_data_d %>% filter(predict_data_d$Month >= '2010-01-01' ) pt_data_d$Month <- as.Date(pt_data_d$Month) pt_d <- ggplot(pt_data_d, aes(x = Month)) + geom_line(aes(y=AvgTemp, color= 'Average Temperature' )) + geom_line(aes(y=pred, color= 'Predicted Temperature' )) + geom_ribbon(aes(y=pred, ymin = pred - 1.96 *se, ymax = pred + 1.96 *se),alpha = .2 ) + geom_vline(xintercept = as.Date( '2016-01-01' )) + ylab( "Temperature" ) + theme(legend.position= "top" , legend.title=element_blank()) pt_d pred_jan2018_d <- filter(predict_data_d, predict_data_d$Month == '2018-01-01' ) print(paste( 'The one-year-in-advance prediction of aMDT for January 2018 is' , format(pred_jan2018_d$AvgTemp, digi ts = 5 ), 'with the upper bound' , format(pred_jan2018_d$upper, digits = 5 ), 'and the lower bound' , format(pred_jan 2018_d$low, digits = 5 ))) ## [1] "The one-year-in-advance prediction of aMDT for January 2018 is 55.613 with the upper bound 79.645 and the lower bound 56.212" The one-year-in-advance prediction has the same value as the prediction from the model in c. However, the 95% confidence bounds are very wide. Therefore, this model is poor to predict the future data. Additionally, this fitted model does not seem to produce the prediction that capture seasonal behaviors well. e. Report the AICc value for an ARIMA(12,1,0) model fit to the full aMDT data set. Refit the model to make one-year-in-advance predictions of aMDT for the last five years of the observation window (2016-2020) as you did in the previous subquestion. Plot your predictions and 95% confidence bounds, along with the true observed values shown in. Set your x-axis to span January 2010 to December 2020. Additionally, report your one-year-in-advance prediction for the aMDT for January 2018, along with your upper and lower bounds of your prediction interval. Does the fitted model produce predictions that capture seasonal behavior? How do the predictions from the ARIMA(12,1,0) model compare to the predictions from the models fitted in part c and d? predit_fun_e <- function (data, p, d, q){ n <- nrow(data) n_train <- nrow(filter(data, Month <= '2015-01-01' )) data_p <- data %>% mutate( pred = NA , se = NA , lower = NA , upper = NA ) for (i in n_train:(n- 12 )) { x_train <- data_p[ 1 :i, 2 ] forecast_3p <- sarima.for(x_train, n.ahead = 12 , p=p, d=d, q=q, no.constant = TRUE , details = FALSE , plot = F ALSE ) pred <- forecast_3p$pred[ 12 ] se <- forecast_3p$se[ 12 ] data_p$pred[i+ 12 ] <- pred data_p$se[i+ 12 ] <- se data_p$lower[i+ 12 ] <- pred - se data_p$upper[i+ 12 ] <- pred + se } return (data_p) } predict_data_e <- predit_fun_e(mean_dallas_temp, 12 , 1 , 0 ) pt_data_e <- predict_data_e %>% filter(predict_data_e$Month >= '2010-01-01' ) pt_data_e$Month <- as.Date(pt_data_e$Month) pt_e <- ggplot(pt_data_e, aes(x = Month)) + geom_line(aes(y=AvgTemp, color= 'Average Temperature' )) + geom_line(aes(y=pred, color= 'Predicted Temperature' )) + geom_ribbon(aes(y=pred, ymin = pred - 1.96 *se, ymax = pred + 1.96 *se),alpha = .2 ) + geom_vline(xintercept = as.Date( '2016-01-01' )) + ylab( "Temperature" ) + theme(legend.position= "top" , legend.title=element_blank()) pt_e pred_jan2018_e <- filter(predict_data_e, predict_data_e$Month == '2018-01-01' ) print(paste( 'The one-year-in-advance prediction of aMDT for January 2018 is' , format(pred_jan2018_e$AvgTemp, digi ts = 5 ), 'with the upper bound' , format(pred_jan2018_e$upper, digits = 5 ), 'and the lower bound' , format(pred_jan 2018_e$low, digits = 5 ))) ## [1] "The one-year-in-advance prediction of aMDT for January 2018 is 55.613 with the upper bound 66.472 and the lower bound 57.656" The one-year-in-advance prediction for the aMDT for January 2018 shows the same value as the models in c and d. This model is better than the fitted model in d in terms of 95% confidence bounds. However, the model in c has tighter bounds. This model produces predictions that capture seasonal behavior better than the model in d. Nonetheless, the model requires 12 autoregressive components, which is extremely complicated. Therefore, the model in c would be the best model to capture seasonality as well as to predict a future data point. S SARIMA ( p , d , q ) × ( P , D , Q ) s p , d , q , P , D , Q P = 1, D = 0, Q = 1 d = 0 δ = 0 δ ϕ 1 θ 1 Θ 1 p , d , q , P , D , Q
Discover more documents: Sign up today!
Unlock a world of knowledge! Explore tailored content for a richer learning experience. Here's what you'll get:
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help