HW4
Question 7.1
Describe a situation or problem from your job, everyday life, current events, etc., for which exponential
smoothing would be appropriate. What data would you need? Would you expect the value of α (the first
smoothing parameter) to be closer to 0 or 1, and why?
I work for a company that manufactures laptops. There are many instances where we could use the exponential smoothing approach for forecasting with moving averages. One such instance is estimating the quantity of returns from customers. As we sell laptops, there are also cases where customers return the laptops or exchange them at retail stores or online. As a business, we need to keep track of these returns and exchanges so that we have an alternative product ready for exchange and understand why a product is being returned.

To do this, we can apply exponential smoothing (moving averages) to the historical return quantities of a product over its past lifetime and examine its trends, including any seasonality such as spikes after Labor Day or Black Friday. People may buy the product during promotional sales and then return it because they do not like it, or exchange it for defects, so the company needs to anticipate these returns beforehand to keep inventory optimally stocked for returning customers to go back with a device.

Using exponential smoothing, we can predict how many returns to expect in the immediate or short-range future and how well our stores and warehouses need to be stocked to avoid a stock-out situation. Here, we need the historical data of return quantities of a product over its lifetime to date. The value of α (alpha) is expected to be close to 1 because there is not much randomness in the system: the values will not fluctuate much, the data will be mostly stationary, and there is very little chance of a mass return of products from customers.
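To make this concrete, here is a minimal R sketch of how such a forecast could be set up, using base R's HoltWinters() as simple exponential smoothing (trend and seasonal terms disabled). The weekly return-count series is simulated and purely hypothetical:

#hypothetical weekly return counts: a slowly drifting level with little
#observation noise, the low-randomness regime where a high alpha fits well
set.seed(42)
level <- 50 + cumsum(rnorm(104, sd = 2))
returns <- ts(level + rnorm(104, sd = 0.5))

#simple exponential smoothing: beta (trend) and gamma (seasonality) disabled
ses_fit <- HoltWinters(returns, beta = FALSE, gamma = FALSE)
ses_fit$alpha                 #estimated alpha; close to 1 for this series
predict(ses_fit, n.ahead = 4) #short-range forecast of upcoming return counts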
HW7.2
2022-09-16
Question 7.2
Using the 20 years of daily high temperature data for Atlanta (July through October) from Question 6.2 (file temps.txt), build and use an exponential smoothing model to help make a judgment of whether the unofficial end of summer has gotten later over the 20 years. (Part of the point of this assignment is for you to think about how you might use exponential smoothing to answer this question. Feel free to combine it with other models if you'd like to. There's certainly more than one reasonable approach.)

Answer:

This problem may be approached in various ways; below is my approach based on what we learned.
Initially, I imported the data into a dataframe df_1.

df_1 <- read.delim("temps.txt", header = TRUE)
names(df_1) <- gsub('X', '', names(df_1)) #beautifying headers
head(df_1)
##     DAY 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 1 1-Jul   98   86   91   84   89   84   90   73   82   91   93   95   85   95
## 2 2-Jul   97   90   88   82   91   87   90   81   81   89   93   85   87   90
## 3 3-Jul   97   93   91   87   93   87   87   87   86   86   93   82   91   89
## 4 4-Jul   90   91   91   88   95   84   89   86   88   86   91   86   90   91
## 5 5-Jul   89   84   91   90   96   86   93   80   90   89   90   88   88   80
## 6 6-Jul   93   84   89   91   96   87   93   84   90   82   81   87   82   87
##   2010 2011 2012 2013 2014 2015
## 1   87   92  105   82   90   85
## 2   84   94   93   85   93   87
## 3   83   95   99   76   87   79
## 4   85   92   98   77   84   85
## 5   88   90  100   83   86   84
## 6   89   90   98   83   87   84
df <- as.vector(unlist(df_1[, 2:21])) #unlisting the df and creating a vector

#converting the data into time series format using ts();
#the data runs from 1996 through 2015
df_ts <- ts(df, frequency = 123, start = 1996)
head(df_ts)

## [1] 98 97 97 90 89 93

plot.ts(df_ts, col = "grey")
Above is the plot of the original time series data. There appears to be some seasonality in the data, as there are peaks and valleys at regular time intervals over the years.

Let us decompose this data to see individually whether there are any trend, random, or seasonal components. The decompose() function helps find the underlying components of a time series.
dc_ts <- decompose(df_ts)
plot(dc_ts)
From the above plot we can observe that there is definitely seasonality in the data, and it is roughly constant over the years, which makes sense: temperatures from July to October generally move from high to low. We can also see some randomness and a trend component. The decomposition also tells us the type of time series is additive. If a seasonal time series can be described using an additive model, it can be seasonally adjusted by estimating the seasonal component and subtracting it from the original time series. I did try removing the seasonality and randomness from the data and got the plots below, which suggest there have been longer summers.
dc_ts$type

## [1] "additive"

#removing the seasonal component (kept in a separate variable so that
#df_ts still holds the original series)
df_adj <- df_ts - dc_ts$seasonal
plot.ts(df_adj, col = "grey")

#removing the random component
df_adj <- df_adj - dc_ts$random
plot.ts(df_adj, col = "grey")
However, we cannot implement Holt-Winters (triple exponential smoothing) without the seasonality and trend components, so I ended up using the original time series for the Holt-Winters approach.
op_ts <- HoltWinters(df_ts)
#op_ts
plot(op_ts$fitted)
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = df_ts)

Smoothing parameters:
 alpha: 0.6610618
 beta : 0
 gamma: 0.6248076

Coefficients:
          [,1]
a 71.477236414
b -0.004362918
Here we can see that alpha is about 0.66, which shows there is some randomness in the data: the output weights recent observations heavily but still draws on previous data points. Below is the plot of the Holt-Winters components. There is essentially no trend in the data (beta = 0); the level is the moving average, and the seasonal component shows the cyclic seasonality in the data.
#trend is constant
df_fitted <- as.data.frame(op_ts$fitted)
plot(df_ts, col = 'grey')
lines(op_ts$fitted[, 'xhat'], col = 'red')   #smoothed observations (kept as a ts so it overlays on the time axis)
lines(op_ts$fitted[, 'level'], col = 'blue') #level component
Above I plotted the original data with the Holt-Winters output (red), which is the smoothed observations, and the level (blue), to see if there are any indications of longer summers. The period from 2010 to 2015 shows a slight increase, but the abrupt fall after it may be because we have no data points after 2015. So it is difficult to deduce anything conclusive from this graph.
plot(df_fitted$season, col = 'yellow', type = 'l') #plot of seasonality in HW
plot(op_ts)
Above is the graph of the Holt-Winters smoothed curve, which sits well on the original data and improves as more data points refine the smoothing. The initial year is unavailable because we are not forecasting for 1996.

Since we are still unsure, let's implement the CUSUM approach we learned last week and check whether it helps identify any indication that summers have gotten longer in Atlanta.
I am implementing the CUSUM approach on the seasonal values we got from Holt-Winters (HW) to check whether we can identify any change.
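For reference, the recurrence implemented in the loop below is the standard one-sided CUSUM for detecting a decrease, with slack $C$ and threshold $T$:

$$S_t = \max\bigl(0,\; S_{t-1} + (\mu - x_t - C)\bigr), \qquad \text{flag a change when } S_t \ge T.$$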
length(df_fitted$season)

## [1] 2337

#converting the season values to a matrix
mat <- matrix(data = df_fitted$season, ncol = 123)
dim(mat)

## [1]  19 123

#assigning row names and column names
colnames(mat) <- as.vector(t(df_1[, 1]))
rownames(mat) <- c(1997:2015)
#head(mat)

#transposing the matrix to have days in rows and years in columns
df_mat <- t(mat)
#converting it to a data frame for easy operations
df_mat <- as.data.frame(df_mat)

#adding column avg, which has the mean of each day across all years
df_mat$avg <- rowMeans(df_mat, na.rm = TRUE)
df_mat$cusum <- 0   #column for the CUSUM (S_t) calculation
df_mat$change <- NA #column for the change-detection trigger

Mu <- mean(df_mat[1:30, 20]) #mean of the avg column over the July days
C <- sd(df_mat[1:30, 20])    #std dev of the same values
T <- 5 * C                   #I set the threshold to 5 times the std dev

#implementation of S_t = max(0, S_t-1 + (Mu - x_i - C)), trigger when S_t >= T
for (i in 2:nrow(df_mat)) {
  df_mat$cusum[i] <- max(0, df_mat$cusum[i - 1] + Mu - df_mat$avg[i] - C)
  if (df_mat$cusum[i] >= T) {
    df_mat$change[i] <- 'Change detected'
  } else {
    df_mat$change[i] <- 'No Change'
  }
}
head(df_mat)
            1997        1998        1999       2000      2001       2002
1-Jul   4.303159   8.2381188  11.0917774   9.042997  2.067387   2.116168
2-Jul   7.896655   6.7665741   1.6771432   4.766574  5.799094   6.815355
3-Jul  -2.493588  -9.5748893  -2.6561901   3.238119  5.156818   5.099907
4-Jul   8.132428   8.3438099   9.4901514   9.620233  2.770639  -3.152125
5-Jul   3.425111  -0.5748893  -0.5179787  -4.469198 -3.412288  -3.339117
6-Jul -12.192775 -16.1765153 -18.1358649 -22.103345 -4.046434 -12.013914
            2003       2004        2005      2006      2007      2008
1-Jul  -6.826922   5.197468   2.2055985  5.262509  2.295029  6.327550
2-Jul   6.986086   9.099907   9.2218587  7.278769  6.238119 -9.940743
3-Jul   4.189338   5.335680   8.5226717  8.652753  6.669013  6.620233
4-Jul -10.095214   3.937306   0.9291758  3.969826  6.042997  6.189338
5-Jul  -4.322857  -1.233426   1.7990945  1.839745  4.799094  1.758444
6-Jul  -9.875702 -12.802532 -12.7862714 -8.802532 -2.835052 -0.843182
           2009       2010       2011       2012         2013       2014
1-Jul  4.343810  8.2706392   9.205599   9.189338   7.14055787   7.075517
2-Jul -2.013914  1.9779562   6.051127   6.986086   5.86413510   1.717794
3-Jul  5.603973 -0.3228568  -3.168385  -1.119605  -0.03017384   1.953566
4-Jul  6.221859  8.2950294   1.343810   3.392590   5.41698063  -4.526109
5-Jul -3.314727 -7.4122877 -10.404158 -18.379767 -16.25781611 -10.209036
6-Jul -1.835052  0.1568180 -15.843182 -18.826922 -13.81879172  -2.810662
           2015       avg    cusum    change
1-Jul  6.994216  5.449501 0.000000         0
2-Jul  1.538932  4.459343 0.000000 No Change
3-Jul  4.986086  2.455919 0.000000 No Change
4-Jul -3.501719  3.306369 0.000000 No Change
5-Jul  1.839745 -3.599280 0.000000 No Change
6-Jul -0.745621 -9.807238 5.121384 No Change
df_mat$change

##   [1] NA          "No Change" "No Change" "No Change" "No Change" "No Change"
##   [7] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [13] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [19] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [25] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [31] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [37] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [43] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [49] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [55] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [61] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [67] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [73] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [79] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [85] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [91] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
##  [97] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
## [103] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
## [109] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
## [115] "No Change" "No Change" "No Change" "No Change" "No Change" "No Change"
## [121] "No Change" "No Change" "No Change"
Next, I am implementing the CUSUM approach on the xhat values we got from Holt-Winters (HW) to check whether we can identify any change using the predicted xhat values.
length(df_fitted$xhat)

## [1] 2337

#converting the xhat values to a matrix
mat <- matrix(data = df_fitted$xhat, ncol = 123)
dim(mat)

## [1]  19 123

#assigning row names and column names
colnames(mat) <- as.vector(t(df_1[, 1]))
rownames(mat) <- c(1997:2015)
#head(mat)

#transposing the matrix to have days in rows and years in columns
df_mat <- t(mat)
#converting it to a data frame for easy operations
df_mat <- as.data.frame(df_mat)

#adding column avg, which has the mean of each day across all years
df_mat$avg <- rowMeans(df_mat, na.rm = TRUE)
df_mat$cusum <- 0   #column for the CUSUM (S_t) calculation
df_mat$change <- NA #column for the change-detection trigger

Mu <- mean(df_mat[1:30, 20]) #mean of the avg column over the July days
C <- sd(df_mat[1:30, 20])    #std dev of the same values
T <- 5 * C                   #threshold of 5 times the std dev

#implementation of S_t = max(0, S_t-1 + (Mu - x_i - C)), trigger when S_t >= T
for (i in 2:nrow(df_mat)) {
  df_mat$cusum[i] <- max(0, df_mat$cusum[i - 1] + Mu - df_mat$avg[i] - C)
  if (df_mat$cusum[i] >= T) {
    df_mat$change[i] <- 'Change detected'
  } else {
    df_mat$change[i] <- 'No Change'
  }
}
head(df_mat)
          1997     1998     1999     2000     2001     2002     2003      2004
1-Jul 87.17619 90.32925 92.96089 90.93360 83.99752 84.04358 75.06732  87.04284
2-Jul 89.85831 88.81753 83.84436 87.03232 88.03911 89.02515 89.17489  91.16874
3-Jul 79.87808 72.87301 79.87130 85.84632 87.86225 87.89204 87.04848  88.15841
4-Jul 90.15381 90.25915 91.22982 91.20361 84.21506 80.79001 78.60881 100.82831
5-Jul 92.92855 85.66611 87.26150 85.11624 88.07513 90.07746 92.34346  93.21829
6-Jul 65.51275 69.77949 75.23323 75.73464 97.92898 79.42710 81.93967  79.04836
          2005     2006     2007     2008     2009     2010     2011     2012
1-Jul 84.01829 87.05875 84.04807 88.04445 86.02696 89.93161 90.90741 90.94800
2-Jul 91.17478 89.11179 87.99288 71.81436 79.85955 83.93990 88.04844 88.94701
3-Jul 91.23632 91.20582 89.08166 88.97453 87.97075 82.05889 79.17007 81.10206
4-Jul 92.64082 87.98181 85.43521 83.30630 85.11516 89.09103 78.76996 81.62751
5-Jul 80.89773 80.34056 84.39254 69.18892 64.64756 60.11756 62.33211 64.04852
6-Jul 81.01147 83.66114 87.20402 76.49648 66.57824 62.23394 48.71918 55.83264
          2013     2014     2015      avg    cusum    change
1-Jul 88.92923 88.90661 88.88268 87.32912 0.000000         0
2-Jul 87.85573 83.80040 83.74912 86.48707 0.000000 No Change
3-Jul 82.11966 84.01993 87.03491 84.91603 0.000000 No Change
4-Jul 83.89378 75.33867 84.06756 86.02928 0.000000 No Change
5-Jul 80.67739 82.96870 88.42319 80.66955 0.000000 No Change
6-Jul 69.54085 80.84814 75.73753 74.34041 3.399691 No Change

unique(df_mat$change)

## [1] "0"         "No Change"
Here, we can see that no change is detected in the df_mat$change column.

Finally, I also checked whether the level component of the HW output can help identify any change.
length(df_fitted$level)

## [1] 2337

#converting the level values to a matrix
mat <- matrix(data = df_fitted$level, ncol = 123)
dim(mat)

## [1]  19 123

#assigning row names and column names
colnames(mat) <- as.vector(t(df_1[, 1]))
rownames(mat) <- c(1997:2015)
#head(mat)

#transposing the matrix to have days in rows and years in columns
df_mat <- t(mat)
#converting it to a data frame for easy operations
df_mat <- as.data.frame(df_mat)

#adding column avg, which has the mean of each day across all years
df_mat$avg <- rowMeans(df_mat, na.rm = TRUE)
df_mat$cusum <- 0   #column for the CUSUM (S_t) calculation
df_mat$change <- NA #column for the change-detection trigger

Mu <- mean(df_mat[1:30, 20]) #mean of the avg column over the July days
C <- sd(df_mat[1:30, 20])    #std dev of the same values
T <- 5 * C                   #threshold of 5 times the std dev

#implementation of S_t = max(0, S_t-1 + (Mu - x_i - C)), trigger when S_t >= T
for (i in 2:nrow(df_mat)) {
  df_mat$cusum[i] <- max(0, df_mat$cusum[i - 1] + Mu - df_mat$avg[i] - C)
  if (df_mat$cusum[i] >= T) {
    df_mat$change[i] <- 'Change detected'
  } else {
    df_mat$change[i] <- 'No Change'
  }
}
head(df_mat)
          1997     1998     1999     2000      2001     2002     2003     2004
1-Jul 82.87739 82.09550 81.87348 81.89497  81.93450 81.93177 81.89860 81.84974
2-Jul 81.96602 82.05532 82.17158 82.27010  82.24438 82.21416 82.19317 82.07319
3-Jul 82.37604 82.45227 82.53185 82.61257  82.70979 82.79649 82.86350 82.82709
4-Jul 82.02575 81.91971 81.74403 81.58774  81.44878 83.94650 88.70838 96.89537
5-Jul 89.50780 86.24536 87.78384 89.58980  91.49178 93.42094 96.67068 94.45608
6-Jul 77.70989 85.96037 93.37346 97.84235 101.97977 91.44537 91.81973 91.85525
          2005     2006     2007     2008     2009     2010     2011     2012
1-Jul 81.81705 81.80060 81.75740 81.72126 81.68752 81.66533 81.70618 81.76302
2-Jul 81.95728 81.83738 81.75912 81.75947 81.87782 81.96631 82.00167 81.96529
3-Jul 82.71801 82.55743 82.41701 82.35866 82.37114 82.38611 82.34282 82.22603
4-Jul 91.71601 84.01634 79.39658 77.12133 78.89766 80.80036 77.43051 78.23928
5-Jul 79.10299 78.50518 79.59781 67.43484 67.96665 67.53421 72.74063 82.43265
6-Jul 93.80211 92.46804 90.04343 77.34403 68.41766 62.08148 64.56672 74.66393
          2013     2014     2015      avg cusum    change
1-Jul 81.79304 81.83546 81.89283 81.88398     0         0
2-Jul 81.99595 82.08697 82.21455 82.03209     0 No Change
3-Jul 82.15420 82.07073 82.05319 82.46447     0 No Change
4-Jul 78.48116 79.86914 87.57364 82.72728     0 No Change
5-Jul 96.93957 93.18210 86.58781 84.27320     0 No Change
6-Jul 83.36400 83.66317 76.48751 84.15201     0 No Change

unique(df_mat$change)

## [1] "0"         "No Change"
Here also we observe that no change is detected.

However, a change was detected (around August 8th-9th, September 3rd-4th, and September 17th-18th) when the threshold was 2*C. Such a low threshold did not make sense to me for this use case, so I went with 5*C as the threshold.
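As a quick sensitivity check on this choice, a small sweep over threshold multiples can be run once the cusum column has been computed. The snippet below is a minimal sketch, assuming df_mat$cusum and C still hold the values from the loop above:

#illustrative threshold sweep: count how many days would trigger at k*C
for (k in c(2, 3, 4, 5)) {
  n_hits <- sum(df_mat$cusum >= k * C, na.rm = TRUE)
  cat("threshold =", k, "* C:", n_hits, "days flagged\n")
}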
Conclusion: As per my analysis, there is no evidence that summers have gotten longer lately when we take into consideration the data given to us, and the dataset is too small to conclude anything definitively. We tried using the original data, the xhat values, the seasonal component, and the level to check for a change, but we did not find any, which makes me believe there has been no change in the length of summers.