ECON 4445 Midterm Review Note
Ch2. Time Series Graphics
1. It is always a good idea in time series analysis to plot the data first to see how the data
change over time.
2. To do that, a convenient way in R is to define a time series object (ts). Two important
things to specify: (1) the starting period; (2) the frequency. Examples:
i. Annual data starting in 2012:
y <- ts(c(123,39,78,52,110), start=2012)
ii. Monthly data starting in January 2003:
y <- ts(c(123,39,78,52,110,321,432,54,42,56,77,88,22,11,232), start=2003,
frequency=12)
In the above examples, c(.) is a made-up series. When dealing with actual data, just
replace c(.) with the name of your data. The data sets in fpp2 are already specified as
time series objects, so for those we can skip this step.
3. The most common plotting function we use in this class is autoplot. Example:
autoplot(a10) +
ggtitle("Antidiabetic drug sales") +
ylab("$ million") +
xlab("Year")
In the above program, we plot the data in a10, set the title of the figure to
"Antidiabetic drug sales", label the y-axis as "$ million", and label the x-axis as "Year".
4. There are three patterns that we want to identify in time series.
a. Trend: a long-term increase or decrease in the data
b. Seasonal: when a time series is affected by seasonal factors such as the time of
the year or the day of the week.
c. Cyclic: when the data exhibit rises and falls that are not of a fixed frequency.
5. To identify a seasonal pattern, two useful plots can be used: ggseasonplot and
ggsubseriesplot. Examples:
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
ylab("$ million") +
ggtitle("Seasonal plot: antidiabetic drug sales")
ggsubseriesplot(a10) +
ylab("$ million") +
ggtitle("Seasonal Subseries plot: antidiabetic drug sales")
Try to apply the above two programs. Can you identify a seasonal pattern based on these
plots?
6. Two other very important plots in identifying time series patterns are the lag plot and
autocorrelation plot. Example of lag plot:
beer2 <- window(ausbeer, start=1992)
gglagplot(beer2)
These plots are scatter plots of the current value (y-axis) against a lagged value (x-axis).
For example, the first panel plots the current value against the value one period earlier.
Four different colors represent the four quarters. From these plots, we can see that the
lag 2 and lag 6 panels demonstrate negative correlation, while the lag 4 and lag 8 panels
demonstrate positive correlation. This is evidence of a seasonal pattern.
For the autocorrelation plot, we first need to define the autocorrelation
$$r_k = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$
k in the above formula can be 1, 2, 3, .... For example, when k=1, we are
calculating the correlation coefficient between the current value and the value one
period earlier. To produce the autocorrelation plot:
ggAcf(beer2)
We find: (1) 𝑟4 , 𝑟8 , 𝑟12 , 𝑟16 are largely positive. (2) 𝑟2 , 𝑟6 , 𝑟10 , 𝑟14 , 𝑟18 are largely negative.
When we observe significant ups and downs in the autocorrelation function at a fixed
interval, we can conclude that the data have a seasonal pattern.
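To see exactly what the formula above computes, it can be coded by hand. The following is a minimal Python sketch for illustration only (the course itself uses R and the fpp2 package; the series is the made-up one from the ts() example earlier in this note):

```python
def autocorr(y, k):
    """Sample autocorrelation r_k: the sum of (y_t - ybar)(y_{t-k} - ybar)
    over t = k+1..T, divided by the sum of (y_t - ybar)^2 over t = 1..T."""
    T = len(y)
    ybar = sum(y) / T
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T))
    den = sum((y[t] - ybar) ** 2 for t in range(T))
    return num / den

# The made-up series from the ts() example earlier in this note.
y = [123, 39, 78, 52, 110]
print(round(autocorr(y, 1), 3))  # r_1 → -0.467
```

A negative r_1 here reflects the zig-zag in the made-up series: a high value tends to be followed by a low one.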
When data have a trend, the autocorrelations for small lags tend to be large and
positive because observations nearby in time are also nearby in size.
aelec <- window(elec, start=1980)
ggAcf(aelec, lag=48)
We find two things in the above ACF plot. First, it decays at a very slow rate, which is
evidence of a trend pattern. Second, it has significant ups and downs at a fixed interval,
which is evidence of a seasonal pattern.
The blue dashed lines in an ACF figure represent critical values. We did not go into the
details of hypothesis testing, so you are not required to write down a formal null
hypothesis. Just understand the blue dashed lines as thresholds for concluding whether
we have enough evidence (from our sample) to suggest that the k-th autocorrelation is
not equal to 0. For example, in the first autocorrelation plot on this page, we see that the
first vertical line shows a negative correlation at lag 1, but it does not pass the blue
dashed line. This means that we do not have enough evidence (from our sample) to
suggest that the first autocorrelation is not equal to 0. Statistically, we say the first
autocorrelation is insignificant. The second vertical line shows a negative correlation at
lag 2, and this one passes the blue dashed line. This means that we have enough evidence
(from our sample) to suggest that the second autocorrelation is not equal to 0.
Statistically, we say the second autocorrelation is significant.
7. One important concept to define in time series analysis is White Noise, which is a time
series that shows no autocorrelation. For example, suppose I generate a series of normal
random variables:
set.seed(30)
y <- ts(rnorm(50))
autoplot(y) + ggtitle("White noise")
ggAcf(y)
The resulting ACF figure demonstrates that the series we generated doesn't have any
significant autocorrelation. This is one important feature of a White Noise series.
Why do we care about White Noise? The reason is: when a series is not White Noise,
something that happened in the past can help us learn about the future (that is why we
observe some nonzero autocorrelations in a non-White-Noise series). When we can learn
from the past, we would like to incorporate that information into our model to improve
the forecasting performance. Therefore, a really important criterion for a forecasting
model is that the residuals (the difference between your forecast and the actual value)
should look like White Noise. We will talk about this more when we start to introduce
our forecasting models.
Ch3. The Forecasters’ Toolbox
1. Now we are ready to introduce our basic forecasting methods. We have four methods in
this chapter:
a. Average method: use the average of the series as the forecast.
b. Naïve method: use the last observation as the forecast.
c. Seasonal naïve method: use the last observation associated with each season
(month, if monthly data) as the forecast for that season (month).
d. Drift method: extrapolate the line between the first and last observations as the forecast.
Each of them has its own advantages for certain data sets. Suppose we are trying to
forecast a series that does not change too much and has no trend, seasonal, or cyclic
pattern; then the average method is a good candidate. An example series is the US
inflation rate from 2010 to 2019. The naïve method works well for short-run forecasts
when the series has a trend. An example series is stock price data. The seasonal naïve
method works best when the data demonstrate strong seasonality. Example series are ice
cream or beer consumption. The drift method works well for long-run forecasts when the
series has a trend.
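As a concrete summary, each forecast rule can be written in a few lines. The following Python sketch is for illustration only (in this course we use R's meanf, naive, snaive, and rwf functions instead), with a made-up quarterly series:

```python
def mean_forecast(y, h):
    """Average method: every forecast equals the historical mean."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Naive method: every forecast equals the last observation."""
    return [y[-1]] * h

def snaive_forecast(y, h, m):
    """Seasonal naive method: each forecast equals the last observed
    value from the same season, where m is the seasonal period."""
    return [y[len(y) - m + (i % m)] for i in range(h)]

def drift_forecast(y, h):
    """Drift method: extrapolate the line joining the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + (i + 1) * slope for i in range(h)]

y = [10, 12, 14, 13, 11, 13, 15, 14]   # made-up quarterly series
print(snaive_forecast(y, 4, m=4))      # repeats the last four quarters → [11, 13, 15, 14]
print(drift_forecast(y, 2))            # extends the first-to-last line upward
```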
2. Before we start to forecast our data, one step people sometimes take is to transform
the data. The most important transformation we mentioned in class is the Box-Cox
transformation. In the following formula, 𝑦𝑡 represents the original series, and 𝑤𝑡
represents the transformed series:
$$w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise.} \end{cases}$$
The purpose of this transformation is to make the variances of your time series at
each time point approximately the same, or at least closer to each other. This is related
to an issue in econometrics called heteroscedasticity. We did not get into the details of
heteroscedasticity, so you do not have to worry about it. Intuitively, we would like to
make sure that our forecasting model has about the same uncertainty at each time point.
This gives us a better measure of the overall uncertainty of the model, so that we can
construct a more precise prediction interval (a measure of how uncertain our forecast is).
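The transformation itself is simple to compute. Here is a minimal Python sketch of the transform and its inverse, for illustration only (in R, the forecast package provides BoxCox and InvBoxCox); the series is made up:

```python
import math

def boxcox(y, lam):
    """Box-Cox transform: log(y_t) when lambda = 0, else (y_t^lambda - 1) / lambda."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def inv_boxcox(w, lam):
    """Back-transform to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in w]
    return [(lam * v + 1) ** (1 / lam) for v in w]

# A series whose fluctuations grow with its level; the log (lambda = 0)
# transform makes the steps equal in size.
y = [1.0, 2.0, 4.0, 8.0]
print([round(v, 3) for v in boxcox(y, 0)])  # → [0.0, 0.693, 1.386, 2.079]
```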
Several important points related to Box-Cox transformation:
a. First and most important, the Box-Cox transformation is usually applied to
time series where the variance (how large the fluctuations of the data are) changes
with the level of the data. For example, suppose your data have a positive (negative)
trend, so the level of the data is increasing (decreasing) over time. If you observe that
the variance of your data also increases (decreases) over time, applying the Box-Cox
transformation is a good idea.
b. The Box-Cox transformation is driven by a single parameter 𝜆, which can be any
real number. It can be selected automatically by R. For example, to determine the
optimal value of 𝜆 for the data series elec:
(lambda <- BoxCox.lambda(elec))
But usually the results are not too sensitive to the value of 𝜆. Therefore, people
sometimes just choose a 𝜆 that is easier to interpret. For example, 0 is a common
choice, as it represents the log transformation.
c. Once the value of 𝜆 is determined, we can use the following code to plot the
transformed series:
autoplot(BoxCox(elec,lambda))
d. If we want to apply the Box-Cox transformation with any of our forecasting methods
above, we can just specify lambda=x, where x denotes the value you chose for 𝜆, in the
forecasting function. For example, the following code shows how you can specify
lambda in the forecasting functions:
fc <- meanf(eggs, lambda=0, h=50, level=80)
fc <- naive(eggs, lambda=0, h=50, level=80)
fc <- snaive(eggs, lambda=0, h=50, level=80)
fc <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80)
e. What we did with the above forecasting methods with the Box-Cox transformation is
that we used the transformed series to make the forecast, and then transformed
everything (including the forecast) back. However, the results that R provides by
default are the medians of the forecasts, not the means. This is because the Box-Cox
transformation is nonlinear, and a nonlinear transformation changes the mean. In
the slides I provide a detailed derivation, involving a Taylor expansion, of how we can
get the mean instead of the median; you are not required to understand it. Just
remember that the default results are medians, not means. If we want mean forecasts
from R, we need to add biasadj=TRUE in the forecasting methods:
fc <- meanf(eggs, lambda=0, h=50, level=80, biasadj=TRUE)
fc <- naive(eggs, lambda=0, h=50, level=80, biasadj=TRUE)
fc <- snaive(eggs, lambda=0, h=50, level=80, biasadj=TRUE)
fc <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80, biasadj=TRUE)
3. As we mentioned in Chapter 2, we would like the residuals of our model (the difference
between your forecast and the actual value) to look like White Noise. Thus, the next step
is to check whether this is true. Theoretically, we need our residuals to satisfy the
following two assumptions:
a. Residuals are not autocorrelated. If residuals are autocorrelated, then there is
information left in the residuals that we should use in the model.
b. Residuals have mean zero. If residuals have nonzero mean, then our forecast is
going to be biased. This means our forecast is not going to be the average of the
future values.
And in the best scenario, we would like our residuals to have the following two useful
properties (not necessary):
c. Residuals have constant variance. We will have a better way to measure the
uncertainty if this is true.
d. Residuals are normally distributed. Our prediction intervals are calculated based on
the assumption that residuals are normally distributed. So if residuals are normally
distributed, the prediction intervals are going to be more precise.
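These checks can be made concrete with a small example. The following Python sketch is illustrative only (in R, the checkresiduals function below does all of this, and more, at once): it computes naive-method residuals for a made-up series and inspects assumptions a and b.

```python
def naive_residuals(y):
    """One-step naive-forecast residuals: e_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def mean_(x):
    return sum(x) / len(x)

def lag1_autocorr(e):
    """Lag-1 sample autocorrelation of the residuals."""
    ebar = mean_(e)
    num = sum((e[t] - ebar) * (e[t - 1] - ebar) for t in range(1, len(e)))
    den = sum((v - ebar) ** 2 for v in e)
    return num / den

y = [112, 118, 132, 129, 121, 135, 148, 136]   # made-up series
e = naive_residuals(y)
print(round(mean_(e), 3))          # assumption b: should be near zero
print(round(lag1_autocorr(e), 3))  # assumption a: should be near zero
```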
4. To test for autocorrelation, we will use a test called the Ljung-Box test. This is a joint
test of whether a set of autocorrelations are all equal to zero. Please understand the
difference between using the blue dashed lines in the ACF plot and the Ljung-Box test.
When we use the blue dashed lines in the ACF plot, we are testing each autocorrelation
separately. For example, using the first figure on page 3, we can test whether each
autocorrelation is equal to zero by checking against the blue dashed lines. But when we
test them separately, we do not check the other autocorrelations. For example, when
we use the second vertical line in the first figure on page 3 to check whether the second
autocorrelation is zero, we do not check any other autocorrelation. This may lead
to a different conclusion than the joint test.
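For reference, the Ljung-Box statistic aggregates the first h autocorrelations into one number, Q = T(T+2) Σ_{k=1}^{h} r_k² / (T−k). A Python sketch of the statistic, for illustration only (in R it is computed by Box.test or by checkresiduals below):

```python
def ljung_box_q(e, h):
    """Ljung-Box statistic Q = T(T+2) * sum_{k=1..h} r_k^2 / (T - k).
    A large Q is evidence that the autocorrelations are jointly nonzero."""
    T = len(e)
    ebar = sum(e) / T
    den = sum((v - ebar) ** 2 for v in e)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum((e[t] - ebar) * (e[t - k] - ebar) for t in range(k, T)) / den
        q += r_k ** 2 / (T - k)
    return T * (T + 2) * q

# A strongly alternating series has large autocorrelations, hence a large Q.
print(round(ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], 2), 2))  # → 16.25
```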
5. To check all the above properties, we can use the checkresiduals function. For example,
suppose we want to check the residuals from our basic forecasting methods; we can use
the following code:
checkresiduals(meanf(goog200))
checkresiduals(naive(goog200))
checkresiduals(snaive(goog200))
checkresiduals(rwf(goog200, drift=TRUE))
6. We can evaluate the forecasting performance of our model by two different methods. The
first method is used to evaluate long-term forecasting performance. The second method,
also called the cross-validation method, is used to evaluate short-term forecasting
performance.
a. To perform the first method, we need to split the data into two parts. The first part
is called the training data, and the second part is called the test data. The intuition for
doing this is that we want to create a situation where we can pretend to perform real
forecasting. Think about reality: when we are making a forecast, of course we do not
know the future value. In that situation, we can only evaluate how far our forecast is
from the actual observation after we observe the future value (which can only happen
in the future). This makes evaluating forecasting performance very time consuming,
as we always have to wait for the future observation to realize. Therefore, an
alternative is to split the data, use only the first part to construct the forecast of the
second part, and then compare the forecast with the actual observations in the second
part. Again, this mimics the situation when we perform a real forecast. The first
method can be performed with the following procedure:
i. First, we split the data. For example:
beer2 <- window(ausbeer,start=1992,end=c(2007,4))
beer3 <- window(ausbeer,start=2008)
The above two functions create two data sets. The first, beer2, starts in the first
quarter of 1992 and ends in the fourth quarter of 2007. The second, beer3, starts in
the first quarter of 2008 and runs to the end of the data set. We are going to use the
first data set as the training data set and the second as the test data set.
ii. Second, we apply forecasting methods to the first data set. For example,
suppose I want to apply the average method, the naïve method, and the seasonal
naïve method; I can use the following code to construct the forecasts for the next 10
periods after the end of the first data set (the fourth quarter of 2007):
beerfit1 <- meanf(beer2,h=10)
beerfit2 <- naive(beer2,h=10)
beerfit3 <- snaive(beer2,h=10)
Note that we specify h=10 to get the next 10 periods of forecasts because our
test data set has only 10 observations. If the test data set has a different
number of observations, then we should change the value of h.
iii. Third, we compare the forecasts we got in step ii with the actual observations
in the test data set. This can be done with the following code:
accuracy(beerfit1, beer3)
accuracy(beerfit2, beer3)
accuracy(beerfit3, beer3)
iv. The accuracy function in the third step will produce many different
measures of how far the forecasts are from the actual observations. If
you are interested, please take a look at the textbook for the definition of
each measure. In this class, we will focus on a measure called the root
mean squared error (RMSE). The method that provides the smallest
RMSE is the best.
b. For the second method, we still split the data into two parts. The first part is still
called the training data, but we will only use the first observation in the second part as
our test data. We then apply this method multiple times, until we are using everything
but the last observation as our training data and the last observation as our test data.
This method focuses on one-step-ahead forecasts only (using observations up to
today to forecast tomorrow). The code for the second method is much simpler than
for the first. For example, we can use the following code to perform this
cross-validation method:
e <- tsCV(goog200, rwf, drift=TRUE, h=1)
sqrt(mean(e^2, na.rm=TRUE))
#> [1] 6.233
The first line of the code specifies that we want to use the cross-validation method
(CV) to perform model comparison. Inside the tsCV function, we first specify the data
set. Notice that we do not have to split the data ourselves in this method; R will split
it for us, so we can just specify goog200, our original data set. The second and third
inputs of the tsCV function specify which method we want to use to forecast. In this
example, we use the drift method. If we want to use the naïve method instead, for
example, we can replace "rwf, drift=TRUE" with "naive". We specify h=1 in the last
part of the first line because we want to focus on one-step-ahead forecasts only. The
outcome of the first line is the vector of forecast errors from this cross-validation
method; e is going to be a vector.
The second line of the code calculates the root mean squared error using the
forecast errors we generated in the first line. The second input, na.rm=TRUE,
removes the missing values (NA) that tsCV produces, so please remember to include
it. The outcome we get, 6.233, is the cross-validation RMSE of the drift method.
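The logic of both evaluation methods can be sketched in a few lines. The following Python sketch is illustrative only (in R, accuracy and tsCV do this for you), using the naïve method on a made-up series:

```python
def naive_forecast(y, h):
    """Naive method: repeat the last observation h times."""
    return [y[-1]] * h

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return (sum(e ** 2 for e in errors) / len(errors)) ** 0.5

y = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]  # made-up series

# Method 1: fixed training/test split (here the last 3 observations are the test set).
train, test = y[:-3], y[-3:]
fc = naive_forecast(train, 3)
print(round(rmse([a - f for a, f in zip(test, fc)]), 3))

# Method 2: one-step-ahead cross-validation (what tsCV with h=1 does):
# forecast each observation using only the observations before it.
cv_errors = [y[t] - naive_forecast(y[:t], 1)[0] for t in range(1, len(y))]
print(round(rmse(cv_errors), 3))
```

Note how method 1 judges one block of multi-period-ahead forecasts, while method 2 averages many one-step-ahead errors, which is why the two RMSEs generally differ.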
Example questions:
1. The pigs data set shows the monthly total number of pigs slaughtered in Victoria, Australia,
from January 1980 to August 1995. Use mypigs <- window(pigs, start=1990) to select the
data starting from 1990. Use autoplot and ggAcf on the mypigs series and compare the
results to white noise.
Ans: Codes and plots are on next page.
Time plot: Monthly data in thousands. No features really jump out; maybe a bit of a trend.
The ACF shows significant spikes at lags 1, 2, and 3. Also note the large spike at the seasonal
lag 12. If we had a longer series, with tighter significance bounds, this might also have been
significant, indicating some seasonality. Definitely not a white noise series.
2. The arrivals data set comprises quarterly international arrivals (in thousands) to Australia
from Japan, New Zealand, UK and the US.
a. Use autoplot, ggseasonplot and ggsubseriesplot to compare the differences
between the arrivals from these four countries.
b. Can you identify any unusual observations?
Ans: We can directly apply the autoplot function to plot all four series in one figure. Or we
can use an additional input, facets=TRUE, to plot each country in a separate panel.
The above time plots show: a decrease in arrivals from Japan since the mid-1990s, an
increase in arrivals from NZ, a downturn in arrivals from the UK since the mid-2000s, and a
flattening out of the arrivals from the US.
The seasonal plots show the differences in seasonal patterns across the four source countries.
The peaks for the UK and the US happen in Q1 and Q4, which include the Australian summer
and the Christmas and New Year holiday period, with Q2 and Q3 being the troughs.
For Japan, peaks occur mostly in Q1 but also in Q3, reflecting peak arrivals in both summer
and winter, possibly corresponding to the winter skiing season or to visiting northern
Australia during the dry season. The one source country that is noticeably different is
New Zealand: peak arrivals from New Zealand occur during Q3, followed by Q2 and
Q4. Unlike all the other source countries, the trough clearly occurs during Q1, the January
(summer) quarter. The seasonal plots are also useful for revealing anomalies or one-off
events. For example, in the US plot, the peak among all July quarters occurred in
2000, during the Sydney Olympic Games.
Unusual observation: 1991:Q3 is unusual for US (Gulf war effect?); 2001:Q3-Q4 are unusual
for US (9/11 effect).
3. For the following two series, dole and bricksq, make a graph of the data. If transforming
seems appropriate, do so and describe the effect.
Ans:
The data was transformed using Box-Cox transformation with parameter λ=0.33. The
transformation has stabilized the variance.
The time series was transformed using a Box-Cox transformation with λ=0.25 ...
