Introduction to Time Series II
Author: Edwin Bedolla
Date: Original, 6th April 2020. This version, 7th February 2021.
In this document, the main statistics such as the mean function, autocovariance function and autocorrelation function will be described, along with some examples.
We will import all of the necessary modules first.
begin
using StatsPlots
using Random
using TimeSeries
using Dates
using Statistics
using DataFrames
end
gr();
# Ensure reproducibility of the results
rng = MersenneTwister(8092);
Descriptive statistics and measures
A full description of a given time series is always given by the joint distribution function of the time series, which is a multi-dimensional function that is very difficult to track for most of the time series encountered in practice.
Instead, we usually work with what's known as the marginal distribution function, defined as

$$F_t(x) = P\{x_t \leq x\},$$

where the associated marginal density function is

$$f_t(x) = \frac{\partial F_t(x)}{\partial x},$$

and when both functions exist they can provide all the information needed to do meaningful analysis of the time series.
Mean function
With these functions we can now define one of the most important descriptive measures, the mean function, which is defined as

$$\mu_t = E(x_t) = \int_{-\infty}^{\infty} x \, f_t(x) \, dx,$$

where $E$ denotes the expected value operator.
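As a quick numerical illustration (a minimal sketch, not part of the original analysis): Gaussian white noise has a constant mean function $\mu_t = 0$, so the sample average over many draws should land close to zero.

```julia
using Random
using Statistics

# White noise has constant mean function μ_t = 0;
# the sample mean over many independent draws should be close to it.
rng = MersenneTwister(0)
w = randn(rng, 10^6)
println(mean(w))  # close to 0
```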
Autocovariance and autocorrelation
We are also interested in analyzing the dependence, or lack thereof, between realization values at different time periods, say $x_s$ and $x_t$.
The first such measure is known as the autocovariance function, and it's defined as

$$\gamma(s, t) = \mathrm{cov}(x_s, x_t) = E\left[(x_s - \mu_s)(x_t - \mu_t)\right],$$

where $s$ and $t$ denote two different points in time.
The autocovariance tells us about the linear dependence between two points on the same time series observed at different times.
Normally, we know from classical statistics that if $\gamma(s, t) = 0$ for a given time series, then $x_s$ and $x_t$ are not linearly related (though they may still share a nonlinear dependence).
We now introduce the autocorrelation function (ACF), defined as

$$\rho(s, t) = \frac{\gamma(s, t)}{\sqrt{\gamma(s, s)\,\gamma(t, t)}},$$

which is a measure of predictability, and we can define it in words as follows:

The autocorrelation measures the linear predictability of a given time series $x_t$ at time $t$, using values $x_s$ from the same time series at time $s$.

This measure is very much related to Pearson's correlation coefficient from classical statistics, which is a way to measure the linear relationship between two sets of values.
By the Cauchy–Schwarz inequality we have $|\gamma(s, t)| \leq \sqrt{\gamma(s, s)\,\gamma(t, t)}$, so the range of values for the ACF is

$$-1 \leq \rho(s, t) \leq 1.$$
Example
Let's look at an example for the particular case of the moving average. We will work out the analytic form of the autocovariance function and the ACF for the moving average, while also estimating the same results numerically using Julia.
Recall the 3-valued moving average, defined as

$$v_t = \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right),$$

where $w_t$ is a white noise time series.
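As a small sketch of what this definition computes, here is a hand-rolled version. (Note that `moving(mean, ts, 3)` from TimeSeries, used below, applies a trailing window rather than a centered one; this does not affect the covariance calculations that follow.)

```julia
using Statistics

# 3-point trailing moving average: each output value is the mean of
# the current observation and the two preceding ones.
function moving_average_3(x::AbstractVector)
    return [mean(@view x[(i - 2):i]) for i in 3:length(x)]
end

moving_average_3([1.0, 2.0, 3.0, 4.0])  # -> [2.0, 3.0]
```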
Let's plot the moving average again. We will create a very big time series for the sake of numerical approximation below.
First, we create the white noise time series.
# Create a range of time for a year, spaced evenly every 1 minute
dates = DateTime(2018, 1, 1, 1):Dates.Minute(1):DateTime(2018, 12, 31, 23, 59);
# Build a TimeSeries object with the specified time range and white noise
ts = TimeArray(dates, randn(rng, length(dates)));
# Create a DataFrame of the TimeSeries for easier handling
df_ts = DataFrame(ts);
Then, as before, we compute the 3-valued moving average.
# Compute the 3-valued moving average
moving_average = moving(mean, ts, 3);
# Create a DataFrame of the TimeSeries for easier handling
df_average = DataFrame(moving_average);
Recall what these look like in a plot. We just plot the first 100 elements in the time series to avoid having a very cluttered plot.
# Indices to plot
idxs = 1:100;
begin
@df df_ts plot(:timestamp[idxs], :A[idxs], label = "White noise")
@df df_average plot!(:timestamp[idxs], :A[idxs], label = "Moving average")
end
We are now ready to do some calculations. First, we invoke the definition of the autocovariance function and apply it to the moving average

$$\gamma(s, t) = \mathrm{cov}(v_s, v_t) = \mathrm{cov}\left\{\frac{1}{3}\left(w_{s-1} + w_s + w_{s+1}\right), \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)\right\},$$

and now we need to look at some special cases.
When $s = t$, we now have the following

$$\gamma(t, t) = \mathrm{cov}\left\{\frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right), \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)\right\};$$

then, by the property of covariance of linear combinations, we have the following simplification

$$\gamma(t, t) = \frac{1}{9}\left[\mathrm{cov}(w_{t-1}, w_{t-1}) + \mathrm{cov}(w_t, w_t) + \mathrm{cov}(w_{t+1}, w_{t+1})\right],$$

and because white noise values at different times are uncorrelated, all the cross terms vanish. In this case, recall that our white noise is normally distributed with variance $\sigma_w^2 = 1$, so the true value is

$$\gamma(t, t) = \frac{3}{9}\sigma_w^2 = \frac{3}{9}.$$
true_γ = 3 / 9
# => 0.3333333333333333
We will try to estimate the autocovariance function using classical statistics by means of the cov function in Julia. We need to pass it the time series like so:
γ_jl = cov(df_average[:, :A], df_average[:, :A])
# => 0.33351433375298867
And we can see that the value is quite similar. The remaining error comes from working with a finite sample; a bigger ensemble of values would reduce it further, but this should suffice.
When $s = t + 1$, we now have the following

$$\gamma(t + 1, t) = \mathrm{cov}\left\{\frac{1}{3}\left(w_t + w_{t+1} + w_{t+2}\right), \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)\right\} = \frac{1}{9}\left[\mathrm{cov}(w_t, w_t) + \mathrm{cov}(w_{t+1}, w_{t+1})\right] = \frac{2}{9}\sigma_w^2.$$

So the true value is now
true_γ1 = 2 / 9
# => 0.2222222222222222
To check this, we perform the same operations as before, but this time, we need to move the time series one time step with respect to itself.
# Remove the last element from the first and start with the second element
γ_jl1 = cov(df_average[1:(end-1), :A], df_average[2:end, :A])
# => 0.22247427287331825
Great! Within a tolerance value, this is quite a nice estimate. It turns out that for the cases $|s - t| \geq 3$ the autocovariance is exactly zero; let's check numerically with a lag of three.
# Shift the series three time steps with respect to itself
γ_jl_zero = cov(df_average[1:(end-3), :A], df_average[4:end, :A])
# => 0.00043227901205739896
It's actually true: a value very close to zero. But why? It's easy to see if one applies the autocovariance function definition and checks the case $|s - t| \geq 3$: the two moving-average windows share no white noise terms, so every covariance term vanishes,

$$\gamma(t + 3, t) = \mathrm{cov}\left\{\frac{1}{3}\left(w_{t+2} + w_{t+3} + w_{t+4}\right), \frac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)\right\} = 0.$$
Let's now focus on the ACF for a 3-valued moving average. We have several cases, like before.
When $s = t$, we now have the following

$$\rho(t, t) = \frac{\gamma(t, t)}{\sqrt{\gamma(t, t)\,\gamma(t, t)}} = 1,$$

so it turns out that the true value is exactly 1. We can use the cor function to compute the correlation coefficient in Julia as an estimate for the ACF.
ρ_est = cor(df_average[:, :A], df_average[:, :A])
# => 1.0
When $s = t + 1$, we now have the following

$$\rho(t + 1, t) = \frac{\gamma(t + 1, t)}{\sqrt{\gamma(t + 1, t + 1)\,\gamma(t, t)}};$$

recall from before that $\gamma(t + 1, t) = \frac{2}{9}$ and $\gamma(t, t) = \frac{3}{9}$, so

$$\rho(t + 1, t) = \frac{2/9}{3/9} = \frac{2}{3},$$

which is the true value
true_ρ2 = 2 / 3
# => 0.6666666666666666
and again, we can check this value numerically
ρ_est2 = cor(df_average[1:(end-1), :A], df_average[2:end, :A])
# => 0.667060850012139
Lastly, just like with the autocovariance, the ACF for the cases $|s - t| \geq 3$ is zero, which we can confirm with a lag of three.
ρ_est_zero2 = cor(df_average[1:(end-3), :A], df_average[4:end, :A])
# => 0.0012961321840730129
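To tie everything together, here is a sketch (assuming the StatsBase package is available; it provides an autocor function) that compares the empirical ACF of a simulated 3-valued moving average against the theoretical values derived above: $\rho = 1$ at lag 0, $\frac{2}{3}$ at lag 1, and 0 from lag 3 onward. At lag 2 the windows share a single white noise term, which by the same argument gives $\gamma = \frac{1}{9}$ and hence $\rho = \frac{1}{3}$.

```julia
using Random
using Statistics
using StatsBase  # assumed available; provides autocor

rng = MersenneTwister(8092)
w = randn(rng, 10^6)
# 3-point moving average of the white noise series
v = [(w[i - 2] + w[i - 1] + w[i]) / 3 for i in 3:length(w)]
# Empirical ACF at lags 0 through 3; theory predicts 1, 2/3, 1/3, 0
autocor(v, 0:3)
```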