
R Time Series Fit

Nonlinear least squares fitting of time series

This post is the second part of a three-part series on Argovis.

  1. Argovis API introduces my website www.argovis.com and shows how users can access data using a Python API.
  2. Linear time series analysis in R shows how time series models can be used to fit ocean temperatures from Argo data.
  3. Linear time series model fitting in Python gives further detail on how nonlinear least squares fitting is typically done on time series.

This notebook explains how one can create time series models using the Argovis data set. The programming language R has fantastic time series fitting and diagnostic tools. Time series will be briefly explained, a time series of sea-surface temperature will be fitted with linear models, and the goodness of fit of those models will be explored. The data I am using comes from surface temperatures in a section of the Atlantic Ocean near Gibraltar, collected by the Argo float project.

First, I should explain a little bit about time series.

Linear time series models

Time series modeling of climate data involves taking noise shocks into account. The time series we will be dealing with take past values as input, i.e., for a temperature measurement $T$ indexed by time $t$ we have $$T_t = f(T_{t-1}, T_{t-2}, ...)$$ Time series models also account for white noise $a_t$ injected at every time index $t$, including the lingering effect of past shocks $a_{t-1}$, $a_{t-2}$, et cetera.

$$T_t = f(T_{t-1}, T_{t-2}, ..., a_t, a_{t-1}, ...)$$.

Linear models can be used to represent stationary time series. Stationarity is achieved when the mean $\mu$, variance $\sigma^2$, and autocovariance do not change over time. The mean $\mu$ is often removed by expressing the measurement $T_t$ as the anomaly $\dot{T_t} = T_t - \mu$.
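As a minimal sketch of that centering step (using made-up numbers rather than the Argo data), one can subtract the sample mean in R:

T_obs <- c(18.2, 19.1, 20.4, 21.0, 19.7)  # hypothetical temperature measurements
T_dot <- T_obs - mean(T_obs)              # anomaly series, T_t - mu
mean(T_dot)                               # effectively zero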

Autoregressive (AR)

$$ \dot{T_t} = \phi_1 \dot{T_{t-1}} + \phi_2 \dot{T_{t-2}} + ... + \phi_p \dot{T_{t-p}} + a_t = \sum_{k=1}^p \phi_k \dot{T_{t-k}} + a_t $$

The integer $p$ is finite and the $\phi_k$ are constants whose values are chosen so that the model is stationary. Stationarity will be defined later; for now, just know that it is required to adequately model and predict a time series.

Here the backshift operator $B$, defined by $B\dot{T_{t}} = \dot{T_{t-1}}$, operates on the model like so.

$$ \dot{T_{t}} = \phi_1 \dot{T_{t-1}} + \phi_2 \dot{T_{t-2}} + ... + \phi_p \dot{T_{t-p}} + a_t = \phi_1 B \dot{T_{t}} + \phi_2 B^2 \dot{T_{t}} + ... + \phi_p B^p \dot{T_{t}} + a_t = \sum_{k=1}^p \phi_k B^k \dot{T_{t}} + a_t $$

The backshift operator rewrites the AR model. Collecting the temperature anomaly terms on the left-hand side gives

$$ \dot{T_{t}} - \sum_{k=1}^p \phi_k B^k \dot{T_{t}} = a_t $$

The left-hand side is compacted further with the notation

$$ \dot{T_{t}} - \sum_{k=1}^p \phi_k B^k \dot{T_{t}} = \phi_p(B) \dot{T_{t}} $$

So that the AR model is expressed by

$$ \phi_p(B) \dot{T_{t}} = a_t $$

Moving average (MA)

Moving average models multiply the white noise variable $a_{t-k}$ by a constant $\theta_k$ for $k = 1, 2, ..., q$.

$$ \dot{T_{t}} = a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2} + ... + \theta_q a_{t-q} $$

Using the backshift notation, $a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2} + ... + \theta_q a_{t-q} = \theta_q(B)a_t$, so that the MA model is expressed as

$$ \dot{T_{t}} = \theta_q(B)a_t $$

Autoregressive moving average (ARMA)

Here the convenience of the backshift operator is revealed. The ARMA model is a combination of the previous two and is expressed by

$$ \phi_p(B) \dot{T_{t}} = \theta_q(B)a_t $$

Autoregressive integrated moving average (ARIMA)

At this point, we should heuristically define stationarity. Our time series models are assumed to be random variables with the same probability distribution over all $t$. If the random variable $\dot{T_{a}}$ at time step $t = a$ has the same probability distribution as $\dot{T_{b}}$ at any other time step $t = b$, then the series is stationary.

Imagine a linearly increasing time series: the first point is almost certainly smaller than the last point, so this time series is not stationary. In this case, calculus comes into play by taking the derivative. The change in temperature is constant, so the differentiated series is stationary. In the discrete setting, we instead take an initial differencing step, corresponding to the "integrated" part of the model, and we can do this $d$ times if necessary, i.e.,

$$ \dot{T_{t}} - \dot{T_{t-1}} = S_{t} $$

where $S_t$ is a stationary time series. In our backshift notation, $(1-B)^d \dot{T_{t}} = S_{t}$.

With this, we can write the ARIMA model as

$$ \phi_p(B)(1-B)^d \dot{T_{t}} = \theta_q(B)a_t $$
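To get a feel for these models before touching the Argo data, here is a small sketch (with arbitrary, made-up coefficients) using R's built-in arima.sim(): it simulates an ARIMA(1,1,1) series and then differences it once, which should leave a stationary ARMA(1,1) series.

set.seed(42)
sim <- arima.sim(model=list(order=c(1,1,1), ar=0.8, ma=0.3), n=500)  # simulated ARIMA(1,1,1)
dSim <- diff(sim)  # apply (1-B) once, i.e. d = 1
par(mfrow=c(2,1))
plot(sim, main="Simulated ARIMA(1,1,1)")
acf(dSim, main="ACF of the differenced series")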

Deseasonalization of the Gibraltar time series

Although ARIMA models can include a seasonal component, I prefer to decompose the series, remove the seasonal climatology, and look at the anomalies.

In [2]:
library(readr)
options( warn = -1 )  # suppress warnings in the notebook output
gilbralter <- read_csv("docs/argo/gilbralter.csv")
# mean temperature as a time series with 30 observations per year
gilT <- ts(gilbralter$tempMean, start=c(2004,1,13), end=c(2017, 11, 25), frequency=30)
par(mfrow=c(3,1))
plot(gilT, main="Gilbralter temperature")
acf(gilT, main= "Gilbralter temperature ACF")
pacf(gilT, main="Gilbralter temperature PACF")  # pacf() already computes the partial ACF
Parsed with column specification:
cols(
  X1 = col_double(),
  doxyMean = col_double(),
  doxyStd = col_double(),
  psalMean = col_double(),
  psalStd = col_double(),
  tempMean = col_double(),
  tempStd = col_double(),
  cndcMean = col_double(),
  cndcStd = col_double(),
  presMean = col_double(),
  presStd = col_double(),
  startDate = col_date(format = ""),
  endDate = col_date(format = ""),
  nProf = col_double()
)

The time series has a seasonal trend. Time series modeling can account for a seasonal trend by differencing with a seasonal lag. For example, an ARIMA(1, 0, 1) model with monthly seasonal differencing is denoted by ARIMA(1, 0, 1)x(1, 1, 1)_{12}

$$ (1-\Phi B^{12})(1-\phi B)(1-B^{12})\dot{Z_t} = a_t (1 - \theta B)(1 - \Theta B^{12}) $$

where $\Phi$ and $\Theta$ are the seasonal AR(1) and MA(1) coefficients. Personally, I don't see the need to complicate the model with these additional parameters. Since I'm more interested in the high-frequency behavior, I will instead remove the seasonal trend.
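For reference, such a seasonal fit can be requested directly through arima()'s seasonal argument. The sketch below is not run here and assumes a hypothetical monthly series monthlyT; the Gibraltar series used below is binned at 30 observations per year, so a period of 12 would not apply to it directly.

# Sketch of an ARIMA(1,0,1)x(1,1,1)_12 fit on a hypothetical monthly series
seasonalFit <- arima(monthlyT, order=c(1,0,1),
                     seasonal=list(order=c(1,1,1), period=12))
print(seasonalFit)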

Aggregating the data into an annual cycle provides the seasonal climate signal. Subtracting this signal from the Gibraltar series should leave only the anomalies. The R forecast library provides the function seasadj(), which operates on a decomposed signal to retrieve the deseasonalized series. I included the install.packages() command to show how to install packages inside a Jupyter notebook.

In [3]:
install.packages("forecast", "~/anaconda3/lib/R/library")
In [4]:
library(forecast)
decomp <- stl(gilT, s.window="periodic") # decompose into seasonal, trend, and remainder components
desGilT <- seasadj(decomp) # remove the seasonal component
In [5]:
par(mfrow=c(3,1))
plot(desGilT, main="Deseasonalized Gilbralter Temperatures")
acf(desGilT, main= "ACF Deseasonalized Gilbralter Temperatures")
pacf(desGilT, main="PACF Deseasonalized Gilbralter Temperatures")

The ACF appears to decay exponentially, suggesting that the deseasonalized time series mimics an AR process. Note that R's seasadj() function did not remove the mean, so the average temperature $\mu$ is a parameter in our model.
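A quick check of this, using the deseasonalized series defined above: the sample mean is nonzero, and arima() will estimate it because include.mean = TRUE by default (the estimate is reported as the "intercept").

mean(desGilT)  # nonzero sample mean retained by seasadj()
# arima() keeps include.mean=TRUE by default, so mu appears as the "intercept" in the fits below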

Fitting linear time series parameters

Each of the models described is linear, but determining their parameters requires a nonlinear least squares approach (NLLS), due to the model's stochastic nature. The next notebook will further elucidate why this is.

An autocorrelation function (ACF) that decays exponentially indicates that an AR(1) model may fit the data well. An AR(1) model also has a partial autocorrelation function (PACF) that cuts off after lag 1.

Fitting an AR(1) involves solving a non-linear least squares model. Fortunately, R has an arima() function that does this. The next notebook will explain the mechanics of this model.

R also includes time series diagnostic tools that can be used to compare model fits against one another.

In [6]:
fit1 <- arima(desGilT, order=c(1,0,0))  # fit an AR(1) model
print(fit1)
tsdiag(fit1)  # standardized residuals, residual ACF, and Ljung-Box p-values
myRes1 <- residuals(fit1)
Call:
arima(x = desGilT, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.8311    19.4028
s.e.  0.0276     0.3626

sigma^2 estimated as 1.54:  log likelihood = -656.2,  aic = 1318.4

The function tsdiag() produces three diagnostic plots. The Akaike information criterion (AIC), reported by print(), is a measure of the quality of the model fit.

$$ \mathrm{AIC} = 2k - 2\ln(\hat{L}) $$

where the criterion grows with the number of parameters $k$ and shrinks with the maximized likelihood $\hat{L}$. The smaller the AIC, the better the model.
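Both quantities are available from the fitted object; a quick sketch using the AR(1) fit above:

logLik(fit1)  # maximized log-likelihood, ln(L-hat)
AIC(fit1)     # 2k - 2*ln(L-hat); matches the aic value printed above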

The residuals should behave like a white noise process: their ACF should be statistically indistinguishable from zero at every lag beyond lag 0.

The Ljung-Box statistic tests the first ten lags of the residual ACF. The null hypothesis H0 states that the residuals are independently distributed, i.e., the fit has captured the correlation in the data. p-values > 0.05 support H0.
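The same test can be run by hand with Box.test(); here is a sketch on the AR(1) residuals, where fitdf accounts for the one fitted AR coefficient:

Box.test(residuals(fit1), lag=10, type="Ljung-Box", fitdf=1)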

The arima() fit estimates $\phi = 0.8311$, $\mu = 19.4028$, and $\sigma^2_a = 1.54$ for the AR(1) model

$$ Z_t = \phi (Z_{t-1} - \mu) + a_t + \mu $$

where $a_t \sim WN(0, \sigma^2_a)$.

The AR(1) fit is considered a good representation because of its relatively low AIC value and because its p-values for the Ljung-Box statistic are all > 0.05.

An AR(3) model performs better still, as its lower AIC shows below. This more complicated model has three $\phi$ weights,

$$ Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \phi_3 Z_{t-3} + a_t $$

In [7]:
fit2 <- arima(desGilT, order=c(3,0,0))  # AR(3) fit
print(fit2)
tsdiag(fit2)
myRes2 <- residuals(fit2)
Call:
arima(x = desGilT, order = c(3, 0, 0))

Coefficients:
         ar1      ar2      ar3  intercept
      1.2155  -0.2649  -0.3978    19.4654
s.e.  0.0457   0.0750   0.0458     0.0577

sigma^2 estimated as 0.2657:  log likelihood = -305.66,  aic = 621.32
In [8]:
# You can also use the conditional sums of squares method.
fit3 <- arima(desGilT, order=c(3,0,0), method="CSS")
print(fit3)
tsdiag(fit3)
myRes3 <- residuals(fit3)
Call:
arima(x = desGilT, order = c(3, 0, 0), method = "CSS")

Coefficients:
         ar1      ar2      ar3  intercept
      1.2161  -0.2659  -0.3983    19.4692
s.e.  0.0457   0.0748   0.0457     0.0575

sigma^2 estimated as 0.266:  part log likelihood = -303.49

The inner workings of the arima function boil down to two fitting methods: exact maximum likelihood or conditional sum of squares.

By default, arima() uses exact maximum likelihood estimation via a Kalman filter, following Gardner et al. (1980).

For the numerical optimization, arima() calls optim(), which offers the Nelder-Mead, BFGS, CG, L-BFGS-B, SANN, and Brent algorithms; BFGS is the default.
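The optimizer can be swapped through arima()'s optim.method argument, which is passed on to optim(). A small sketch comparing coefficient estimates from two optimizers (they should agree closely for this model):

fitBFGS   <- arima(desGilT, order=c(3,0,0))                              # default optim.method="BFGS"
fitNelder <- arima(desGilT, order=c(3,0,0), optim.method="Nelder-Mead")
rbind(BFGS=coef(fitBFGS), NelderMead=coef(fitNelder))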

The details on how these models are fit remain opaque. I couldn't find much information on the gory details outside of the white papers mentioned in R's help(arima) page. Reading the source code is one option, but I find R hard to decipher. I'm more of a Python programmer, and I'd be interested in writing my own optimization routine to compare against the fitted parameters for these two AR models.

Conclusion

Now that we know some techniques for approaching time series, we can look into other regions covered by Argo. I'd like to see whether other regions or depths are better characterized by different ARIMA models or even by nonlinear models.

Argovis has the capability to explore deep regions; time series at different depths or in different regions could be compared with one another in a state-space time series model.

In the next notebook, I will create nonlinear least squares fits for the AR(1) and AR(3) models. Their results will be compared with the results from R's arima() function. I will also compare some of the minimization algorithms in Python's scipy library.