12  MLE for ARMA Models

OLS can be used to estimate AR models, but not MA models. MA models, and ARMA models more generally, can instead be estimated by maximum likelihood. Maximum likelihood estimation (MLE) starts with an assumed distribution for the random variables; the parameters are then chosen to maximize the likelihood of observing the data under that distribution.

Consider an ARMA(\(p\), \(q\)) model

\[ y_t = \phi_1 y_{t-1} +\dots + \phi_p y_{t-p} + u_t + \theta_1 u_{t-1} + \dots + \theta_q u_{t-q} \]

Write in the form of data matrix:

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}= \underbrace{ \begin{bmatrix} y_0 & y_{-1} & \dots & y_{1-p} \\ y_1 & y_{0} & \dots & y_{2-p} \\ y_2 & y_{1} & \dots & y_{3-p} \\ \vdots & \vdots & \ddots & \vdots \\ y_{T-1} & y_{T-2} & \dots & y_{T-p} \\ \end{bmatrix} }_{\boldsymbol X} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \vdots \\ \phi_p \end{bmatrix} + \underbrace{ \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ \theta_1 & 1 & 0 & \dots & 0 \\ \theta_2 & \theta_1 & 1 & \dots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & \dots & \theta_2 & \theta_1 & 1 \end{bmatrix} }_{\boldsymbol\Gamma} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_T \end{bmatrix} \]

Or compactly,

\[ \boldsymbol{y = X\phi + \Gamma u}. \]

We assume the innovations are jointly normal, \(\boldsymbol u \sim N(0,\sigma^2\boldsymbol I)\). We also assume the presample values \(y_0, y_{-1},\dots,y_{1-p}\) are known initial values and that \(u_0 = u_{-1} = \dots = u_{1-q} = 0\). Given these initial conditions, the observed data are therefore jointly normal,

\[ \boldsymbol{y|y_0} \sim N(\boldsymbol{X\phi}, \sigma^2\boldsymbol{\Gamma\Gamma'}). \]
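To make the banded structure of \(\boldsymbol\Gamma\) concrete, here is a minimal sketch in R (the helper name make_gamma is ours, not from any package): it fills the \(j\)-th subdiagonal with \(\theta_j\) and forms the implied covariance \(\sigma^2\boldsymbol{\Gamma\Gamma'}\).

# Illustration (helper name is ours): fill the j-th subdiagonal of Gamma with theta_j
make_gamma <- function(theta, T_n) {
  G <- diag(T_n)
  for (j in seq_along(theta)) {
    G[row(G) - col(G) == j] <- theta[j]
  }
  G
}

G <- make_gamma(theta = c(0.5, 0.2), T_n = 6)   # e.g. MA(2) with T = 6
Sigma <- 1^2 * G %*% t(G)                       # sigma^2 = 1 for illustration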

The probability density function of the multivariate normal distribution is

\[ f(\boldsymbol{y} | \boldsymbol{y_0}, \boldsymbol{\phi}, \boldsymbol{\Gamma},\sigma^2) = (2\pi)^{-T/2}|\sigma^2\boldsymbol{\Gamma\Gamma'}|^{-1/2}\exp\left(-\frac{1}{2}(\boldsymbol{y-X\phi})' (\sigma^2\boldsymbol{\Gamma\Gamma'})^{-1} (\boldsymbol{y-X\phi}) \right) \]

To simplify computation, take the logarithm to obtain the log-likelihood function

\[ \ell(\boldsymbol{\phi},\boldsymbol{\Gamma},\sigma^2|\boldsymbol{y},\boldsymbol{y_0}) = -\frac{T}{2}\ln(2\pi) -\frac{1}{2}\ln|\sigma^2\boldsymbol{\Gamma\Gamma'}|-\frac{1}{2\sigma^2}(\boldsymbol{y-X\phi})' (\boldsymbol{\Gamma\Gamma'})^{-1} (\boldsymbol{y-X\phi}). \]
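As a sketch of how this expression can be evaluated for a candidate \((\boldsymbol\phi, \boldsymbol\theta, \sigma^2)\), the function below reuses make_gamma() from above and, purely for illustration, sets the presample values \(y_0,\dots,y_{1-p}\) to zero. Because \(\boldsymbol\Gamma\) is unit lower triangular, \(|\boldsymbol\Gamma| = 1\) and \(\ln|\sigma^2\boldsymbol{\Gamma\Gamma'}| = T\ln\sigma^2\), which the code exploits.

# Sketch: conditional log-likelihood at a candidate (phi, theta, sigma2)
arma_loglik <- function(phi, theta, sigma2, y) {
  T_n <- length(y)
  p   <- length(phi)
  y_ext <- c(rep(0, p), y)                       # presample y's set to 0
  X <- matrix(sapply(1:p, function(j) y_ext[(p - j + 1):(p - j + T_n)]),
              nrow = T_n)                        # T x p matrix of lagged y
  G <- make_gamma(theta, T_n)
  z <- solve(G, y - X %*% phi)                   # Gamma^{-1}(y - X phi)
  # |Gamma| = 1, so ln|sigma^2 Gamma Gamma'| = T ln(sigma^2)
  -T_n/2 * log(2*pi) - T_n/2 * log(sigma2) - sum(z^2) / (2*sigma2)
}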

The parameters are then chosen to maximize this log-likelihood function, i.e. the probability of observing the data under the assumed distribution. In practice this can be done numerically, for example by a grid search over the parameter space. To reduce the search dimension, we can concentrate the log-likelihood using the first-order conditions:

\[ \frac{\partial\ell}{\partial\boldsymbol\phi}=0 \implies \boldsymbol{\hat\phi} = (\boldsymbol X'(\boldsymbol{\hat\Gamma\hat\Gamma'})^{-1}\boldsymbol X)^{-1}\boldsymbol X'(\boldsymbol{\hat\Gamma\hat\Gamma'})^{-1}\boldsymbol y \]

\[ \frac{\partial\ell}{\partial\sigma^2}=0 \implies \hat\sigma^2 =\frac{1}{T} (\boldsymbol{y-X\hat\phi})' (\boldsymbol{\hat\Gamma\hat\Gamma'})^{-1} (\boldsymbol{y-X\hat\phi}) \]

Both \(\boldsymbol{\hat\phi}\) and \(\hat\sigma^2\) are functions of \(\boldsymbol{\hat\Gamma}\), so substituting them back into the log-likelihood leaves a function of \(\boldsymbol\Gamma\) alone. The search therefore only needs to run over the MA parameters \(\theta_1,\dots,\theta_q\).
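A sketch of this concentrated search for an ARMA(1,1), again reusing make_gamma() from above and setting the presample value \(y_0\) to zero: for each candidate \(\theta\), \(\boldsymbol{\hat\phi}\) and \(\hat\sigma^2\) follow in closed form from the first-order conditions, so only a one-dimensional grid search over \(\theta\) remains.

# Sketch: concentrated log-likelihood for an ARMA(1,1) as a function of theta only
concentrated_loglik <- function(theta, y) {
  T_n <- length(y)
  X  <- matrix(c(0, y[-T_n]), ncol = 1)          # lagged y, with y_0 = 0
  G  <- make_gamma(theta, T_n)
  Xs <- solve(G, X)                              # Gamma^{-1} X
  ys <- solve(G, y)                              # Gamma^{-1} y
  phi_hat    <- solve(crossprod(Xs), crossprod(Xs, ys))   # GLS estimate of phi
  sigma2_hat <- sum((ys - Xs %*% phi_hat)^2) / T_n
  # Concentrated value: -T/2 (ln 2*pi + 1) - T/2 ln(sigma2_hat), since |Gamma| = 1
  -T_n/2 * (log(2*pi) + 1) - T_n/2 * log(sigma2_hat)
}

# e.g. a grid search over theta only:
# theta_grid <- seq(-0.9, 0.9, by = 0.01)
# theta_grid[which.max(sapply(theta_grid, concentrated_loglik, y = y))]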

#' --------------------------------
#'  MLE Estimation for MA(1)
#' --------------------------------

# Simulate an MA(1) process 
y <- arima.sim(model = list(ma = 0.5), n = 200)

# Negative log-likelihood function for MA(1)
neg_log_likelihood <- function(params, y) {
  theta <- params[1]
  sigma <- abs(params[2])  # Ensure sigma > 0
  n <- length(y)
  
  # Initialize residuals
  e_hat <- numeric(n)
  e_hat[1] <- y[1]  # u_0 is assumed to be 0
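  # Remaining innovations follow the recursion u_t = y_t - theta * u_{t-1}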
  
  for (t in 2:n) {
    e_hat[t] <- y[t] - theta * e_hat[t - 1]
  }
  
  ll <- -n/2 * log(2*pi*sigma^2) - sum(e_hat^2) / (2*sigma^2)
  
  return(-ll)  # Return negative log-likelihood
}

# Initial guesses for theta and sigma
init_params <- c(0.1, 1)

# Run MLE using optim
result <- optim(par = init_params,
                fn = neg_log_likelihood,
                y = y)

# Estimated parameters
theta_hat <- result$par[1]
sigma_hat <- abs(result$par[2])

cat("Estimated theta:", theta_hat)
Estimated theta: 0.4700173
cat("Estimated sigma:", sigma_hat)
Estimated sigma: 0.9726252
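
As a cross-check (not part of the script above), the same MA(1) model can be fitted with R's built-in arima(), which maximizes the exact Gaussian likelihood when method = "ML"; its ma1 coefficient and innovation variance should be close to the estimates reported above.

# Cross-check with R's built-in estimator
arima(y, order = c(0, 0, 1), include.mean = FALSE, method = "ML")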