28  Granger Causality

28.1 Definition

Granger causality formalizes the idea that one variable is useful for forecasting another.

Definition 28.1 (Granger Causality) \(x\) fails to Granger-cause \(y\) if for all \(s >0\), the MSE of a forecast \(\hat y_{t+s}\) based on \((y_t,y_{t-1},\dots,x_t,x_{t-1},\dots)\) is the same as the MSE of the forecast based on \((y_t,y_{t-1},\dots)\) alone:

\[ \text{MSE}[\hat y_{t+s}|y_t,y_{t-1},\dots] = \text{MSE}[\hat y_{t+s}|y_t,y_{t-1},\dots,x_t,x_{t-1},\dots]. \]

Equivalently, we say \(x\) is exogenous with respect to \(y\), or \(x\) is not linearly informative about future \(y\).

Note that Granger causality has nothing to do with causality defined through counterfactuals. The word “causality” is unfortunately misleading; “Granger predictability” would be a better name.

28.2 Granger Causality Test

Consider the Granger causality test in a single-equation setup:

\[ y_t = \alpha + \phi_1 y_{t-1}+\cdots+\phi_p y_{t-p} + \beta_1 x_{t-1} + \cdots +\beta_q x_{t-q} + u_t \]

Testing whether \(x\) Granger-causes \(y\) is equivalent to testing

\[ H_0: \beta_1 = \beta_2=\cdots=\beta_q=0 \]

The test can be done by comparing the residual sums of squares (RSS) with and without the lags of \(x\) as regressors. Under \(H_0\), we have the restricted model:

\[ y_t = \alpha + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + u_t^R \]

Compute RSS for the restricted model:

\[ \text{RSS}_0 = \sum_{t=1}^{T} (\hat u_t^{R})^2 \]

Also compute the RSS for the unrestricted model, i.e. the one that includes the lags of \(x\) as regressors:

\[ \text{RSS}_1 = \sum_{t=1}^{T} \hat u_t^2 \]

The joint significance can be tested with the \(F\) ratio. The unrestricted model has \(p+q+1\) coefficients, so the residual degrees of freedom are \(T-p-q-1\):

\[ S = \frac{(\text{RSS}_0 - \text{RSS}_1)/q}{\text{RSS}_1/(T-p-q-1)} \sim F(q,\ T-p-q-1). \]
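The two regressions and the \(F\) ratio are easy to compute directly. Below is a minimal sketch in Python (numpy and scipy); the function name `granger_f_test` and its interface are illustrative, not from the source, and \(T\) is taken as the effective sample size after dropping the initial lags.

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(y, x, p, q):
    """F-test of H0: beta_1 = ... = beta_q = 0 in a regression of y_t
    on a constant, p lags of y, and q lags of x (1-D arrays y, x)."""
    m = max(p, q)
    T = len(y) - m                       # effective sample size
    Y = y[m:]                            # dependent variable y_t
    Y_lags = np.column_stack([y[m - j:-j] for j in range(1, p + 1)])
    X_lags = np.column_stack([x[m - j:-j] for j in range(1, q + 1)])
    const = np.ones((T, 1))

    X_r = np.hstack([const, Y_lags])             # restricted regressors
    X_u = np.hstack([const, Y_lags, X_lags])     # unrestricted regressors

    def rss(X):                          # residual sum of squares from OLS
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return np.sum((Y - X @ beta) ** 2)

    rss0, rss1 = rss(X_r), rss(X_u)
    dof = T - p - q - 1                  # residual degrees of freedom
    S = ((rss0 - rss1) / q) / (rss1 / dof)
    return S, f_dist.sf(S, q, dof)       # statistic and p-value
```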

28.3 Granger Causality in VAR

In a VAR setting, \(x\) does not Granger-cause \(y\) if the coefficient matrices \(\Phi_j\) are lower triangular for all \(j\):

\[ \begin{aligned} \begin{bmatrix}y_t \\ x_t\end{bmatrix} &= \begin{bmatrix}\alpha_1 \\ \alpha_2\end{bmatrix} + \begin{bmatrix}\phi_{1,11} & 0 \\ \phi_{1,21} & \phi_{1,22} \end{bmatrix} \begin{bmatrix}y_{t-1} \\ x_{t-1}\end{bmatrix} + \begin{bmatrix}\phi_{2,11} & 0 \\ \phi_{2,21} & \phi_{2,22}\end{bmatrix} \begin{bmatrix}y_{t-2} \\ x_{t-2}\end{bmatrix} + \cdots + \begin{bmatrix}u_{t} \\ v_{t}\end{bmatrix} \\[1em] &= \begin{bmatrix}\alpha_1 \\ \alpha_2\end{bmatrix} + \begin{bmatrix}\phi_{11}(L) & 0 \\ \phi_{21}(L) & \phi_{22}(L)\end{bmatrix} \begin{bmatrix}y_{t} \\ x_{t} \end{bmatrix} + \begin{bmatrix}u_{t} \\ v_{t} \end{bmatrix} \end{aligned} \]

where \(\phi_{ij}(L) = \phi_{1,ij}L + \phi_{2,ij}L^2 + \cdots\) (the powers of \(L\) start at one, so applying \(\phi_{ij}(L)\) to a current value produces only lags).

Since the inverse of a lower-triangular lag-polynomial matrix is again lower triangular, the equivalent MA representation has the same structure:

\[ \begin{bmatrix}y_t \\ x_t\end{bmatrix} = \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix} + \begin{bmatrix}\theta_{11}(L) & 0 \\ \theta_{21}(L) & \theta_{22}(L)\end{bmatrix} \begin{bmatrix}u_t \\ v_t\end{bmatrix} \]
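As an illustration (not from the source), one can simulate a bivariate VAR(1) whose coefficient matrix is lower triangular, so that by construction \(x\) (the second series) fails to Granger-cause \(y\) (the first). The coefficients below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
Phi1 = np.array([[0.5, 0.0],    # zero in the (1,2) position: no lag of x
                 [0.3, 0.4]])   # enters the y equation
alpha = np.array([0.1, -0.2])

z = np.zeros((T, 2))            # z_t = [y_t, x_t]'
for t in range(1, T):
    z[t] = alpha + Phi1 @ z[t - 1] + rng.standard_normal(2)

# Any Granger causality test of x -> y on z should, up to sampling
# error, fail to reject the null of no Granger causality.
```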

To test Granger causality in a VAR setting, we need to introduce the likelihood ratio (LR) test.

28.4 Likelihood Ratio Test

Let’s quickly derive the maximum likelihood estimator (MLE) for the VAR. The joint probability density function is

\[ \begin{aligned} f(y_T,y_{T-1},\dots,y_1 | y_0,\dots,y_{1-p};\Theta) &= f(y_T|y_{T-1}\dots)f(y_{T-1}|y_{T-2}\dots)\cdots f(y_1|y_0\dots) \\[1em] &= \prod_{t=1}^{T} f(y_t|y_{t-1},y_{t-2},... ,y_{1-p};\Theta) \end{aligned} \]

Define \(x_t'=\begin{bmatrix}1 & y_{t-1}' & \dots & y_{t-p}'\end{bmatrix}\), the collection of all regressors (with a slight abuse of notation: this \(x_t\) is the regressor vector, not the second variable of the VAR), and \(B'=\begin{bmatrix}\alpha & \Phi_1 & \dots & \Phi_p\end{bmatrix}\), the collection of all parameters. The log-likelihood function can be written as

\[ \begin{aligned} \ell(\Theta) &= \sum_{t=1}^T \log f(y_t|y_{t-1},y_{t-2},\dots ,y_{1-p};\Theta) \\ &= -\frac{T}{2}\log |2\pi\Omega| - \frac{1}{2}\sum_{t=1}^{T}(y_t - B'x_t)'\Omega^{-1}(y_t-B'x_t) \end{aligned} \]

where \(\Omega\) is the variance-covariance matrix of the residuals. Maximizing the log-likelihood function gives the ML estimator:

\[ \begin{aligned} \hat B'_{n \times (np+1)} &= \left[\sum_{t=1}^{T} y_tx_t'\right]\left[\sum_{t=1}^{T} x_t x_t'\right]^{-1} \\ \hat\Omega_{n \times n} &= \frac{1}{T} \sum_{t=1}^{T}\hat e_t \hat e_t' \end{aligned} \]

where \(\hat e_t = y_t - \hat B'x_t\) is the vector of residuals; in the bivariate case, \(e_t=[u_t\ v_t]'\).
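These closed-form expressions are just equation-by-equation OLS, so they are straightforward to code. A minimal sketch, assuming the data come as a \((T_{\text{total}} \times n)\) array; the function name `var_mle` is illustrative:

```python
import numpy as np

def var_mle(Y, p):
    """ML (= equation-by-equation OLS) estimates for a VAR(p).
    Y: (T_total, n) array. Returns B_hat of shape (np+1, n), so that
    y_t' is fitted by x_t' @ B_hat, and the residual covariance Omega_hat."""
    T_total, n = Y.shape
    T = T_total - p
    # rows of X are x_t' = [1, y_{t-1}', ..., y_{t-p}']
    X = np.hstack([np.ones((T, 1))] +
                  [Y[p - j:T_total - j] for j in range(1, p + 1)])
    Yt = Y[p:]
    B_hat = np.linalg.solve(X.T @ X, X.T @ Yt)   # (X'X)^{-1} X'Y
    E = Yt - X @ B_hat                           # residual matrix, rows e_t'
    Omega_hat = E.T @ E / T
    return B_hat, Omega_hat
```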

The likelihood ratio (LR) test is motivated by the fact that different specifications give different likelihood values. By comparing them, we can test one specification against an alternative.

The null hypothesis (\(H_0\)) could be a specification with a particular lag length, or one with certain exogeneity restrictions. We compute the residual covariance matrix under the null hypothesis (\(H_0\)) and under the alternative (\(H_1\)) respectively:

\[ \begin{aligned} \hat\Omega_0 &= \frac{1}{T} \sum_t \hat e_t(H_0) \hat e_t(H_0)' \\ \hat\Omega_1 &= \frac{1}{T} \sum_t \hat e_t(H_1) \hat e_t(H_1)' \\ \end{aligned} \]

The corresponding maximized log-likelihoods are (at the MLE the quadratic term reduces to \(\sum_t \hat e_t'\hat\Omega^{-1}\hat e_t = \operatorname{tr}\big(\hat\Omega^{-1}\sum_t \hat e_t\hat e_t'\big) = \operatorname{tr}(\hat\Omega^{-1}\,T\hat\Omega) = Tn\)):

\[ \begin{aligned} \ell_0^* &= -\frac{T}{2} \log |2\pi\hat\Omega_0| - \frac{Tn}{2} \\ \ell_1^* &= -\frac{T}{2} \log |2\pi\hat\Omega_1| - \frac{Tn}{2} \\ \end{aligned} \]

The difference between the log-likelihoods

\[ 2(\ell_1^* - \ell_0^*) = T(\log |\hat\Omega_0| - \log |\hat\Omega_1|) \]

has an asymptotic \(\chi^2\) distribution with degrees of freedom equal to the number of restrictions; for example, \(n^2(p_1 - p_0)\) when testing lag length \(p_0\) against \(p_1\).
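Given the two residual covariance estimates, the statistic is one line. A small helper (names illustrative), using `scipy.stats.chi2` for the p-value:

```python
import numpy as np
from scipy.stats import chi2

def lr_test(Omega0, Omega1, T, df):
    """LR statistic 2(l1* - l0*) = T (log|Omega0| - log|Omega1|),
    compared against a chi-square with df = number of restrictions."""
    lr = T * (np.linalg.slogdet(Omega0)[1] - np.linalg.slogdet(Omega1)[1])
    return lr, chi2.sf(lr, df)
```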

Now consider the Granger causality test in a VAR setting:

\[ \begin{bmatrix}y_t \\ x_t\end{bmatrix} = \begin{bmatrix}\alpha_1 \\ \alpha_2\end{bmatrix} + \begin{bmatrix}\phi_{11}(L) & \phi_{12}(L) \\ \phi_{21}(L) & \phi_{22}(L)\end{bmatrix} \begin{bmatrix}y_t \\ x_t\end{bmatrix} + \begin{bmatrix}u_{t} \\ v_{t}\end{bmatrix} \]

Testing whether \(x\) fails to Granger-cause \(y\) is equivalent to testing \(H_0: \phi_{12}(L)=0\). Therefore, the restricted regression under \(H_0\) is

\[ y_t = \alpha_1 + \phi_{11}(L) y_t + u_t^{R} \]

The unrestricted regression is

\[ y_t = \alpha_1 + \phi_{11}(L) y_t + \phi_{12}(L) x_t + u_{t}^{U} \]

Estimate the residual variances from the two regressions (scalars here, since each is a single equation; we keep the \(\Omega\) notation for consistency):

\[ \begin{aligned} \hat\Omega^U &= \frac{1}{T}\sum_t (\hat u_t^U)^2 \\ \hat\Omega^R &= \frac{1}{T}\sum_t (\hat u_t^R)^2 \\ \end{aligned} \]

Form the LR test statistic:

\[ \text{LR} = T(\log \hat\Omega^R - \log \hat\Omega^U) \sim \chi^2(q), \]

where \(q\) is the number of restrictions (the number of lags of \(x\) excluded under \(H_0\)). Since \(\hat\Omega^R \ge \hat\Omega^U\), the statistic is nonnegative.

If the LR statistic is significant, \(x\) is informative about \(y\) (\(x\) Granger-causes \(y\)). Otherwise, including \(x\) as a regressor makes no difference, meaning \(x\) fails to Granger-cause \(y\).
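In practice, this procedure is packaged in statsmodels: `statsmodels.tsa.stattools.grangercausalitytests` runs the \(F\) test along with \(\chi^2\)/LR variants for each lag length. A short sketch on simulated data where \(x\) does Granger-cause \(y\) (the data-generating process here is made up for illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Simulate data where x feeds into y with one lag, so the test should reject.
rng = np.random.default_rng(1)
T = 500
y, x = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + 0.6 * x[t - 1] + rng.standard_normal()

# Column 0 is the "caused" series; column 1 is the candidate cause.
results = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
# Each lag length reports 'ssr_ftest', 'ssr_chi2test', 'lrtest', ...
```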