34 Intro to Bayes
We have completed our tour of classical time series topics, covering univariate and multivariate, stationary and non-stationary models. This chapter taps into the Bayesian approach to time series analysis, which is not an essential component of the classical treatment of the subject. But given its rising popularity and importance, it is an exciting topic that cannot be missed. Bayesian statistics is a whole new world for frequentist statisticians. We introduce the Bayesian approach with an example.
34.1 The Sunrise Problem
Question: What is the probability that the sun will rise tomorrow?
We do not consider the physics here. Suppose we want to answer this question purely by statistics. What we need to do is to observe how many days the sun has risen in the past, and make some inference about the future.
Let's have a look at how Laplace in the 18th century solved this problem. Let $p$ denote the (unknown) probability that the sun rises on any given day, and let $X_i = 1$ if the sun rose on day $i$ and $X_i = 0$ otherwise. In other words,

$$X_i \overset{iid}{\sim} \text{Bernoulli}(p).$$

Suppose we have observed the data for $n$ days, and the sun rose on every one of them: $X_1 = X_2 = \cdots = X_n = 1$. We know

$$P(X_1 = \cdots = X_n = 1 \mid p) = p^n.$$

Our goal is to find:

$$P(X_{n+1} = 1 \mid X_1 = \cdots = X_n = 1).$$

Recall that Bayes' rule allows us to invert the conditional probability:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

Using this formula, and treating $p$ as a random variable with prior density $\pi(p)$ on $[0,1]$, we have

$$\pi(p \mid X_1 = \cdots = X_n = 1) = \frac{P(X_1 = \cdots = X_n = 1 \mid p)\,\pi(p)}{\int_0^1 P(X_1 = \cdots = X_n = 1 \mid q)\,\pi(q)\,dq} = \frac{p^n\,\pi(p)}{\int_0^1 q^n\,\pi(q)\,dq}.$$

If we assume a uniform prior $\pi(p) = 1$ on $[0,1]$, then $\int_0^1 q^n\,dq = \frac{1}{n+1}$, so

$$\pi(p \mid X_1 = \cdots = X_n = 1) = (n+1)\,p^n$$

for $0 \le p \le 1$. The answer to our question is the posterior mean of $p$:

$$P(X_{n+1} = 1 \mid X_1 = \cdots = X_n = 1) = \int_0^1 p\,(n+1)\,p^n\,dp = \frac{n+1}{n+2}.$$

As $n \to \infty$, this probability tends to 1: the more sunrises we have observed, the more confident we are that the sun will rise tomorrow.
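To make the rule of succession concrete, here is a small numerical check (a sketch assuming NumPy; the function name `posterior_predictive` is ours, not from the text). It approximates the posterior mean of $p$ under a uniform prior on a grid and compares it with the closed form $(n+1)/(n+2)$.

```python
import numpy as np

def posterior_predictive(n, grid_size=100_000):
    """Approximate P(X_{n+1} = 1 | n sunrises) under a uniform prior on p.

    Uses a midpoint Riemann sum: the posterior is proportional to p**n,
    and the predictive probability is the posterior mean of p.
    """
    dp = 1.0 / grid_size
    p = (np.arange(grid_size) + 0.5) * dp   # midpoints of the grid cells
    weight = p**n                           # likelihood times (uniform) prior
    return np.sum(p * weight) / np.sum(weight)

for n in [1, 10, 100]:
    print(n, posterior_predictive(n), (n + 1) / (n + 2))
```

The numerical posterior mean should match $(n+1)/(n+2)$ to many decimal places, since the grid approximation error is tiny at this resolution.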
34.2 The Bayesian Approach
This illustration shows every tenet of the Bayesian approach. We start with a prior distribution $\pi(\theta)$ about an unknown parameter $\theta$ (in the sunrise problem, the uniform prior on $p$), which represents our beliefs before seeing the data.
The principle of Bayesian analysis is then to combine the prior information with the information contained in the data to obtain an updated distribution accounting for both sources of information, known as the posterior distribution. This is done by using Bayes' rule:

$$\pi(\theta \mid y) = \frac{f(y \mid \theta)\,\pi(\theta)}{f(y)},$$

where $f(y \mid \theta)$ is the likelihood of the data $y$ given the parameter $\theta$, and $f(y) = \int f(y \mid \theta)\,\pi(\theta)\,d\theta$ is the marginal likelihood of the data. The posterior distribution $\pi(\theta \mid y)$ summarizes everything we know about $\theta$ after seeing the data.
One difficulty of Bayesian inference is that the denominator $f(y) = \int f(y \mid \theta)\,\pi(\theta)\,d\theta$ is often intractable: the integral rarely has a closed form and can be high-dimensional. But the relative frequencies of parameter values are easy to compute, because $f(y)$ does not depend on $\theta$:

$$\pi(\theta \mid y) \propto f(y \mid \theta)\,\pi(\theta).$$

This allows us to sample from the posterior distribution even when the normalizing constant is unknown, for example with Markov chain Monte Carlo (MCMC) methods.
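As a sketch of this idea (assuming NumPy; the numbers and names here are ours, chosen for illustration): a random-walk Metropolis sampler needs only the unnormalized posterior. With a uniform prior and 10 successes out of 12 Bernoulli trials, the posterior is Beta(11, 3), so we can check the draws against the known posterior mean $11/14 \approx 0.786$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized log posterior for a Bernoulli probability p:
# 10 successes, 2 failures, uniform prior -> posterior is Beta(11, 3).
def log_unnorm_post(p):
    if p <= 0 or p >= 1:
        return -np.inf
    return 10 * np.log(p) + 2 * np.log(1 - p)

samples = []
p = 0.5                                      # initial value
for _ in range(50_000):
    prop = p + rng.normal(0, 0.1)            # symmetric random-walk proposal
    # Metropolis acceptance: only the *ratio* of unnormalized densities
    # matters, so the unknown normalizing constant cancels.
    if np.log(rng.uniform()) < log_unnorm_post(prop) - log_unnorm_post(p):
        p = prop
    samples.append(p)

draws = np.array(samples[5_000:])            # discard burn-in
print(draws.mean())                          # close to 11/14
```

The sample mean of the draws should land near the analytic posterior mean, even though the sampler never computes $f(y)$.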
The relative weight of the prior versus the data in determining the posterior depends on (i) how strong the prior is, and (ii) how much data we have. If the prior is so strong (very small variance / uncertainty) that seeing the data will not change our beliefs, the posterior is mostly determined by the prior. On the contrary, if the data are so abundant that the evidence overwhelms any prior belief, the impact of the prior is negligible.
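This trade-off is easy to see in the conjugate Beta-Bernoulli model (a sketch with made-up numbers): a Beta($a$, $b$) prior updated with $k$ successes in $n$ trials yields a Beta($a+k$, $b+n-k$) posterior, whose mean blends the prior mean and the sample proportion.

```python
# Beta(a, b) prior on a Bernoulli probability; observing k successes in n
# trials yields a Beta(a + k, b + n - k) posterior (conjugate update).
def posterior_mean(a, b, k, n):
    return (a + k) / (a + b + n)

k, n = 7, 10                                  # data: 7 successes in 10 trials
weak = posterior_mean(1, 1, k, n)             # flat prior: data dominate
strong = posterior_mean(100, 100, k, n)       # tight prior at 0.5 dominates
lots = posterior_mean(100, 100, 7000, 10000)  # abundant data overwhelm prior

print(round(weak, 3), round(strong, 3), round(lots, 3))
```

With a flat prior the posterior mean stays near the sample proportion 0.7; a tight prior centered at 0.5 pulls it down to about 0.51; and with 10,000 observations even that strong prior barely matters.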
34.3 Frequentist vs Bayesian
Frequentists and Bayesians hold different philosophies about statistics. Frequentists view our sample as the result of one of an infinite number of exactly repeated experiments. The data are randomly sampled from a fixed population distribution. The unknown parameters are properties of the population, and are therefore fixed. The purpose of statistics is to make inference about the population parameters (the ultimate truth) with limited samples. The uncertainty associated with this process arises from sampling. Because we do not have the entire population, each sample tells only a partial truth about the population. Therefore our inference about the parameters can never be perfect due to sampling errors. Frequentists conduct hypothesis tests by assuming a hypothesis (about the population parameter) is true and calculating the probability of obtaining the observed sample data.
In the Bayesian world view, probability is an expression of subjective beliefs (a measure of certainty in a belief), which can be updated in light of new data. Parameters are probabilistic rather than fixed, which reflects our uncertainty about them. The essence of Bayesian inference is to update the probability of a 'hypothesis' given the data we have obtained. Bayes' rule is all we need. All information is summarized in the posterior distribution, and there is no need for explicit hypothesis testing.
| Frequentist | Bayesian |
|---|---|
| Probability is the limit of frequency | Probability is uncertainty |
| Parameters are fixed unknown numbers | Parameters are random variables |
| Data is a random sample from the population | Data is fixed/given |
| LLN/CLT | Bayes' rule |
In time series analysis, there are good reasons to be Bayesian. Perhaps the frequentist perspective makes sense in a cross section, where it is intuitive to imagine taking different samples from the population. However, in time series we have only one realization. It is difficult to imagine where we would obtain another sample. It is more natural to take a Bayesian perspective. For example, we have some prior belief on how inflation and unemployment might be related (the Phillips curve), then we update our belief with data.
Frequentists often criticize Bayesians' priors as entirely subjective. Bayesians would respond that frequentists also have prior assumptions that they are not even aware of. Frequentist inference relies on the LLN and CLT, which implicitly make assumptions about the speed of convergence. In settings like VAR models, there are a large number of parameters to estimate but only a limited number of observations. Are the asymptotic properties really plausible? Bayesians believe it is better to make our assumptions explicit.
Apart from the philosophical differences, in practice frequentists and Bayesians may well give similar results (though the results should be interpreted differently). After all, if the data are plentiful, the influence of the prior diminishes to zero.