Conditional Independence

Two random variables \(X\) and \(Y\) can be conditionally independent given the value of a third random variable \(Z\), while remaining dependent when \(Z\) is not given. I came across this idea while reading a paper called “The Wisdom of Competitive Crowds” by Lichtendahl, Grushka-Cockayne, and Pfeifer. I’m sure it is a familiar idea to those with more of a formal background in statistics than me, but it was the first time I had seen it.

The background is a competition set up to predict a quantity \(x\), where \(x\) is conditionally distributed as \[(x \mid \theta) \sim \mathcal{N}\left(\theta, \frac{1}{\lambda}\right).\] (Note: I will write all normal distributions in the form \(\mathcal{N}(\mu, \sigma^2)\).) The competitors all know the exact value of \(\lambda > 0\), but \(\theta\) is only known to have the distribution \[ \theta \sim \mathcal{N}\left(\mu_0, \frac{1}{m_0 \lambda}\right) .\]

The \(k\) competitors, or forecasters, each receive two pieces of information (signals) in the competition. The first signal \(s\) is public (known to all the competitors), and comes from the distribution \[ (s \mid \theta) \sim \mathcal{N}\left(\theta, \frac{1}{m_1 \lambda}\right). \] The second signal \(s_j\) is privately known to each forecaster \(j \in \{1, \dots ,k\}\), and is distributed as \[ (s_j \mid \theta) \sim \mathcal{N}\left(\theta, \frac{1}{n \lambda} \right). \]

The competitors all know the above distributional structure, and the values of \(\mu_0\), \(m_0\), \(\lambda\), \(m_1\), and \(n\).

Now, if you know the value of \(\theta\) (remember that the competitors do not know this value), then the private signals are conditionally independent: \[s_i \perp s_j \mid \theta.\] In other words, if you know what value \(\theta\) is, then knowing one private signal \(s_i\) does not give you any new information about what another private signal \(s_j\) is. The private signals are drawn from the same distribution centered on \(\theta\), but the draws are independent. Even knowing all except one of the private signals does not allow you to predict with any more confidence the value of the last private signal.

The interesting thing is that if you do not know what value \(\theta\) is, then the private signals \(s_j\) are no longer independent. If you think about it, this makes sense. If you do not know \(\theta\), then you do not know the center of the distribution that the \(s_j\) are drawn from. But if you find out one of the \(s_j\), then that gives you some information about the mean \(\theta\), and thus some information for predicting the other \(s_j\). And the more of the \(s_j\) you know, the more you know about \(\theta\), and the more accurately you can predict any unknown \(s_j\).
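To see this numerically, here is a minimal simulation sketch using only the Python standard library. The parameter values and the helper names `pearson` and `draw_pairs` are my own illustrative choices, not anything from the paper. It draws many pairs \((s_i, s_j)\) twice: once with \(\theta\) held at a fixed value, and once with \(\theta\) redrawn from its prior for every pair.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    count = len(xs)
    mx = sum(xs) / count
    my = sum(ys) / count
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def draw_pairs(num, mu0, m0, lam, n, rng, theta=None):
    """Draw `num` pairs of private signals (s_i, s_j).

    If `theta` is None, theta is redrawn from N(mu0, 1/(m0*lam)) for each
    pair (theta unknown). Otherwise every pair uses the given theta.
    """
    sd_theta = math.sqrt(1.0 / (m0 * lam))
    sd_signal = math.sqrt(1.0 / (n * lam))
    si, sj = [], []
    for _ in range(num):
        t = rng.gauss(mu0, sd_theta) if theta is None else theta
        si.append(rng.gauss(t, sd_signal))
        sj.append(rng.gauss(t, sd_signal))
    return si, sj


rng = random.Random(0)
# Illustrative parameter values (assumptions made for this sketch).
mu0, m0, lam, n = 0.0, 1.0, 1.0, 4.0

corr_given_theta = pearson(*draw_pairs(50_000, mu0, m0, lam, n, rng, theta=0.7))
corr_marginal = pearson(*draw_pairs(50_000, mu0, m0, lam, n, rng))

print(f"theta fixed:   corr = {corr_given_theta:+.3f}")
print(f"theta unknown: corr = {corr_marginal:+.3f}")
```

With \(\theta\) fixed, the sample correlation should hover near zero; with \(\theta\) unknown, it should be clearly positive.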

We can formalize this by calculating the correlation of two of the private signals \(s_i\) and \(s_j\). For random variables \(s_i\) and \(s_j\), the correlation is given by \[ \text{corr}(s_i,s_j) = \frac{\text{cov}(s_i,s_j)}{\sigma_{s_i} \sigma_{s_j}} = \frac{\mathbb{E}[(s_i-\mu_{s_i})(s_j-\mu_{s_j})]}{\sigma_{s_i} \sigma_{s_j}}.\] Here we first need \(\sigma_{s_j}\), and to get this we need to convert from the conditional distribution of \(s_j\) to its unconditional (marginal) distribution. This can be done by integrating over \(\theta\): \[ \text{pdf}(s_j) = \int_{-\infty}^{\infty} \text{pdf}(s_j \mid \theta)\, \text{pdf}(\theta)\, \text{d}\theta, \quad \text{so} \quad s_j \sim \mathcal{N}\left(\mu_0, \frac{1}{\lambda}\left(\frac{1}{m_0} + \frac{1}{n} \right)\right).\] The marginal variance is just the sum of the prior variance \(1/(m_0\lambda)\) and the signal variance \(1/(n\lambda)\). So \(\sigma_{s_i} = \sigma_{s_j} = \sqrt{(m_0+n)/(\lambda m_0 n)}\).
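As a sanity check on the marginal distribution, the integral over \(\theta\) can be approximated numerically and compared with the closed-form normal density. This is only a sketch: the midpoint Riemann sum, the function names, and the parameter values are all my own assumptions for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def marginal_pdf_numeric(s, mu0, m0, lam, n, steps=4000, width=10.0):
    """Approximate the integral of pdf(s | theta) * pdf(theta) over theta.

    Uses a midpoint Riemann sum on mu0 +/- `width` prior standard deviations.
    """
    var_theta = 1.0 / (m0 * lam)
    var_signal = 1.0 / (n * lam)
    sd_theta = math.sqrt(var_theta)
    lo = mu0 - width * sd_theta
    hi = mu0 + width * sd_theta
    d_theta = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * d_theta
        total += normal_pdf(s, theta, var_signal) * normal_pdf(theta, mu0, var_theta)
    return total * d_theta


def marginal_pdf_closed_form(s, mu0, m0, lam, n):
    """Density of N(mu0, (1/lam) * (1/m0 + 1/n)) evaluated at s."""
    return normal_pdf(s, mu0, (1.0 / lam) * (1.0 / m0 + 1.0 / n))


# Illustrative parameter values (assumptions made for this sketch).
mu0, m0, lam, n = 1.0, 2.0, 0.5, 3.0
for s in (-1.0, 1.0, 2.5):
    numeric = marginal_pdf_numeric(s, mu0, m0, lam, n)
    closed = marginal_pdf_closed_form(s, mu0, m0, lam, n)
    print(f"s={s:+.1f}  numeric={numeric:.6f}  closed form={closed:.6f}")
```

The two columns should agree to within the accuracy of the numerical integration.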

Since we know only the conditional distributions of the private signals, we can use the law of total covariance: \[ \text{cov}(s_i,s_j) = \mathbb{E}_\theta [ \text{cov}(s_i, s_j \mid \theta)] + \text{cov}[\mathbb{E}(s_i \mid \theta), \mathbb{E}(s_j \mid \theta)] .\] The first term on the right-hand side is zero because of the conditional independence. The second term is \(\text{cov}(\theta, \theta) = \text{var}(\theta) = 1/(m_0\lambda)\). Dividing by \(\sigma_{s_i} \sigma_{s_j} = (m_0+n)/(\lambda m_0 n)\) gives \[ \text{corr}(s_i, s_j) = \frac{1}{m_0 \lambda} \frac{\lambda m_0 n}{m_0 + n} = \frac{n}{m_0 + n}. \] As \(n\rightarrow \infty\) while holding \(m_0\) fixed, \(\text{corr}(s_i, s_j) \rightarrow 1\). Intuitively this makes sense: large values of \(n\) correspond to private signals having lower variance, so they are clustered more tightly around \(\theta\), and knowing one private signal tells you more about \(\theta\) and thus more about other private signals.
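The result \(n/(m_0+n)\) can also be checked by simulation. The sketch below estimates \(\text{cov}(s_i, s_j)\) and \(\text{var}(s_i)\) separately, so that both pieces of the derivation can be compared with theory. The function names and parameter values are hypothetical choices for illustration, and the correlation estimate relies on \(s_i\) and \(s_j\) having the same variance.

```python
import math
import random


def signal_correlation(m0, n):
    """Closed-form corr(s_i, s_j) = n / (m0 + n)."""
    return n / (m0 + n)


def simulate_cov_and_var(num, mu0, m0, lam, n, rng):
    """Return (sample cov(s_i, s_j), sample var(s_i)) with theta unknown.

    For each of the `num` pairs, theta is redrawn from N(mu0, 1/(m0*lam)).
    """
    sd_theta = math.sqrt(1.0 / (m0 * lam))
    sd_signal = math.sqrt(1.0 / (n * lam))
    si, sj = [], []
    for _ in range(num):
        theta = rng.gauss(mu0, sd_theta)
        si.append(rng.gauss(theta, sd_signal))
        sj.append(rng.gauss(theta, sd_signal))
    mi = sum(si) / num
    mj = sum(sj) / num
    cov = sum((a - mi) * (b - mj) for a, b in zip(si, sj)) / (num - 1)
    var = sum((a - mi) ** 2 for a in si) / (num - 1)
    return cov, var


rng = random.Random(1)
# Illustrative parameter values (assumptions made for this sketch).
mu0, m0, lam = 0.0, 2.0, 0.5
for n in (0.5, 2.0, 8.0, 32.0):
    cov, var = simulate_cov_and_var(40_000, mu0, m0, lam, n, rng)
    # s_i and s_j share the same variance, so cov / var estimates the correlation.
    print(
        f"n={n:5.1f}  cov~{cov:.3f} (theory {1 / (m0 * lam):.3f})  "
        f"corr~{cov / var:.3f} (theory {signal_correlation(m0, n):.3f})"
    )
```

The estimated covariance should stay near \(1/(m_0\lambda)\) for every \(n\), while the estimated correlation climbs toward 1 as \(n\) grows.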

Landon Lehman
Data Scientist

My research interests include data science, statistics, physics, and applied math.