Statistics

Conditional Independence

Two random variables \(X\) and \(Y\) can be conditionally independent given the value of a third random variable \(Z\), while remaining dependent when \(Z\) is not given. I came across this idea while reading a paper called “The Wisdom of Competitive Crowds” by Lichtendahl, Grushka-Cockayne, and Pfeifer (abstract here). I’m sure it is a familiar idea to those with more of a formal background in statistics than I have, but it was the first time I had seen it.
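
As a concrete illustration (a toy model of my own, not one from the paper): let \(X\) and \(Y\) each be a shared signal \(Z\) plus independent noise. Marginally the two are correlated, but within a narrow slice of \(Z\), approximating "given \(Z\)", the correlation vanishes:

```python
import numpy as np

# Toy model (my own, not from the paper): X and Y share a common
# signal Z plus independent noise, so they are marginally dependent
# but conditionally independent given Z.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

# Marginal correlation is substantial (theoretically 0.5).
print(np.corrcoef(x, y)[0, 1])

# Conditioning approximated by a thin slice of Z: correlation ~ 0.
mask = np.abs(z - 1.0) < 0.05
print(np.corrcoef(x[mask], y[mask])[0, 1])
```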

Mean Distances in p-dimensions

In my last post, I calculated the mean distance from the origin to the nearest of \(N\) points drawn from a uniform distribution on \([-1, 1]\). This turned out to be \(r = 1/(N+1)\), which is close to but greater than the median distance \(m = 1 - (1/2)^{1/N}\). In this post, I want to generalize the calculation of the mean to a \(p\)-dimensional uniform distribution over the unit ball.
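
A quick Monte Carlo sanity check is a sketch under my own choices of \(N\), \(p\), and trial count (none of which come from the post): uniform points in the unit \(p\)-ball can be drawn via a Gaussian direction and a \(U^{1/p}\) radius, and \(p = 1\) should recover \(1/(N+1)\).

```python
import numpy as np

# Monte Carlo sanity check (N, p, and trials are my own choices).
# Uniform points in the unit p-ball: Gaussian direction, radius U**(1/p).
rng = np.random.default_rng(0)

def mean_nearest_distance(N, p, trials=20_000):
    g = rng.normal(size=(trials, N, p))
    g /= np.linalg.norm(g, axis=2, keepdims=True)   # random directions
    r = rng.random((trials, N, 1)) ** (1.0 / p)     # radii for uniformity
    dists = np.linalg.norm(g * r, axis=2)           # distances to origin
    return dists.min(axis=1).mean()                 # mean nearest distance

N = 10
print(mean_nearest_distance(N, p=1), 1 / (N + 1))   # ~0.0909 vs 0.0909...
print(mean_nearest_distance(N, p=3))                # p = 3: unit ball in 3-D
```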

(Mean) Distances in Uniform Distributions

Consider the uniform distribution on \([-1, 1]\), and take \(N\) points from this distribution. What is the mean distance from the origin to the nearest point? If you take the median instead of the mean, you get the answer outlined in my last post. The mean makes things more challenging. Here is a solution that makes sense to me. I am sure there is a more formal way to go about this, but I was trained as a physicist, so I tend to use “informal” mathematics.
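
One standard route (possibly not the post's own) uses the tail-probability identity for a nonnegative random variable, together with the survival probability \(P(r_{\min} > m) = (1-m)^N\) derived in the median post below:

\[
E[r] = \int_0^1 P(r_{\min} > m)\, dm = \int_0^1 (1-m)^N\, dm = \frac{1}{N+1},
\]

which matches the result quoted above.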

(Median) Distances in Uniform Distributions

Take a uniform distribution on \([-1, 1]\), so \(p(x) = 1/2\). Pick a single point from this distribution. The probability that this point is within a distance \(m\) from the origin is \(2m/2 = m\). The probability that this point is not within a distance \(m\) from the origin is then \(1-m\). Now consider picking \(N\) points from the distribution. The probability that all \(N\) points are further than \(m\) from the origin is \((1-m)^N\).
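
Setting \((1-m)^N = 1/2\) and solving for \(m\) gives the median \(m = 1 - (1/2)^{1/N}\) quoted earlier. A quick simulation check, with \(N\) and the trial count being arbitrary choices of mine:

```python
import numpy as np

# Check of the median formula (N and trials are arbitrary choices):
# simulate the nearest-point distance and compare medians.
rng = np.random.default_rng(0)
N, trials = 10, 100_000
nearest = np.abs(rng.uniform(-1, 1, size=(trials, N))).min(axis=1)
print(np.median(nearest))      # empirical median, ~0.067
print(1 - 0.5 ** (1 / N))      # closed form, 0.06697...
```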