In chapter 1 of their 2016 book Computer Age Statistical Inference, Bradley Efron and Trevor Hastie use as an example a study of kidney function, comparing results from linear regression and lowess (locally weighted scatterplot smoothing). The purpose of this blog post is to reproduce their results using R.
I will not copy the plots from the book here (for copyright reasons), but you can download a pdf of the book for free from a website created by the authors: https://web.

Consider a multinomial distribution with \(c\) categories and associated probabilities \(\pi_i\), where \(i = 1, \ldots, c\). It is often useful to be able to estimate the ratio of the probabilities of two of the categories from observed data. Without loss of generality, let these two categories be 1 and 2, so that the desired ratio is \(\phi = \pi_1/\pi_2\).
Putting a Dirichlet prior on the multinomial probabilities \(\pi_i\) results in a Dirichlet posterior, since the Dirichlet distribution is a conjugate prior for the multinomial likelihood.
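As a rough illustration of the conjugate update, the sketch below draws samples of \(\phi = \pi_1/\pi_2\) from a Dirichlet posterior using only the Python standard library. The counts, the flat prior, and the function names are assumptions made for this example and are not taken from the post.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_ratio_samples(counts, prior_alphas, n_samples=20000, seed=0):
    """Sample phi = pi_1 / pi_2 from the Dirichlet posterior."""
    rng = random.Random(seed)
    # Conjugacy: posterior parameters are prior parameters plus observed counts.
    posterior_alphas = [a + n for a, n in zip(prior_alphas, counts)]
    samples = []
    for _ in range(n_samples):
        pi = sample_dirichlet(posterior_alphas, rng)
        samples.append(pi[0] / pi[1])
    return samples

counts = [30, 15, 55]     # hypothetical observed counts for c = 3 categories
prior = [1.0, 1.0, 1.0]   # assumed flat Dirichlet prior

phi = sorted(posterior_ratio_samples(counts, prior))
posterior_median = phi[len(phi) // 2]
interval_95 = (phi[int(0.025 * len(phi))], phi[int(0.975 * len(phi))])
print(posterior_median, interval_95)
```

Sampling sidesteps the need for a closed-form density of the ratio; the sorted samples give a posterior median and a rough 95% credible interval directly.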

Once upon a time, in a small town, there were two carpenters: Bob and Tom. Bob was an honest, hardworking carpenter with a good reputation. Tom was a somewhat incompetent carpenter with shady business practices.
At one point in time, Bob and Tom had each completed 100 carpentry jobs. Bob’s customers were satisfied with 90 out of his 100 jobs. Tom’s customers were satisfied with 95 out of his 100 jobs.

I’m reading a book that deals with the history of probability theory (Classic Problems of Probability by Prakash Gorroochurn). It is interesting to look back from a modern perspective and realize how many heated arguments could have been resolved in a few minutes if the participants had had access to a computer.
For example, Chapter 1 deals with Cardano and how he (understandably) confused probability with expectation. If you roll a die 3 times, the expectation value of the number of sixes is 0.5.
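The difference between the two quantities can be checked by brute force. The sketch below (an illustration, not code from the book) enumerates all 216 outcomes of three rolls and computes both the expected number of sixes and the probability of seeing at least one six.

```python
from fractions import Fraction
from itertools import product

rolls = 3
outcomes = list(product(range(1, 7), repeat=rolls))  # all 216 equally likely outcomes

# Expected number of sixes: average count of sixes over all outcomes.
expected_sixes = Fraction(sum(o.count(6) for o in outcomes), len(outcomes))

# Probability of at least one six: fraction of outcomes containing a six.
p_at_least_one = Fraction(sum(1 for o in outcomes if 6 in o), len(outcomes))

print(expected_sixes)   # 1/2
print(p_at_least_one)   # 91/216
```

The expectation is 1/2, while the probability of at least one six is 91/216, or about 0.42, so the two numbers are close but not the same.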

Two random variables \(X\) and \(Y\) can be conditionally independent given the value of a third random variable \(Z\), while remaining dependent when \(Z\) is not given. I came across this idea while reading a paper called “The Wisdom of Competitive Crowds” by Lichtendahl, Grushka-Cockayne, and Pfeifer (abstract here). I’m sure it is a familiar idea to those with more of a formal background in statistics than I have, but it was the first time I had seen it.
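A small discrete example makes this concrete. In the sketch below, \(Z\) is a fair coin and \(X\) and \(Y\) are independent noisy copies of \(Z\); the specific probabilities (0.9 and 0.1) are assumptions chosen for illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
# Assumed conditional probabilities: P(X=1 | Z=z) = P(Y=1 | Z=z).
p_one_given_z = {0: Fraction(1, 10), 1: Fraction(9, 10)}

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given value."""
    return p if value == 1 else 1 - p

# Joint distribution built so that X and Y are independent given Z.
joint = {
    (x, y, z): half * bern(p_one_given_z[z], x) * bern(p_one_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

def prob(pred):
    """Total probability of all (x, y, z) outcomes satisfying pred."""
    return sum(p for xyz, p in joint.items() if pred(*xyz))

# Without Z: P(X=1, Y=1) differs from P(X=1) P(Y=1), so X and Y are dependent.
p_x1 = prob(lambda x, y, z: x == 1)
p_y1 = prob(lambda x, y, z: y == 1)
p_x1_y1 = prob(lambda x, y, z: x == 1 and y == 1)
print(p_x1_y1, p_x1 * p_y1)  # 41/100 and 1/4

# Given Z=1: P(X=1, Y=1 | Z=1) equals P(X=1 | Z=1) P(Y=1 | Z=1).
p_z1 = prob(lambda x, y, z: z == 1)
lhs = prob(lambda x, y, z: x == 1 and y == 1 and z == 1) / p_z1
rhs = (prob(lambda x, y, z: x == 1 and z == 1) / p_z1) * (
    prob(lambda x, y, z: y == 1 and z == 1) / p_z1
)
print(lhs == rhs)  # True
```

Knowing \(X\) tells you something about \(Z\), and therefore about \(Y\); once \(Z\) is fixed, that channel is closed and the dependence disappears.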

In my last post, I calculated the mean distance to the nearest point to the origin among \(N\) points taken from a uniform distribution on \([-1, 1]\). This turned out to be \(r = 1/(N+1)\), which for \(N \geq 2\) is close to but greater than the median distance \(m = 1 - (1/2)^{1/N}\).
In this post, I want to generalize the calculation of the mean to a \(p\)-dimensional uniform distribution over the unit ball.
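The one-dimensional formulas quoted above can be checked with a quick simulation. This is a minimal Monte Carlo sketch; the number of points, the number of trials, and the seed are arbitrary choices made for the example.

```python
import random
import statistics

def nearest_distances(n_points, n_trials=20000, seed=0):
    """Simulate the distance from the origin to the nearest of n_points
    uniform draws on [-1, 1], repeated n_trials times."""
    rng = random.Random(seed)
    return [
        min(abs(rng.uniform(-1.0, 1.0)) for _ in range(n_points))
        for _ in range(n_trials)
    ]

n_points = 5
distances = nearest_distances(n_points)

simulated_mean = statistics.mean(distances)
simulated_median = statistics.median(distances)
predicted_mean = 1 / (n_points + 1)
predicted_median = 1 - 0.5 ** (1 / n_points)

print(simulated_mean, predicted_mean)
print(simulated_median, predicted_median)
```

With 20,000 trials the simulated values should agree with the formulas to roughly two decimal places.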

Consider the uniform distribution on \([-1, 1]\), and take \(N\) points from this distribution. What is the mean distance from the origin to the nearest point?
If you take the median instead of the mean, you get the answer outlined in my last post. The mean makes things more challenging. Here is a solution that makes sense to me. I am sure there is a more formalized way to go about this, but I was trained as a physicist, so I tend to use “informal” mathematics.

Take a uniform distribution on \([-1, 1]\), so \(p(x) = 1/2\). Pick a single point from this distribution. The probability that this point is within a distance \(m\) from the origin is \(2m/2 = m\). The probability that this point is not within a distance \(m\) from the origin is then \(1-m\).
Now consider picking \(N\) points from the distribution. The probability that all \(N\) points are further than \(m\) from the origin is \((1-m)^N\).
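Setting this probability equal to 1/2 defines the median distance, and solving for \(m\) gives \(m = 1 - (1/2)^{1/N}\). The short sketch below evaluates that expression for a few values of \(N\) and plugs it back into \((1-m)^N\) as a sanity check; the function name is just a label chosen for this example.

```python
def median_nearest_distance(n_points):
    """Solve (1 - m)**n_points == 1/2 for m."""
    return 1 - 0.5 ** (1 / n_points)

for n_points in (1, 2, 10, 100):
    m = median_nearest_distance(n_points)
    # Probability that all points lie further than m from the origin.
    p_all_further = (1 - m) ** n_points
    print(n_points, round(m, 4), round(p_all_further, 4))
```

As \(N\) grows, the median distance shrinks toward zero, while the probability that all points are further away than \(m\) stays at 1/2 by construction.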
