Two random variables \(X\) and \(Y\) can be conditionally independent given the value of a third random variable \(Z\), while remaining dependent when \(Z\) is not given. I came across this idea while reading a paper called “The Wisdom of Competitive Crowds” by Lichtendahl, Grushka-Cockayne, and Pfeifer (abstract here). I’m sure it is a familiar idea to those with more of a formal background in statistics than me, but it was the first time I had seen it.
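To make the idea concrete, here is a minimal Python sketch. It is not the construction from the paper; the model is an assumption chosen purely for illustration: \(Z\) is a fair coin flip, and \(X\) and \(Y\) are each \(Z\) plus independent Gaussian noise. The helper names `pearson` and `simulate` are hypothetical. Correlation is used only as a rough proxy for dependence, which is reasonable here because the noise terms are independent by construction.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Draw (x, y, z) triples from the assumed toy model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.randint(0, 1)        # Z: fair coin, 0 or 1
        x = z + rng.gauss(0, 0.5)    # X depends on Z plus its own noise
        y = z + rng.gauss(0, 0.5)    # Y depends on Z plus its own noise
        samples.append((x, y, z))
    return samples


samples = simulate()

# Not given Z: X and Y move together because both track Z.
marginal = pearson([s[0] for s in samples], [s[1] for s in samples])

# Given Z: only the independent noise is left, so the correlation vanishes.
given_z0 = pearson([s[0] for s in samples if s[2] == 0],
                   [s[1] for s in samples if s[2] == 0])
given_z1 = pearson([s[0] for s in samples if s[2] == 1],
                   [s[1] for s in samples if s[2] == 1])

print(f"corr(X, Y)         = {marginal:.3f}")
print(f"corr(X, Y | Z = 0) = {given_z0:.3f}")
print(f"corr(X, Y | Z = 1) = {given_z1:.3f}")
```

In this toy model the first correlation should come out clearly positive, while the two conditional ones should be close to zero.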

In my last post, I calculated the mean distance to the nearest point to the origin among \(N\) points taken from a uniform distribution on \([-1, 1]\). This turned out to be \(r = 1/(N+1)\), which is close to but greater than the median distance \(m = 1 - (1/2)^{1/N}\).
In this post, I want to generalize the calculation of the mean to a \(p\)-dimensional uniform distribution over the unit ball.
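As a quick sanity check of the one-dimensional formulas, here is a small Monte Carlo sketch in Python. The function names, the choice \(N = 5\), the number of trials, and the seed are all assumptions made for illustration.

```python
import random
import statistics


def nearest_distance(n_points, rng):
    """Distance from the origin to the nearest of n_points uniform draws on [-1, 1]."""
    return min(abs(rng.uniform(-1, 1)) for _ in range(n_points))


def simulate_nearest(n_points, trials=20000, seed=0):
    """Return the sample mean and sample median of the nearest-point distance."""
    rng = random.Random(seed)
    distances = [nearest_distance(n_points, rng) for _ in range(trials)]
    return statistics.mean(distances), statistics.median(distances)


N = 5
sim_mean, sim_median = simulate_nearest(N)
exact_mean = 1 / (N + 1)             # r = 1/(N+1)
exact_median = 1 - 0.5 ** (1 / N)    # m = 1 - (1/2)^(1/N)

print(f"mean:   simulated {sim_mean:.4f}, formula {exact_mean:.4f}")
print(f"median: simulated {sim_median:.4f}, formula {exact_median:.4f}")
```

The simulated values should land close to the formulas, with the mean slightly above the median.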

Consider the uniform distribution on \([-1, 1]\), and take \(N\) points from this distribution. What is the mean distance from the origin to the nearest point?
If you take the median instead of the mean, you get the answer outlined in my last post. The mean makes things more challenging. Here is a solution that makes sense to me. I am sure there is a more formalized way to go about this, but I was trained as a physicist, so I tend to use “informal” mathematics.

Take a uniform distribution on \([-1, 1]\), so \(p(x) = 1/2\). Pick a single point from this distribution. The probability that this point is within a distance \(m\) from the origin is \(2m/2 = m\). The probability that this point is not within a distance \(m\) from the origin is then \(1-m\).
Now consider picking \(N\) points from the distribution. The probability that all \(N\) points are further than \(m\) from the origin is \((1-m)^N\).
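Here is a sketch of how this probability leads to both the median and the mean. I write \(R\) for the distance from the origin to the nearest point (a symbol introduced here for illustration). The statement above says

\[ P(R > m) = (1-m)^N, \qquad 0 \le m \le 1. \]

The median is the value of \(m\) at which this probability equals \(1/2\):

\[ (1-m)^N = \frac{1}{2} \quad\Longrightarrow\quad m = 1 - \left(\frac{1}{2}\right)^{1/N}. \]

For the mean, one option is to use the fact that a non-negative random variable has an expected value equal to the integral of its tail probability:

\[ E[R] = \int_0^1 P(R > m)\, dm = \int_0^1 (1-m)^N \, dm = \frac{1}{N+1}. \]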

What is Linda?
Data Cleaning
Linda Finish and Bodyweight (Men)
Linda Finish and Bodyweight (Women)

What is Linda?

Linda is the name of a CrossFit benchmark workout. The original form of the workout is 10-9-8-7-6-5-4-3-2-1 reps of the triplet

- deadlift at 1.5 times bodyweight
- bench press at bodyweight
- clean at 0.75 times bodyweight

Linda was included in the 2018 CrossFit regionals in a standardized form. Based off the average bodyweight of CrossFit games athletes, it was 10-9-8-7-6-5-4-3-2-1 reps of the triplet (weights listed are male/female)

The CrossFit Regionals are finished and the Games athletes have been selected! Here are the visualizations for the third and final week of the competition, containing the Atlantic, Meridian, and Pacific Regionals. As in the previous 2 weeks, everything was done using R. Also, just like last week with the Latin America Regional, I chose to include all top 5 athletes even in the case of the Meridian Regional which had only 4 CrossFit Games qualifying spots.

Week 3 of the CrossFit Regionals is currently underway (you can watch online here), and I am just finishing up some visualizations for Week 2. I’m also looking forward to when regionals are finished and I can do some cross-regional comparisons.
Week 2 contained competitions for the Central, Latin America, and West regions. I will follow the same format I did for Week 1: first showing event finishes for all the regions, then cumulative points.

If you have visited my website before, you might notice that things look a bit different. I wanted more control over some things, so I decided to switch from wordpress to a static site using `blogdown`.

Over the next few days I will be filling in old blog posts from my wordpress site, as well as experimenting with various settings on this new site.

There is one weekend of CrossFit Regionals competition left in the 2018 season, and I thought it would be interesting to do some visualizations in order to get practice using R. I scraped the data from the CrossFit Games website and used the library jsonlite to parse it.
There were 3 regionals during the first weekend of competition (May 18-20): South, Europe, and East. I first decided to look at event finishes.

I started reading The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, and was curious about how to reproduce Figure 2.5. (The book is made available as a free and legal pdf here.)
So I figured out how to produce similar figures using Mathematica. I assume that this is also fairly straightforward to do in R, but I don’t yet know enough R.
The authors explain the sampling method on pages 16 and 17.