Consider $n$ cars arranged in random order along a road, all moving in the same direction. Each car has its own distinct ideal speed, and will travel at that speed unless blocked by another car ahead. No passing is allowed (this is a windy, mountain road, say). After some time, the cars will have settled into a series of chains of cars, each composed of a bunch of cars trapped behind a slower car in the front of the chain. The chains will be sorted in order of speed with the fastest chain in front.

The question: what is the expected number of chains?

Rather than giving the answer directly, I’ll describe the steps it took me to get to the exact answer, because I think they’re an interesting example of how one moves through guessing to approximations to precision.

So, let’s see…what about the slowest car? The slowest car will be exactly in the middle on average, and will turn all the cars behind it into a single, gigantic chain. Only $n/2$ cars remain in front, so the answer should be $O(\log n)$.

What’s the base? $\log$ is concave down, so it matters more if the slowest car happens to be closer to the front than it matters to be closer to the back. Thus the number of chains is probably smaller than $\log_2 n$, which is what we’d get if the slowest car was always exactly in the middle. That means the base should be greater than 2. There are only two sensible bases for logarithms: 2 and $e$, one of which is conveniently greater than 2. So we expect the answer is about $\log n = \log_e n$.

The answer also has to be a rational number, so we need something rational close to $\log n$. That usually means harmonic numbers: $H_n = 1 + ½ + \cdots + 1/n$. To check: If there is one car, the number of chains is $1 = H_1$. If there’s two cars, the faster car will be ahead half the time, so the expected number of chains is $½ (1 + 2) = 3/2 = H_2$. So far so good.

Now to prove it. This step took me a rather long time given the simplicity of the answer (still an enjoyable way to highlight part of a hike, though). Here’s the proof: Consider the $k$th car, counting from 1. This car will be the leader of a chain (a dubious distinction) if and only if it’s slower than all cars in front. But the relative speed order of the cars from 1 to $k$ is purely random, so this happens with probability $1/k$. Summing over all positions of cars, we get $H_n$.

Fun. This is one of the nicest examples I’ve seen of how natural logs crop up (cough) naturally in purely combinatorial settings. Another cool example is the treap data structure (a combination of an ordered binary search tree and a random heap), where the analysis actually involves both the harmonic numbers and the sums of the squares (which are asymptotic to $\pi^2/6 = O(1)$).

Which raises the obvious followup question: is it possible to tweak the problem so that the expected number of chains is also asymptotic to $\pi^2/6$? E.g., by allowing passing in certain situations?