# Heavy-tails and world records

In honor of the upcoming Olympics, I figured I’d write a post highlighting something that JK, Bert, and I came up with in the process of writing our book on heavy tails.

One of the topics that is interwoven throughout the book is a connection between “extremal processes” and heavy-tails. In case you’re not familiar with extremal processes, the idea is that the process evolves as the max/min of a sequence of random variables. So, for example, $\displaystyle M_n = \min\{X_1, \ldots, X_n\}.$

Of course, the canonical example of such processes is the evolution of world records. So, it felt like a good time to post about them here…

## What’s the connection between heavy tails and world records?

The New York Times has a great visualization of world records that helps to make this clear… You should click over to it to play around, but the following screenshot highlights the point I want to make: it shows the progression of track running world records. (On a related note, I’m very curious to see the results of the LetsRun.com clean/dirty poll for world records.) In any case, the plot below shows the record pace (sec/100m) over time for all running track events (each line corresponds to a different event, in order of distance, except for the 100m and 200m, which sit on top of each other). When you look at it, you can’t help but notice that the times between improvements look sort of heavy-tailed: there are lots of periods where the record changes frequently, but also others where a record stands unchanged for a long time.

In fact, this has been observed and tested empirically in a large number of contexts; for example, records for rainfall, earthquakes, and other extreme events.  You can find lots of papers in the literature identifying such things… Very commonly, the times between “records” are, in some sense, heavy-tailed.

Coming across this, it is of course natural to try and come up with a theoretical “explanation” for the observations. We looked around and couldn’t find one, so we tried to derive one ourselves…  Indeed, we did find a simple theoretical “explanation” for the connection, which I’ll prove for you in the following.

## Some theory

To formalize the setting, let’s consider the following. Suppose we observe a sequence ${\{X_i\}_{i \geq 1}}$ of i.i.d. random variables, with distribution ${F.}$

Let ${L_k}$ denote the instant of the ${k}$th record, i.e., ${L_1 = 1}$ and ${L_{k+1} = \min\{i > L_{k} \ |\ X_i > M_{i-1}\}}$ for ${k \geq 1,}$ where ${M_n := \max\{X_1, \ldots, X_n\}}$ is the running maximum (records here are maxima). For ${k \geq 1,}$ let ${T_k := L_{k+1}-L_k}$ denote the time between the ${k}$th and the ${(k+1)}$st records.

Then, we can prove the following theorem, which I think is new…

**Theorem 1** Suppose that ${F}$ is continuous. For any ${k \geq 1,}$ ${T_k}$ is heavy-tailed, with $\displaystyle \mathop{\mathbb P}\{T_k > n\} \sim \frac{2^{k-1}}{n}.$

Note that this is a somewhat delicate situation that we’re trying to study, because ${T_k}$ is not stationary with respect to ${k.}$ Indeed, you should expect that as the record gets bigger, the time it takes to break it gets larger.

Also, a fun observation about the theorem is that it isn’t even required that ${F}$ have infinite support for ${T_k}$ to be heavy-tailed! …so looking at records can create heavy tails from things that are extremely light-tailed.
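To see this in action, here’s a quick simulation sketch of my own (not from the book): the samples come from Uniform(0,1), which has bounded support, yet the time ${T_1}$ until the first sample is beaten exhibits a power-law tail.

```python
import random

random.seed(0)

# Records of an i.i.d. Uniform(0,1) sequence: bounded support, yet the
# time T_1 until the first sample is beaten has a power-law tail.
def first_gap(cap):
    x1 = random.random()
    for t in range(1, cap + 1):
        if random.random() > x1:
            return t          # first new record after t further samples
    return cap + 1            # right-censored: no new record within cap steps

runs, cap = 100_000, 1_000
gaps = [first_gap(cap) for _ in range(runs)]

# If P{T_1 > n} decays like 1/n, then (n + 1) * P{T_1 > n} hovers near 1,
# even though the underlying samples are bounded (extremely light-tailed).
scaled = {n: (n + 1) * sum(g > n for g in gaps) / runs for n in (10, 100, 1000)}
print(scaled)
```

Each printed value should hover around 1 across two orders of magnitude of ${n}$, the signature of a ${1/n}$ tail. (The cap and run counts here are arbitrary choices of mine.)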

I think the proof is kind of interesting, so I figure it’s nice to include the idea of it here…let me know if you can find a simpler argument!

Proof:

The first part of the proof is to show that the distribution of ${T_k}$ is insensitive to the distribution ${F}$ of the random variables we’re considering. In particular, the following shows that we may assume that ${F}$ is exponentially distributed with no loss of generality, which makes things a lot easier!

To see this, define the function ${Q(x) = -\log \bar{F}(x),}$ where ${\bar{F} := 1-F.}$ Note that ${Q}$ is simply the cumulative hazard function corresponding to the distribution ${F.}$ The key step of the argument is to show that the random variable ${Q(X_1)}$ is exponentially distributed with mean 1. This follows easily from the (standard) fact that ${U := \bar{F}(X_1)}$ is a uniform random variable over ${[0,1]:}$ $\displaystyle \mathop{\mathbb P}\{Q(X_1) > x\} = \mathop{\mathbb P}\{-\log(U)>x\} = \mathop{\mathbb P}\{U < e^{-x}\} = e^{-x}.$

Now, let ${E_i = Q(X_i).}$ Clearly, ${\{E_i\}}$ is an i.i.d. sequence of exponential random variables. Moreover, since ${Q}$ is non-decreasing, records of the sequence ${\{X_i\}}$ coincide with those of the sequence ${\{E_i\}.}$ Thus, for the purpose of studying the time between records, we may assume without loss of generality that ${F}$ is an exponential distribution with mean 1.
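As a quick numerical sanity check (my own sketch, not part of the proof): for ${F}$ uniform on ${[0,1]}$ we have ${\bar{F}(x) = 1-x,}$ so ${Q(x) = -\log(1-x),}$ and ${Q(X_1)}$ should behave like a mean-1 exponential.

```python
import math
import random

random.seed(0)

# For F = Uniform(0,1): Fbar(x) = 1 - x, so Q(x) = -log(1 - x).
# The claim is that Q(X_1) is Exp(1).
n = 200_000
q_samples = [-math.log(1.0 - random.random()) for _ in range(n)]

mean = sum(q_samples) / n                   # Exp(1) has mean 1
tail = sum(q > 1.0 for q in q_samples) / n  # P{Q(X_1) > 1} = e^{-1} ≈ 0.368
print(round(mean, 2), round(tail, 2))
```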

Now, using the above, we can study the distribution of the records in a much simpler setting. What we’ll show is that the ${k}$th record ${M_{(k)} := X_{L_k}}$ has an Erlang distribution with shape parameter ${k}$ and rate parameter 1 (i.e., ${M_{(k)}}$ is distributed as the sum of ${k}$ i.i.d. exponential random variables, each having mean 1). To do this, we’ll proceed inductively. Clearly, the above claim is true for ${k=1,}$ since ${M_{(1)} = X_1.}$ Assume that the claim is true for some ${k \in {\mathbb N}.}$ Note that for ${x>0,}$ $\displaystyle \mathop{\mathbb P}\{M_{(k+1)} > M_{(k)} + x\} = \mathop{\mathbb P}\{E > M_{(k)} + x\ |\ E > M_{(k)}\},$

where ${E}$ is exponentially distributed with mean 1, and independent of ${M_{(k)}.}$ From the memoryless property of the exponential distribution, it now follows that ${\mathop{\mathbb P}\{M_{(k+1)} > M_{(k)} + x\} = e^{-x},}$ which implies that ${M_{(k+1)} = M_{(k)} +E,}$ in distribution. This proves our claim that ${M_{(k+1)}}$ has an Erlang distribution.
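Here’s a small simulation sketch of mine to make the Erlang claim concrete (the cap and run count are arbitrary choices): the second record value of an i.i.d. Exp(1) sequence should be Erlang with shape 2 and rate 1, hence have mean close to 2.

```python
import random

random.seed(0)

# The 2nd record of an i.i.d. Exp(1) sequence should be Erlang(2, 1),
# i.e., a sum of two Exp(1) variables, with mean 2.
def second_record(cap=100_000):
    best = random.expovariate(1.0)        # first record: M_(1) = X_1
    for _ in range(cap):
        x = random.expovariate(1.0)
        if x > best:
            return x                      # M_(2): first sample to beat X_1
    return None                           # record unbroken within the cap

runs = 20_000
vals = [v for v in (second_record() for _ in range(runs)) if v is not None]
mean2 = sum(vals) / len(vals)
print(round(mean2, 2))
```

The censoring at the cap introduces a tiny bias (runs where the first record survives 100,000 samples are dropped), but it’s negligible at this scale.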

We are now ready to analyze the tail of ${T_k.}$ Once again, we proceed inductively, and first consider the case ${k=1.}$ Note that, conditioned on the value of ${X_1,}$ ${T_1}$ is geometrically distributed, with $\displaystyle \mathop{\mathbb P}\{T_1 > n\ |\ X_1=x\} = (1-e^{-x})^n.$

Therefore, unconditioning with respect to ${X_1,}$ $\displaystyle \mathop{\mathbb P}\{T_1 > n\} = \int_{0}^{\infty} (1-e^{-x})^n e^{-x} dx.$

Making the substitution ${y = e^{-x},}$ we get $\displaystyle \begin{array}{rcl} \mathop{\mathbb P}\{T_1 > n\} = \int_{0}^{1} (1-y)^n dy = \frac{1}{n+1}. \end{array}$

It therefore follows that ${\mathop{\mathbb P}\{T_1 > n\} \sim \frac{1}{n}.}$

Next, we assume that for some ${k\in {\mathbb N},}$ ${\mathop{\mathbb P}\{T_k > n\} \sim \frac{2^{k-1}}{n}}$ and analyze the tail of ${T_{k+1}.}$ Recall that ${M_{(k+1)} = M_{(k)} +E}$ in distribution, where ${E}$ is exponentially distributed with mean 1, and independent of ${M_{(k)}.}$ Therefore, we can think of ${T_{k+1}}$ as the time until a new sample exceeds ${M_{(k)} +E.}$ Note that the time until a new sample exceeds ${M_{(k)}}$ is distributed as ${T_{k}.}$ Moreover, conditioned on a new sample ${X_i}$ exceeding ${M_{(k)},}$ the probability that it exceeds ${M_{(k)} +E}$ equals $\displaystyle \mathop{\mathbb P}\{X_i > M_{(k)} +E \ |\ X_i > M_{(k)}\} = \mathop{\mathbb P}\{X_i > E\} = 1/2.$

Note that the above calculation exploits the memoryless property of the exponential distribution, and the fact that ${X_i}$ and ${E}$ are i.i.d. Thus, when a new sample exceeds ${M_{(k)},}$ it also exceeds ${M_{(k)} +E}$ (and thus sets a new record) with probability 1/2. Therefore, ${T_{k+1}}$ is simply distributed as a geometric random sum of i.i.d. random variables, each distributed as ${T_{k},}$ i.e., $\displaystyle T_{k+1} = \sum_{i=1}^N Y_{k}(i)$

in distribution, where ${\{Y_{k}(i)\}}$ is an i.i.d. sequence of random variables with the same distribution as ${T_k,}$ and ${N}$ is a geometric random variable independent of ${\{Y_{k}(i)\}}$ with success probability 1/2. Finally, since ${T_k}$ is assumed to be regularly varying (and therefore subexponential, which I’ve talked about in an earlier post), we have $\displaystyle \mathop{\mathbb P}\{T_{k+1} > n\} \sim {\mathbb E}[N]\ \mathop{\mathbb P}\{T_{k} > n\} = 2\ \mathop{\mathbb P}\{T_{k} > n\},$

which proves our desired induction step. $\Box$
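The geometric-sum structure in the induction step is easy to play with numerically. Here’s a sketch of my own (an illustration, not part of the proof): since ${\mathop{\mathbb P}\{T_1 > n\} = 1/(n+1)}$ exactly, ${\lfloor 1/V \rfloor}$ with ${V}$ uniform on ${(0,1]}$ has exactly the law of ${T_1,}$ and summing a Geometric(1/2) number of such copies should roughly double the tail, as the theorem predicts for ${k=2.}$

```python
import random

random.seed(0)

# Exact sampler for T_1: P{T_1 > n} = 1/(n+1), and for V uniform on (0,1]
# we have P{floor(1/V) > n} = P{V <= 1/(n+1)} = 1/(n+1).
def sample_t1():
    v = 1.0 - random.random()             # uniform on (0,1], avoids V = 0
    return int(1.0 / v)

def sample_geometric_half():
    n = 1                                 # support {1, 2, ...}, success prob 1/2
    while random.random() < 0.5:
        n += 1
    return n

def sample_t2():
    # geometric random sum of i.i.d. copies of T_1, as in the induction step
    return sum(sample_t1() for _ in range(sample_geometric_half()))

runs, n = 200_000, 100
tail1 = sum(sample_t1() > n for _ in range(runs)) / runs
tail2 = sum(sample_t2() > n for _ in range(runs)) / runs
print(round(tail1 * n, 2), round(tail2 * n, 2))   # roughly 1 and 2
```

The ratio of the two tails sits near 2, matching ${\mathop{\mathbb P}\{T_2 > n\} \sim 2/n}$ (the relation is only asymptotic, so expect a few percent of slack at finite ${n}$).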