In honor of the upcoming olympics, I figured I’d write a post highlighting something that JK, Bert, and I came up with in the process of writing our book on heavy tails.
One of the topics that is interwoven throughout the book is a connection between “extremal processes” and heavy-tails. In case you’re not familiar with extremal processes, the idea is that the process evolves as the max/min of a sequence of random variables. So, for example,
Of course, the canonical example of such processes is the evolution of world records. So, it felt like a good time to post about them here…
What’s the connection between heavy tails and world records?
The New York Times has a great visualization of world records that helps to make this clear… You should click on over to it to play around, but the following screenshot highlights the point I want to make. It shows the progression of track running world records. (On a related note, I’m very curious to see the results of the LetsRun.com clean/dirty poll for world records.) In any case, though, in the plot below shows the pace (sec/100m) that each record decreases for all running track events (each line corresponds to a different event, in order of distance except for the 100/200, which are on top of each other).
When you look at this, you can’t help but notice that the times between improvements look sort of heavy-tailed. Basically, there are lots of periods where the record changes frequently, but there are also others where the records are unchanged for long times.
In fact, this has been observed and tested empirically in a large number of contexts; for example, records for rainfall, earthquakes, and other extreme events. You can find lots of papers in the literature identifying such things… Very commonly, the times between “records” are, in some sense, heavy-tailed.
Coming across this, it is of course natural to try and come up with a theoretical “explanation” for the observations. We looked around and couldn’t find one, so we tried to derive one ourselves… Indeed, we did find a simple theoretical “explanation” for the connection, which I’ll prove for you in the following.
To formalize the setting, let’s consider the following. Suppose we observe a sequence of i.i.d. random variables, with distribution
Let denote the instant of the ‘th record, i.e., and for For let denote the time between the st record and the th record.
Then, we can prove the following theorem, which I think is new…
Theorem 1 Suppose that is continuous. For any is heavy-tailed, with
Note that this is a somewhat delicate situation that we’re trying to study, because is not stationary with respect to Indeed, you should expect that as the record gets bigger, the time it takes to break it gets larger.
Also, a fun observation about the theorem is that it isn’t even required that have infinite support for to be heavy-tailed! …so looking at records can create heavy tails from things that are extremely light-tailed.
I think the proof is kind of interesting, so I figure it’s nice to include the idea of it here…let me know if you can find a simpler argument!
The first part of the proof is to show that the distribution of is insensitive to the random variables we’re considering, i.e., to > In particular, the following shows that we may assume that is exponentially distributed with no loss of generality, which makes things a lot easier!
To see this, define the function Note that is simply the cumulative hazard function corresponding to the distribution The key of the argument is to show that the random variable is exponentially distributed with mean 1. This actually follows easily from the (standard) fact that is a uniform random variable over
Now, Let Clearly, is an i.i.d. sequence of exponential random variables. Moreover, since is non-decreasing, records of the sequence coincide with those of the sequence Thus, for the purpose of studying the time between records, we may assume without loss of generality that is an exponential distribution with mean 1.
Now, using the above, we can study the distribution of the records in a much simpler setting. What we’ll show is that the th record has an Erlang distribution with shape parameter and rate parameter 1 (i.e., is distributed as the sum of i.i.d. exponential random variables, each having mean 1). To do this, we’ll proceed inductively. Clearly, the above claim is true for since Assume that the claim is true for some Note that for
where is exponentially distributed with mean 1, and independent of From the memoryless property of the exponential distribution, it now follows that which implies that in distribution. This proves our claim that has an Erlang distribution.
We are now ready to analyze the tail of Once again, we proceed inductively, and first consider the case Note that conditioned on the value of is a geometrically distributed with
Therefore, unconditioning with respect to
Making the substitution we get
It therefore follows that
Next, we assume that for some and analyze the tail of Recall that in distribution, where is exponentially distributed with mean 1, and independent of Therefore, we can think of as the time until a new sample exceeds Note that the time until a new sample exceeds is distributed as Moreover, conditioned on a new sample exceeding the probability that it exceeds equals
Note that the above calculation exploits the memoryless property of the exponential distribution, and the fact that and are i.i.d. Thus, when a new sample exceeds it also exceeds (and thus sets a new record) with probability 1/2. Therefore, is simply distributed as a geometric random sum of i.i.d. random variables, each distributed as i.e.,
in distribution, where is an i.i.d. sequence of random variables with the same distribution as and is a geometric random variable independent of with success probability 1/2. Finally, since is assumed to be regularly varying (and therefore subexponential, which I’ve talked about in an earlier post), we have
which proves our desired induction step.