Catastrophes, Conspiracies, and Subexponential Distributions (Part III)

In part I and part II of this post, I went over the conspiracy and catastrophe principles informally and formally… But, since the book we’re writing is on heavy-tails, I figured I’d dwell a little longer on the catastrophe principle before moving on. In particular, I still have to get to the third part of the title: “subexponential distributions.”

Subexponential distributions

Subexponential distributions are a subclass of heavy-tailed distributions that correspond exactly to those distributions that satisfy the catastrophe principle. The most classical definition of the class of subexponential distributions is the following:

Definition 1 A distribution {F} with support {{\mathbb R}_+} is said to be subexponential {(F\in\mathcal{S})} if, for all {n \geq 2} independent random variables {X_1,X_2,\cdots,X_n} with distribution {F,}

\displaystyle \mathop{\mathbb P}\{X_1+X_2+\cdots+X_n > t\} \sim n \mathop{\mathbb P}\{X_1>t\}, \text{ i.e., }\bar{F}^{n*}(t) \sim n \bar{F}(t).

Here, {a(t) \sim b(t)} means that {\lim_{t\rightarrow\infty} a(t)/b(t) = 1}. As stated, it is not immediately clear how subexponential distributions are related to the catastrophe principle, but a simple calculation shows that they are intimately related. In particular, {n\mathop{\mathbb P}\{X_1>t\}} is asymptotically equivalent to {\mathop{\mathbb P}\{\max(X_1,\dots,X_n)>t\}}. To see this, we simply expand {\mathop{\mathbb P}\{\max(X_1,\dots,X_n)>t\}} as follows:

\displaystyle \begin{array}{rcl} \lim_{t\rightarrow\infty} \frac{\mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t\}}{\mathop{\mathbb P}\{X_1>t\}} &=& \lim_{t\rightarrow\infty} \frac{1 - (1-\bar{F}(t))^n}{\bar{F}(t)} \\ &=& \lim_{t\rightarrow\infty} \frac{1- (1-n\bar{F}(t)+{n\choose 2} \bar{F}(t)^2-\ldots)}{\bar{F}(t)} \\ &=& \lim_{t\rightarrow\infty} \frac{n\bar{F}(t)+o(\bar{F}(t))}{\bar{F}(t)} = n. \end{array}

The above highlights that the tail of the max of {n} random variables is asymptotically equivalent to {n} times the tail of a single random variable, i.e.,

\displaystyle \mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t\} \sim n \mathop{\mathbb P}\{X_1>t\}, \ \ \ \ \ (1)

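and so Definition 1 says exactly that {\mathop{\mathbb P}\{X_1+\cdots+X_n>t\} \sim \mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t\}}: the sum is large precisely when its largest single term is large, which is the catastrophe principle.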
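As a quick numerical sanity check of (1), here is a minimal Python sketch. The function name `max_tail_ratio` and the particular tail values are my own choices for illustration. It evaluates {(1-(1-\bar{F}(t))^n)/\bar{F}(t)} as {\bar{F}(t)} shrinks. Note that nothing about the shape of {F} is used, so this part of the argument works for any distribution with unbounded support; heavy tails only matter when the max is compared with the sum.

```python
import math


def max_tail_ratio(tail_prob, n):
    """Return P(max(X_1..X_n) > t) / P(X_1 > t), given tail_prob = P(X_1 > t).

    Uses 1 - (1 - p)^n = -expm1(n * log1p(-p)) to avoid cancellation
    when p is tiny.
    """
    return -math.expm1(n * math.log1p(-tail_prob)) / tail_prob


if __name__ == "__main__":
    n = 5
    # As the tail probability shrinks (t grows), the ratio approaches n.
    for p in (1e-1, 1e-2, 1e-4, 1e-8):
        print(f"tail={p:g}  ratio={max_tail_ratio(p, n):.6f}")
```

The ratio sits below {n} for moderate tail probabilities and climbs toward {n} as the tail probability goes to zero, matching the limit computed above.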

We have already pointed out that most common heavy-tailed distributions satisfy the catastrophe principle, so a consequence of the above is that they are subexponential distributions. That is, the class of subexponential distributions includes all regularly varying distributions (including the Pareto), the LogNormal, the Weibull (with {\alpha<1}), and many others.

Note that I have not actually proven to you that the LogNormal, the Weibull, or any other distribution satisfies the catastrophe principle. This is not an accident: it is not straightforward to prove inclusion of these distributions directly from the definitions we have seen so far. In particular, though the definitions we have given have intuitive forms and provide useful structure for the analysis of sums of subexponential random variables, they do not provide an easy approach for proving that distributions are subexponential in the first place.

However, there are a number of more easily verifiable conditions that can be used to show that distributions are subexponential, the first of which is actually yet another equivalent definition of the class of subexponential distributions. Specifically, it turns out that the defining property need not be checked for all {n\geq 2}; if it holds for {n=2}, it necessarily holds for all {n\geq 2}. Further, it can be shown that if it holds for some {n\geq 2}, it necessarily holds for {n=2} and, consequently, for all {n\geq 2}. This is all summarized in the following lemma.

Lemma 2 Consider {X_1, X_2, \ldots} independent random variables with distribution {F} having support {{\mathbb R}_+}. The following statements are equivalent.

  1. {F} is subexponential, i.e., {\mathop{\mathbb P}\{X_1+X_2+\cdots+X_n > t\} \sim n \mathop{\mathbb P}\{X_1>t\}} for all {n\geq 2}.
  2. {\mathop{\mathbb P}\{X_1+X_2 > t\} \sim 2 \mathop{\mathbb P}\{X_1>t\}}.
  3. {\mathop{\mathbb P}\{X_1+X_2+\cdots+X_n > t\} \sim n \mathop{\mathbb P}\{X_1>t\}} for some {n\geq 2}.
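To get a feel for condition 2, here is a small numerical sketch in Python. Everything specific in it is an assumption of mine for illustration: the Pareto-type tail {\bar{F}(x) = (1+x)^{-2}}, the exponential comparison, the function names, and the midpoint-rule integration. It uses the identity {\mathop{\mathbb P}\{X_1+X_2>t\} = \bar{F}(t) + \int_0^t f(y)\bar{F}(t-y)dy} and reports the ratio to {2\bar{F}(t)}.

```python
import math


def pareto_tail(x, alpha=2.0):
    """P(X > x) = (1 + x)^(-alpha) for x >= 0 (illustrative Pareto-type law)."""
    return (1.0 + max(x, 0.0)) ** (-alpha)


def pareto_density(x, alpha=2.0):
    """Density matching pareto_tail."""
    return alpha * (1.0 + x) ** (-alpha - 1.0)


def exp_tail(x):
    """P(X > x) = exp(-x): a light-tailed comparison."""
    return math.exp(-max(x, 0.0))


def exp_density(x):
    return math.exp(-x)


def two_sum_tail(t, tail, density, steps=200_000):
    """P(X1 + X2 > t) = P(X1 > t) + integral_0^t f(y) P(X2 > t - y) dy,
    with the integral approximated by the midpoint rule."""
    h = t / steps
    integral = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        integral += density(y) * tail(t - y)
    return tail(t) + integral * h


if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        r = two_sum_tail(t, pareto_tail, pareto_density) / (2 * pareto_tail(t))
        print(f"Pareto       t={t:7.1f}  ratio={r:.4f}")
    for t in (10.0, 50.0):
        r = two_sum_tail(t, exp_tail, exp_density) / (2 * exp_tail(t))
        print(f"Exponential  t={t:7.1f}  ratio={r:.4f}")
```

For the Pareto-type tail the ratio drifts toward 1, as condition 2 requires. For the exponential the ratio is {(1+t)/2}, which grows without bound, so the exponential fails the condition.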

Clearly, Lemma 2 makes the task of verifying subexponentiality easier; however, it can still be difficult to work with. Often, the most natural approach for verifying subexponentiality comes through the use of the hazard rate. Recall that the hazard rate, a.k.a. the failure rate, of a distribution {F} with density {f} is defined as {q(x) = \frac{f(x)}{1-F(x)}.} Heavy-tailed distributions are often categorized via a decreasing hazard rate, and we will discuss the hazard rate in detail in the book. However, for now, the role of the hazard rate is simply as a tool for checking subexponentiality: as long as it decays to zero quickly enough, the distribution is subexponential.

Lemma 3 Suppose that {F} has density {f} and hazard rate {q(x)}, where {q(x)} is eventually decreasing with {q(x) \rightarrow 0} as {x \rightarrow \infty.} If {\int_{0}^{\infty}e^{x q(x)} f(x) dx < \infty,} then {F} is subexponential.

The form of the condition above is not particularly intuitive; however, it provides a clear approach for verifying that a distribution is subexponential. In particular, it is quite effective for showing that the Weibull (with {\alpha<1}) and the LogNormal distributions are subexponential.
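To see how Lemma 3 is used in practice, here is the calculation for the Weibull, sketched under the simplifying assumption (mine) of unit scale, i.e., {\bar{F}(x) = e^{-x^\alpha}} with {0<\alpha<1}. Then {f(x) = \alpha x^{\alpha-1}e^{-x^\alpha}}, so {q(x) = \alpha x^{\alpha-1}}, which is decreasing and tends to zero because {\alpha<1}. Substituting {u = x^\alpha} in the integral condition gives:

```latex
\begin{aligned}
\int_0^\infty e^{x q(x)} f(x)\,dx
  &= \int_0^\infty e^{\alpha x^\alpha}\,\alpha x^{\alpha-1} e^{-x^\alpha}\,dx \\
  &= \int_0^\infty \alpha x^{\alpha-1} e^{-(1-\alpha)x^\alpha}\,dx \\
  &= \int_0^\infty e^{-(1-\alpha)u}\,du \;=\; \frac{1}{1-\alpha} \;<\; \infty .
\end{aligned}
```

The integral is finite for every {\alpha<1}. For {\alpha\geq 1} the hazard rate no longer decays to zero, so the lemma does not apply.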

An example: Random sums

Since the class of subexponential distributions serves as a formalization of the catastrophe principle, it is natural that it finds application most readily in settings that are fundamentally related to some form of a random sum. Of course, such applications are common in settings such as finance, insurance, and queueing (among others).

In many such settings the core of the analysis relies on understanding a very simple process — a sum of a random number of independent and identically distributed random variables. For example, if one considers the total amount of money paid by an insurance company in a year, there are a random number of events requiring a payout and each payout might be assumed to be independent and identically distributed. And, of course, either (or both) the distribution of the number of events and the payout of each event could be heavy-tailed.

More formally, to illustrate the power of the class of subexponential distributions, we consider the following: Suppose {\{X_i\}_{i \geq 1}} is a sequence of independent and identically distributed random variables with mean {\mathop{\mathbb E}[X]} and the random variable {N} takes values in {{\mathbb N}} and is independent of {\{X_i\}_{i \geq 1}.} Our goal will be to characterize

\displaystyle S_N=\sum_{i=1}^N X_i.

You have likely studied the expectation of this random sum in an introductory probability course. In particular, Wald’s equation gives us a simple and pleasing formula for {\mathop{\mathbb E}[S_N]}.

Theorem 4 (Wald’s Equation) { \mathop{\mathbb E}[S_N] = \mathop{\mathbb E}[\sum_{i=1}^N X_i] = \mathop{\mathbb E}[N] \mathop{\mathbb E}[X].}

Wald’s equation is particularly pleasing because it tells us that, with respect to the expectation, we can basically ignore the fact that {N} is random. That is, if we had just considered {S_n} for some fixed constant {n}, then {\mathop{\mathbb E}[S_n]=n \mathop{\mathbb E}[X]}, and Wald’s equation simply replaces {n} with {\mathop{\mathbb E}[N]}.

While Wald’s equation is a particularly useful result, it is not always enough to have a characterization of the mean. We often want to understand the variance of {S_N}, or even the distribution of {S_N}.

Luckily, it is not hard to generalize Wald’s equation. For example, the variance of the random sum {S_N} still has a pleasing form:

\displaystyle Var[\sum_{i=1}^N X_i] = \mathop{\mathbb E}[N]Var[X]+(\mathop{\mathbb E}[X])^2 Var[N].
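Both Wald’s equation and the variance formula can be checked exactly on a small example. The following Python sketch is only an illustration: the two probability mass functions and all function names are made up by me. It builds the exact distribution of {S_N} by repeated convolution, using exact rational arithmetic so that the two sides can be compared with `==`.

```python
from fractions import Fraction as Fr


def convolve(p, q):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def random_sum_pmf(pmf_n, pmf_x):
    """Exact pmf of S_N = X_1 + ... + X_N, with N independent of the X_i."""
    total = {}
    n_fold = {0: Fr(1)}  # distribution of the empty sum (n = 0)
    for n in range(max(pmf_n) + 1):
        weight = pmf_n.get(n, 0)
        if weight:
            for s, ps in n_fold.items():
                total[s] = total.get(s, 0) + weight * ps
        n_fold = convolve(n_fold, pmf_x)  # move from n to n + 1 summands
    return total


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())


# Hypothetical example distributions (chosen only for illustration).
PMF_N = {1: Fr(1, 2), 2: Fr(1, 4), 4: Fr(1, 4)}
PMF_X = {0: Fr(1, 2), 1: Fr(1, 4), 10: Fr(1, 4)}

if __name__ == "__main__":
    pmf_s = random_sum_pmf(PMF_N, PMF_X)
    print("E[S_N]   =", mean(pmf_s), " vs ", mean(PMF_N) * mean(PMF_X))
    print("Var[S_N] =", variance(pmf_s), " vs ",
          mean(PMF_N) * variance(PMF_X) + mean(PMF_X) ** 2 * variance(PMF_N))
```

Because everything is computed with fractions, the printed pairs agree exactly rather than up to simulation noise.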

In fact, it is even possible to go much further than just the variance and to derive Wald-like results for the tail of random sums. However, results about the tail of {S_N} are not as general as Wald’s equation and rely on using particular properties of distributions. In fact, the tail of random sums can behave very differently depending on whether {X_i} and/or {N} are heavy-tailed or light-tailed. It is in deriving these results that the class of subexponential distributions will show its value (as will the class of regularly varying distributions).

Tails of Random sums

Before talking about formal results about the tail of random sums it is useful to think about how we should expect that tail to behave. One natural suggestion is that we should expect something like we saw with Wald’s equation: it should be “as if” {N} was simply a constant {n}. If this were the case, and if the {X_i} are subexponential, we would have a very simple equation for the tail of the sum:

\displaystyle \mathop{\mathbb P}\{\sum_{i=1}^n X_i>t\} \sim n \mathop{\mathbb P}\{X_1>t\}.

Thus, one might guess that this would hold for the tail of the random sum as well with {n} replaced by {\mathop{\mathbb E}[N]}.

The above intuition is indeed correct when {N} is light-tailed. In this case, since the {X_i} are heavy-tailed they “dominate” the behavior of the tail of the random sum, and so only the expectation of {N} plays a role.

Theorem 5 Consider an infinite i.i.d. sequence of subexponential random variables {X_1,X_2,\ldots}, and a light-tailed random variable {N\in{\mathbb N}} that is independent of {\{X_i\}_{i \geq 1}.} Then,

\displaystyle \mathop{\mathbb P}\{\sum_{i=1}^N X_i > t\} \sim \mathop{\mathbb E}[N] \mathop{\mathbb P}\{X_1 > t\}.
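Theorem 5 is easy to probe by simulation. The sketch below is a rough Monte Carlo check, not a proof, and every specific in it is my own assumption: the Pareto-type tail {\bar{F}(x) = (1+x)^{-2}}, a geometric {N} on {\{1,2,\ldots\}} with success probability {1/2} (so {\mathop{\mathbb E}[N]=2}), the threshold {t=50}, the seed, and the function names. Since {t} is finite and the estimate is noisy, the ratio should only be expected to land near 1, not on it.

```python
import random


def sample_pareto(rng, alpha):
    """Inverse-transform sample with P(X > x) = (1 + x)^(-alpha), x >= 0."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return u ** (-1.0 / alpha) - 1.0


def sample_geometric(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def random_sum_tail_mc(t, alpha, p, trials, seed=1):
    """Monte Carlo estimate of P(X_1 + ... + X_N > t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n = sample_geometric(rng, p)
        s = sum(sample_pareto(rng, alpha) for _ in range(n))
        if s > t:
            hits += 1
    return hits / trials


def theorem5_ratio(t=50.0, alpha=2.0, p=0.5, trials=300_000, seed=1):
    """Estimated P(S_N > t) divided by E[N] * P(X_1 > t)."""
    predicted = (1.0 / p) * (1.0 + t) ** (-alpha)  # E[N] = 1/p for this geometric
    return random_sum_tail_mc(t, alpha, p, trials, seed) / predicted


if __name__ == "__main__":
    print(f"estimated ratio at t=50: {theorem5_ratio():.3f}")
```

Pushing {t} higher moves the true ratio closer to 1 but makes the event rarer, so more trials are needed to keep the noise under control.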

The characterization of random sums we have given so far has relied on the fact that the distribution of {X_i} is dominant, i.e., {X_i} are heavy-tailed and {N} is light-tailed. In this case, we were able to give a simple form for the tail of the random sum that parallels Wald’s equation. But, we have not yet understood what happens when things are reversed and the distribution of {N} is dominant, i.e., when {N} is heavy-tailed and {X_i} is light-tailed.

Intuitively, in this case, we should expect the tail of the random sum to be determined by the tail of {N}. To get intuition, let us consider what would happen if the {X_i} were deterministically {x}. In this case, the behavior of the sum is simple:

\displaystyle \mathop{\mathbb P}\{\sum_{i=1}^N x>t\} = \mathop{\mathbb P}\{Nx >t\}= \mathop{\mathbb P}\{N> t/x\}.

Thus, one might guess that this would hold for the tail of the random sum as well with {x} replaced by {\mathop{\mathbb E}[X]}.

This simple intuition is again correct more generally when {X_i} is light-tailed and {N} is heavy-tailed, specifically when {N} is regularly varying (which is another important class of heavy-tailed distributions).

Theorem 6 Consider an infinite i.i.d. sequence of light-tailed random variables {X_1,X_2,\ldots}, and a regularly varying random variable {N} that is independent of {\{X_i\}_{i \geq 1}}. Then

\displaystyle \mathop{\mathbb P}\{\sum_{i=1}^N X_i > t\} \sim \mathop{\mathbb P}\{N > t/\mathop{\mathbb E}[X]\}.
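Theorem 6 can be checked numerically without simulation in one convenient special case. The setup here is an assumption of mine for illustration: the {X_i} are exponential with mean 1 and {\mathop{\mathbb P}\{N>k\} = (k+1)^{-\alpha}} for integers {k\geq 0}. A sum of {n} such exponentials exceeds {t} exactly when a Poisson({t}) count {K} is at most {n-1}, so {\mathop{\mathbb P}\{S_N>t\} = \mathop{\mathbb E}[\mathop{\mathbb P}\{N>K\}]}, which is a plain sum.

```python
import math


def poisson_pmf(k, t):
    """P(K = k) for K ~ Poisson(t), computed in log space for stability."""
    return math.exp(-t + k * math.log(t) - math.lgamma(k + 1))


def tail_n(x, alpha):
    """P(N > x) for N on {1, 2, ...} with P(N > k) = (k + 1)^(-alpha)."""
    if x < 0:
        return 1.0
    return (math.floor(x) + 1.0) ** (-alpha)


def random_sum_tail_exact(t, alpha):
    """P(S_N > t) for i.i.d. Exp(1) summands: sum_k P(K = k) * P(N > k).

    The sum is truncated far out in the Poisson tail, where terms are negligible.
    """
    k_max = int(t + 12.0 * math.sqrt(t) + 50)
    return sum(poisson_pmf(k, t) * tail_n(k, alpha) for k in range(k_max + 1))


def theorem6_ratio(t, alpha=1.5, mean_x=1.0):
    """P(S_N > t) divided by the predicted P(N > t / E[X])."""
    return random_sum_tail_exact(t, alpha) / tail_n(t / mean_x, alpha)


if __name__ == "__main__":
    for t in (25.0, 100.0, 400.0):
        print(f"t={t:6.1f}  ratio={theorem6_ratio(t):.4f}")
```

The ratio approaches 1 as {t} grows: the randomness in the light-tailed {X_i} averages out, and only the tail of {N} survives.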


13 thoughts on “Catastrophes, Conspiracies, and Subexponential Distributions (Part III)”

  1. Pingback: Rigor + Relevance | Heavy-tails and world records

  2. Pingback: Discontinuous Phase Transitions | Eventually Almost Everywhere

  3. This discussion is highly relevant to my research. Thanks! Just for clarification, what does the symbol “~” stand for in your notes (proportionality, convergence as t goes to infinity, other type of convergence)? Also, are there other sources, books or papers (apart from your soon-to-be-published book) that can serve as a reference regarding the “conspiracy” and “catastrophe” principles? Thanks again.

    • Thanks for the comment Andreas. I use f(x)~g(x) to mean that lim_{x\to\infty} f(x)/g(x) = 1. As for other references, there aren’t too many that really focus on this distinction. For the catastrophe principle, if you look up the “principle of a single big jump” you’ll find some related discussion, but for the conspiracy principle the discussions tend to get lumped into large deviations theory and not contrasted with heavy-tailed phenomena.

      • Thanks for the clarification. The only other reference I’ve found, is a very brief discussion in D. Sornette’s book “Critical Phenomena in Natural Sciences” (Section 3.4.1), and apparently he refers to the conspiracy principle as the “Democratic” result, but again, he does not list many other references.

        I have a question regarding the subexponentiality of the Lognormal: Intuitively, how can one understand that the subexponentiality of the Lognormal is not dependent on its parameters, given the observation that a Lognormal(mu, sigma^2) is almost indistinguishable from a Normal(e^mu, (sigma*e^mu)^2) distribution when sigma<<1 ?

  4. Thanks for the pointer Andreas… and as for the subexponentiality of the lognormal, it’s not too hard to show it using Lemma 3 above. The calculation gives some interpretation, but not too much. Intuitively though, I guess the main issue is that one has to be really careful using the Normal(e^mu,…) approximation of a logNormal if one cares about the behavior of rare events.

  5. Why is your definition of sub-exponentiality so drastically different from a common one – existence of MGF for $t>0$?

    • I think you’re mixing up the definition of “heavy-tailed” with the definition of “subexponential”. Subexponential distributions are a sub-class of heavy-tailed distributions. (You’ll find a couple of other series’ of posts here on other subclasses as well — long tailed and regularly varying.)

      • No, I am not mixing up anything. Sub-Gaussian = dominated by some Gaussian. Sub-exponential = dominated by some exponential (i.e. MGF is defined at some $t>0$). Sub = under, smaller, dominated by. Standard definitions.

        Calling ANY heavy-tailed distribution SUB-exponential is a VERY bad choice since ALL heavy-tailed distributions are SUPER-exponential.

  6. I completely understand your point — but, this definition of subexponential distributions is well-known, established, and completely standard. This is not a name I have made up, there are whole books about it! I agree, it’s an example of the crazy (and often confusing) zoo of names that are used for various heavy-tailed distributions, but it’s one that we’re stuck with!

  7. …oh, and to provide the initial motivation for the name: the “sub” here refers to the tails decaying slower than an exponential (as opposed to the tail being lighter than an exponential).

    • Gotcha. Sub-exponential *decay*. But then it would make sense to call all heavy-tail distributions (no MGF) “sub-exponential” – not this specific sub-class. I do see that per Wikipedia page (and other references) your usage is also standard usage, but this is, in words of Dave Chappelle, “fucking confusing”. It’s like “literally” meaning “literally” and “figuratively” depending on context
