Catastrophes, Conspiracies, and Subexponential Distributions (Part II)

In part I of this post, I described the conspiracy and catastrophe principles informally. However, as I mentioned, these principles can be made rigorous, and can serve as powerful analytic tools when studying heavy-tailed and light-tailed distributions.

It is important to note that there is not really one catastrophe principle and one conspiracy principle. Instead, there are many variations of these principles that can be defined and used, each with varying strengths and generality. In this post, I’ll introduce the simplest statements of each in order to highlight how these properties can be formalized. You can see the book for other versions…

A catastrophe principle

The idea behind the catastrophe principle is that an unexpectedly large sum of random variables is likely the result of a catastrophe, i.e., a result of one unexpectedly large event (sample). Said differently, this means that the tail of the sum of random variables is asymptotically equivalent to the tail of the maximum element in the sum, which leads immediately to the following formalization.

Definition. A distribution {F} over the non-negative reals is said to satisfy the catastrophe principle if, for all {n\geq 2} independent random variables {X_1, X_2, \ldots, X_n} with distribution {F},

\displaystyle \mathop{\mathbb P}\{\max(X_1,X_2,\ldots,X_n) > t\} \sim \mathop{\mathbb P}\{X_1+X_2+\ldots+X_n > t\}.

Here {a(t)\sim b(t)} means that {a(t)/b(t)\rightarrow 1} as {t\rightarrow\infty}.

The catastrophe principle is a particularly surprising phenomenon since, a priori, there are many ways the sum could have become large: every sample could have been slightly bigger than average, a few samples could have all been fairly large, etc. However, the catastrophe principle tells us that the sum is most likely large because of exactly one very large sample. Note that, assuming that {X_i\geq 0}, the catastrophe principle has another very intuitive form:

\displaystyle \begin{array}{rcl} \mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t|X_1 + \ldots + X_n>t\} &=& \frac{\mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t \cap X_1 + \ldots + X_n>t\}}{\mathop{\mathbb P}\{X_1 + \ldots + X_n>t\}} \\ &=& \frac{\mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t\}}{\mathop{\mathbb P}\{X_1 + \ldots + X_n>t\}} \rightarrow 1 \text{ as } t\rightarrow\infty. \end{array}

This highlights that, if the catastrophe principle holds, then the maximum of {n} samples is very likely to be bigger than {t} given that the sum of {n} samples is bigger than {t}.

The definition of the catastrophe principle is simple and general enough that it is satisfied by almost all common heavy-tailed distributions. For example, the Pareto distribution satisfies the catastrophe principle, as does the Weibull (with {\alpha<1}) and the LogNormal.

In fact, the notion of the catastrophe principle is important enough that there is a formal class of heavy-tailed distributions defined around it — the class of subexponential distributions — and I’ll probably discuss this class (and its properties) in a later post.

Though most common heavy-tailed distributions satisfy the catastrophe principle, not all heavy-tailed distributions do. Thus, one needs to be careful to separate the catastrophe principle from the notion of heavy-tailed distributions as a whole. In particular, it is not too difficult to construct heavy-tailed distributions that do not satisfy the catastrophe principle…but I’ll leave that as an exercise.

A conspiracy principle

In contrast to the catastrophe principle, the conspiracy principle says that an unexpectedly large sum of random variables is likely the result of a conspiracy, i.e., a result of multiple larger-than-average events (samples). Said differently, this means that the tail of the sum of random variables dominates the tail of the maximum element in the sum, which leads immediately to the following formalization.

Definition. A distribution {F} over the non-negative reals is said to satisfy the conspiracy principle if, for all {n\geq 2} independent random variables {X_1, X_2, \ldots, X_n} with distribution {F,}

\displaystyle \mathop{\mathbb P}\{\max(X_1,X_2,\ldots,X_n) > t\}= o(\mathop{\mathbb P}\{X_1+X_2+\ldots+X_n > t\}).

The above definition of a conspiracy principle is simple and broad enough that it is typically easy to show that it is satisfied by common light-tailed distributions. For example, the Exponential distribution, the Normal distribution, and the Weibull distribution ({\alpha\geq1}) can easily be shown to satisfy this conspiracy principle.
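
For the Exponential distribution both tails are available in closed form, so the definition can be checked directly. The sketch below assumes rate 1: the sum of {n} samples is then Erlang distributed with tail {e^{-t}\sum_{k<n} t^k/k!}, and the maximum has tail {1-(1-e^{-t})^n}. The function names are hypothetical.

```python
import math


def max_tail(n, t):
    """P(max(X_1..X_n) > t) for i.i.d. Exponential(1) samples, t > 0."""
    # 1 - (1 - e^{-t})^n, written with log1p/expm1 to stay accurate for large t.
    return -math.expm1(n * math.log1p(-math.exp(-t)))


def sum_tail(n, t):
    """P(X_1 + ... + X_n > t): the Erlang(n, 1) tail."""
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(n))


def tail_ratio(n, t):
    """Ratio that the conspiracy principle requires to vanish as t grows."""
    return max_tail(n, t) / sum_tail(n, t)


for t in (5.0, 20.0, 80.0):
    print(t, tail_ratio(2, t))
```

For {n=2} the ratio is {(2-e^{-t})/(1+t)}, which decays like {2/t}: slowly, but to zero.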

The Weibull distribution is a nice example to use to contrast the conspiracy principle with the catastrophe principle, since depending on {\alpha}, it can satisfy either. It also highlights that much stronger versions of the conspiracy principle are often possible. For example, the following result shows that, if a sum of two light-tailed Weibull random variables is unexpectedly large, then it is most likely that each of the random variables contributes equally to the sum; i.e., they have “conspired” to make the sum large.

Proposition. Suppose {X_1} and {X_2} are independent and identically distributed Weibull random variables with shape parameter {\alpha > 1.} Then, for any {\delta \in (1/2,1),}

\displaystyle \mathop{\mathbb P}\{X_1+X_2 > t,\ X_1 > \delta t\} = o(\mathop{\mathbb P}\{X_1+X_2 > t\}).
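
The proposition can be checked numerically in a simple case. The sketch below assumes a Weibull with unit scale, so that the tail is {e^{-x^\alpha}}, and uses shape {\alpha=2} and {\delta=0.75}; these values, the midpoint-rule integration, and the function names are illustrative assumptions. It uses the identity that, for {0\leq a<t}, the probability of {X_1+X_2>t} and {X_1>a} equals the probability of {X_1>t} plus the integral over {x\in(a,t)} of the density at {x} times the tail at {t-x}.

```python
import math


def weibull_sum_tail(t, shape, lower=0.0, steps=20_000):
    """P(X1 + X2 > t, X1 > lower) for i.i.d. Weibull samples with unit scale.

    Assumes 0 <= lower < t and the tail P(X > x) = exp(-x**shape).
    """
    # If X1 > t, the sum exceeds t no matter what X2 >= 0 is.
    total = math.exp(-(t ** shape))
    # Midpoint rule for the integral of f(x) * P(X2 > t - x) over [lower, t].
    h = (t - lower) / steps
    for i in range(steps):
        x = lower + (i + 0.5) * h
        density = shape * x ** (shape - 1) * math.exp(-(x ** shape))
        total += density * math.exp(-((t - x) ** shape)) * h
    return total


def lopsided_fraction(t, delta, shape=2.0):
    """P(X1 > delta * t | X1 + X2 > t): how often one sample dominates."""
    return weibull_sum_tail(t, shape, lower=delta * t) / weibull_sum_tail(t, shape)


for t in (2.0, 4.0, 8.0):
    print(t, lopsided_fraction(t, delta=0.75))
```

The printed fractions should shrink rapidly as {t} grows, in line with the proposition: a lopsided split becomes an increasingly unlikely explanation for a large sum.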

A final remark about the conspiracy principle is that, though most common light-tailed distributions satisfy the conspiracy principle, not all light-tailed distributions do (just as not all heavy-tailed distributions satisfy the catastrophe principle). For example, it is not hard to find light-tailed distributions that do not satisfy the conspiracy principle by combining heavy-tailed and light-tailed distributions, e.g., if {X=\min(Y,Z)}, where {Y\sim\text{Exponential}(\mu)} and {Z\sim \text{Pareto}(x_m,\alpha)} are independent, then the tail of {X} is the product of the two tails, so {X} is light-tailed, and, at least when {\alpha>1}, {X} does not satisfy the conspiracy principle.

4 thoughts on “Catastrophes, Conspiracies, and Subexponential Distributions (Part II)”

  1. Pingback: Means and Markov’s Inequality | Eventually Almost Everywhere

  2. Adam,

    Nice explanation. It strikes me that catastrophe versus conspiracy is really a design issue. A system designed from only fail-stop, centralized single points of failure would exhibit catastrophic failures. Distributed peer-to-peer models will suffer from conspiracies. Perhaps, empirical studies on outliers should include discussion on whether the system design favors catastrophes or conspiracies.

  3. Nice point Chris. Yes, heavy-tail/light-tail and conspiracy/catastrophe can definitely be translated into design issues. Predrag Jelenkovic et al. have some nice papers recently that highlight that in the context of network protocols — they explain when retransmissions, etc, can lead to conspiracy vs. catastrophe. There’s not a good theory about this in general yet though!

  4. Pingback: Rigor + Relevance | Catastrophes, Conspiracies, and Subexponential Distributions (Part III)
