In part I of this post, I described the conspiracy and catastrophe principles informally. However, as I mentioned, these principles can be made rigorous, and can serve as powerful analytic tools when studying heavy-tailed and light-tailed distributions.

It is important to note that there is not really one catastrophe principle and one conspiracy principle. Instead, there are many variations of these principles that can be defined and used, each with varying strengths and generality. In this post, I’ll introduce the simplest statements of each in order to highlight how these properties can be formalized. You can see the book for other versions…

**A catastrophe principle**

The idea behind the catastrophe principle is that an unexpectedly large sum of random variables is likely the result of a catastrophe, i.e., a result of one unexpectedly large event (sample). Said differently, this means that the tail of the sum of random variables is *on the same order as* the tail of the maximum element in the sum, which leads immediately to the following formalization.

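Definition. A distribution $F$ over the non-negative reals is said to satisfy the catastrophe principle if, for all $n \geq 2$ independent random variables $X_1, \ldots, X_n$ with distribution $F$,

$$\Pr\left(\max(X_1, \ldots, X_n) > t\right) \sim \Pr\left(X_1 + \cdots + X_n > t\right) \quad \text{as } t \to \infty.$$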

The catastrophe principle is a particularly surprising phenomenon since, a priori, there are many ways the sum could have become large: every sample could have been slightly bigger than average, a few samples could have all been fairly large, etc. However, the catastrophe principle tells us that the sum is most likely large because of a single very large sample. Note that, assuming that $X_i \geq 0$ for each of the $n$ samples $X_1, \ldots, X_n$, the catastrophe principle has another very intuitive form:

$$\Pr\left(\max(X_1, \ldots, X_n) > t \mid X_1 + \cdots + X_n > t\right) \to 1 \quad \text{as } t \to \infty.$$

This highlights that, if the catastrophe principle holds, then the maximum of $n$ samples is very likely to be bigger than $t$ given that the sum of the $n$ samples is bigger than $t$.
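To see why the conditional form is just a restatement of the principle, here is a short derivation. It assumes non-negative samples $X_1, \ldots, X_n$ and a threshold $t$ (the notation is mine). Because every $X_i \geq 0$, the event $\{\max_i X_i > t\}$ is contained in the event $\{\sum_i X_i > t\}$, so the joint probability collapses to the probability of the maximum alone:

$$
\begin{aligned}
\Pr\left(\max_i X_i > t \,\middle|\, \sum_i X_i > t\right)
&= \frac{\Pr\left(\max_i X_i > t,\ \sum_i X_i > t\right)}{\Pr\left(\sum_i X_i > t\right)} \\
&= \frac{\Pr\left(\max_i X_i > t\right)}{\Pr\left(\sum_i X_i > t\right)}.
\end{aligned}
$$

The conditional probability therefore tends to 1 exactly when the tail of the maximum and the tail of the sum are asymptotically equivalent.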

The definition of the catastrophe principle is simple and general enough that it is satisfied by almost all common heavy-tailed distributions. For example, the Pareto distribution satisfies the catastrophe principle, as do the Weibull (with shape parameter $\alpha < 1$) and the LogNormal.
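A rough way to get a feel for this is a small Monte Carlo experiment. The sketch below is only illustrative: the function names, the Pareto tail index 1.5, the number of samples, and the thresholds are all choices made here, not taken from the post. It estimates the probability that the maximum exceeds a threshold given that the sum does, for Pareto samples and, as a light-tailed contrast, for Exponential samples.

```python
import random


def max_given_sum(sampler, n, t, trials, seed=0):
    """Estimate P(max > t | sum > t) for n i.i.d. draws from `sampler`."""
    rng = random.Random(seed)
    sum_hits = 0  # trials in which the sum exceeds t
    max_hits = 0  # of those, trials in which a single sample exceeds t
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(n)]
        if sum(xs) > t:
            sum_hits += 1
            if max(xs) > t:
                max_hits += 1
    return max_hits / sum_hits if sum_hits else float("nan")


def pareto(rng, alpha=1.5):
    # Pareto with minimum value 1: P(X > x) = x ** (-alpha) for x >= 1.
    return rng.paretovariate(alpha)


def exponential(rng, rate=1.0):
    # Exponential with the given rate: P(X > x) = exp(-rate * x).
    return rng.expovariate(rate)


if __name__ == "__main__":
    n, trials = 5, 200_000
    # Thresholds are picked so that "sum > t" is rare but still observed.
    print("Pareto:     ", max_given_sum(pareto, n, t=100.0, trials=trials))
    print("Exponential:", max_given_sum(exponential, n, t=15.0, trials=trials))
```

At these finite thresholds the Pareto estimate should be well above one half, while the Exponential estimate should be close to zero. Convergence to the limit can be slow, so the numbers only hint at the asymptotic statement.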

In fact, the notion of the catastrophe principle is important enough that there is a formal class of heavy-tailed distributions defined around it — the class of subexponential distributions — and I’ll probably discuss this class (and its properties) in a later post.

Though most common heavy-tailed distributions satisfy the catastrophe principle, not all heavy-tailed distributions do. Thus, one needs to be careful to separate the catastrophe principle *property* from the notion of heavy-tailed distributions as a whole. In particular, it is not too difficult to construct heavy-tailed distributions that do not satisfy the catastrophe principle…but I’ll leave that as an exercise.

**A conspiracy principle**

In contrast to the catastrophe principle, the conspiracy principle says that an unexpectedly large sum of random variables is likely the result of a conspiracy, i.e., a result of multiple larger-than-average events (samples). Said differently, this means that the tail of the sum of random variables *dominates* the tail of the maximum element in the sum, which leads immediately to the following formalization.

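Definition. A distribution $F$ over the non-negative reals is said to satisfy the conspiracy principle if, for all $n \geq 2$ independent random variables $X_1, \ldots, X_n$ with distribution $F$,

$$\Pr\left(\max(X_1, \ldots, X_n) > t\right) = o\left(\Pr\left(X_1 + \cdots + X_n > t\right)\right) \quad \text{as } t \to \infty.$$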

The above definition of a conspiracy principle is simple and broad enough that it is typically easy to show that it is satisfied by common light-tailed distributions. For example, the Exponential distribution, the Normal distribution, and the Weibull distribution (with shape parameter $\alpha > 1$) can easily be shown to satisfy this conspiracy principle.
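For the Exponential distribution the check can be done in closed form. The sketch below assumes rate 1 and uses function names chosen here for illustration. It relies on two standard facts: the sum of $n$ i.i.d. Exponential(1) samples has an Erlang tail $e^{-t}\sum_{k=0}^{n-1} t^k/k!$, and the maximum has tail $1 - (1 - e^{-t})^n$. The ratio of the two tails should shrink towards zero as the threshold grows.

```python
import math


def tail_max_exp(n, t):
    """P(max of n i.i.d. Exponential(1) samples > t), for t > 0.

    Equals 1 - (1 - exp(-t)) ** n, computed via log1p/expm1 so that
    precision is not lost when exp(-t) is tiny.
    """
    return -math.expm1(n * math.log1p(-math.exp(-t)))


def tail_sum_exp(n, t):
    """P(sum of n i.i.d. Exponential(1) samples > t): the Erlang(n, 1) tail."""
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(n))


def ratio(n, t):
    """Tail of the maximum divided by the tail of the sum."""
    return tail_max_exp(n, t) / tail_sum_exp(n, t)


if __name__ == "__main__":
    for t in (5.0, 10.0, 20.0, 40.0):
        print(t, ratio(2, t), ratio(5, t))
```

For $n = 2$ the ratio behaves like $2/(1 + t)$ for large $t$, so it decays only polynomially. That is still enough for the little-o condition in the definition.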

The Weibull distribution is a nice example to use to contrast the conspiracy principle with the catastrophe principle, since, depending on its shape parameter $\alpha$, it can satisfy either. It also highlights that much stronger versions of the conspiracy principle are often possible. For example, the following result shows that, if a sum of two light-tailed Weibull random variables is unexpectedly large, then it is most likely that each of the random variables contributes equally to the sum; i.e., they have “conspired” to make the sum large.

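Proposition. Suppose $X_1$ and $X_2$ are independent and identically distributed Weibull random variables with shape parameter $\alpha > 1$. Then, for any $\delta \in (0, 1/2)$,

$$\Pr\left(X_1 > \left(\tfrac{1}{2} - \delta\right) t,\ X_2 > \left(\tfrac{1}{2} - \delta\right) t \,\middle|\, X_1 + X_2 > t\right) \to 1 \quad \text{as } t \to \infty.$$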
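The “equal contribution” behaviour can also be explored with a quick simulation. This is a sketch under assumptions made here, not the post's own method: scale parameter 1, a tolerance of 0.25, shape 2 for the light-tailed case, shape 0.5 for the heavy-tailed contrast, and hand-picked thresholds. It estimates how often both samples exceed a fixed fraction of the threshold, given that their sum exceeds the threshold.

```python
import random


def balanced_fraction(shape, t, delta=0.25, trials=200_000, seed=0):
    """Estimate P(X1 > (1/2 - delta) t, X2 > (1/2 - delta) t | X1 + X2 > t)
    for i.i.d. Weibull samples with the given shape and scale 1."""
    rng = random.Random(seed)
    floor = (0.5 - delta) * t
    large_sums = 0  # trials in which X1 + X2 exceeds t
    balanced = 0    # of those, trials in which both samples exceed `floor`
    for _ in range(trials):
        # random.weibullvariate takes (scale, shape), in that order.
        x1 = rng.weibullvariate(1.0, shape)
        x2 = rng.weibullvariate(1.0, shape)
        if x1 + x2 > t:
            large_sums += 1
            if x1 > floor and x2 > floor:
                balanced += 1
    return balanced / large_sums if large_sums else float("nan")


if __name__ == "__main__":
    # Light-tailed case (shape 2): both samples tend to share the load.
    print("shape 2.0, t = 4: ", balanced_fraction(2.0, 4.0))
    # Heavy-tailed case (shape 0.5): one sample tends to carry the sum.
    print("shape 0.5, t = 60:", balanced_fraction(0.5, 60.0))
```

With shape 2 the estimated fraction should be close to 1, while with shape 0.5 it should be small, which matches the conspiracy versus catastrophe contrast. Large sums are rare events, so the estimates are noisy and larger thresholds need many more trials.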

A final remark about the conspiracy principle is that, though most common light-tailed distributions satisfy the conspiracy principle, not all light-tailed distributions do (as is the case for the catastrophe principle and heavy-tailed distributions). For example, it is not hard to find light-tailed distributions that do not satisfy the conspiracy principle by mixing heavy-tailed and light-tailed distributions, e.g., if , where and , then is light-tailed and does not satisfy the conspiracy principle.


Adam,

Nice explanation. It strikes me that catastrophe versus conspiracy is really a design issue. A system designed from only fail-stop, centralized single points of failure would exhibit catastrophic failures. Distributed peer-to-peer models will suffer from conspiracies. Perhaps, empirical studies on outliers should include discussion on whether the system design favors catastrophes or conspiracies.

Nice point, Chris. Yes, heavy-tail/light-tail and conspiracy/catastrophe can definitely be translated into design issues. Predrag Jelenkovic et al. have some nice recent papers that highlight this in the context of network protocols: they explain when retransmissions, etc., can lead to conspiracy vs. catastrophe. There’s not a good general theory about this yet, though!
