Residual Lives, Hazard Rates, and Long Tails (Part II)

This is part II in a series on residual life, hazard rates, and long-tailed distributions. If you haven’t read part I yet, read that first! The previous post in this series highlighted that one must be careful in connecting “heavy-tailed” with the concepts of “increasing mean residual life” and “decreasing hazard rate.”

In particular, there are many examples of light-tailed distributions that are IMRL and DHR. However, if we think again about the informal examples that we discussed in the previous post, it becomes clear that IMRL and DHR are too “precise” to capture the phenomena that we were describing. For example, if we return to the case of waiting for a response to an email, it is not that we expect our remaining waiting time to be monotonically increasing as we wait. In fact, we are very likely to get a response quickly, so the expected waiting time should drop initially (and the hazard rate should increase initially). It is only after we have waited a “long” time already, in this case a few days, that we expect to see a dramatic increase in our residual life. Further, in the extreme, if we have not received a response in a month, we can reasonably expect that we may never receive a response, and so the mean residual life is, in some sense, growing unboundedly, or equivalently, the hazard rate is decreasing to zero. The example of waiting for a subway train highlights the same issues. Initially, we expect that the mean residual life should decrease, because if the train is on schedule, things are very predictable. However, once we have waited a long time beyond when the train was supposed to arrive, it likely means something went wrong, and could mean the train has had some sort of mechanical problem and will never arrive.

These examples highlight two important aspects that need to be captured in a formalization of these phenomena. First, they highlight that strict monotonicity of the residual life is not crucial (or desirable), and that we should instead focus on the behavior of the tail. Second, they highlight that the phenomena we would like to capture include the fact that the residual life distribution “blows up” in the sense that if we have waited a very long time, we should expect to wait forever. Note that this property is true of heavy-tailed Weibull and Pareto distributions, but is not true of the light-tailed distributions that are a part of the IMRL and DHR classes. For example, the Hyperexponential distribution with rate parameters {\mu_1,\ldots,\mu_n} has a mean residual life that is upper bounded by {\max_i(1/\mu_i)}.
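To make the contrast concrete, here is a minimal sketch comparing the mean residual life {m(x) = \int_x^\infty \bar{F}(u)du / \bar{F}(x)} of a Hyperexponential with that of a Pareto. The mixture weights and rates, and the particular Pareto form {\bar{F}(x) = (1+x)^{-\alpha}}, are illustrative assumptions and are not taken from the discussion above.

```python
import math

def mrl_hyperexp(x, probs, rates):
    """Mean residual life of a hyperexponential with tail sum_i p_i exp(-mu_i x).

    The result is a weighted average of the 1/mu_i, so it can never
    exceed max_i(1/mu_i).
    """
    weights = [p * math.exp(-mu * x) for p, mu in zip(probs, rates)]
    return sum(w / mu for w, mu in zip(weights, rates)) / sum(weights)

def mrl_pareto(x, alpha):
    """Mean residual life for the assumed tail (1 + x)^(-alpha), alpha > 1."""
    return (1.0 + x) / (alpha - 1.0)

# Hypothetical parameters chosen only for illustration.
probs, rates = [0.9, 0.1], [5.0, 0.5]
bound = max(1.0 / mu for mu in rates)

for x in [0, 1, 10, 100]:
    print(x, mrl_hyperexp(x, probs, rates), bound, mrl_pareto(x, 2.5))
```

In this sketch the Hyperexponential mean residual life increases with {x} but stays below the bound, while the Pareto mean residual life grows linearly without bound.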

These two observations lead us to the following definition of the class of long-tailed distributions.

Definition 1 A distribution {F} over the nonnegative reals is said to be long-tailed, denoted by {F\in\mathcal{L}}, if

\displaystyle \lim_{x \rightarrow \infty} \bar{R}_x(t) = \lim_{x\rightarrow\infty} \frac{\bar{F}(x+t)}{\bar{F}(x)} = 1 \ \ \ \ \ (1)

for all {t > 0,} i.e., {\bar{F}(x+t) \sim \bar{F}(x)} as {x\rightarrow\infty}. A nonnegative random variable is said to be long-tailed if its distribution function is long-tailed.
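As a quick numerical illustration of the definition, the sketch below evaluates the ratio {\bar{F}(x+t)/\bar{F}(x)} for three tails. The specific parameterizations (a Pareto tail {(1+x)^{-\alpha}}, a Weibull tail {e^{-x^{\alpha}}} with {\alpha = 1/2}, and a unit-rate exponential) are assumptions made for the example. Working with log-tails avoids floating-point underflow at large {x}.

```python
import math

# Log of the complementary distribution function for three assumed examples.
def log_tail_pareto(x, alpha=2.0):
    return -alpha * math.log1p(x)

def log_tail_weibull(x, shape=0.5):
    return -(x ** shape)

def log_tail_exp(x, rate=1.0):
    return -rate * x

def residual_tail(log_tail, x, t):
    """P(X > x + t | X > x), computed as exp(log F_bar(x+t) - log F_bar(x))."""
    return math.exp(log_tail(x + t) - log_tail(x))

t = 1.0
for x in [1e0, 1e2, 1e4, 1e6]:
    print(
        x,
        residual_tail(log_tail_pareto, x, t),
        residual_tail(log_tail_weibull, x, t),
        residual_tail(log_tail_exp, x, t),
    )
```

For the Pareto and Weibull tails the ratio approaches 1, while for the exponential it stays at {e^{-t}} for every {x}, which is just the memoryless property.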

The definition of long-tailed distributions exactly parallels our discussion above. In particular, long-tailed distributions are those where the distribution of residual life “blows up,” i.e., for any finite {t}, the probability that the residual life is larger than {t} goes to 1 as {x\rightarrow\infty}. Naturally, this leads to the immediate consequence that the mean residual life grows unboundedly.

Theorem 2 Suppose that the distribution {F} is long-tailed. Then,

\displaystyle \lim_{x \rightarrow \infty} m(x) = \infty.

This result highlights that the definition of long-tailed is both “weaker” than IMRL/DHR, in that it focuses only on the tail, and “stronger” than IMRL/DHR, in that it requires the mean residual life to grow unboundedly.

Though the name “long-tailed” may initially seem strange, since it has no obvious connection to the idea of residual life, the definition makes clear why the name is appropriate. In particular, {\bar{F}(x+t) \sim \bar{F}(x)} as {x\rightarrow\infty} for all {t} means that the tail stretches out with seemingly no decay over any finite range {t}. Thus, the tail of the distribution is indeed quite long. As this suggests, long-tailed distributions are a subclass of heavy-tailed distributions.

Theorem 3 All long-tailed distributions are heavy-tailed.

The class of long-tailed distributions is an extremely broad subclass of heavy-tailed distributions. The class of long-tailed distributions contains the class of subexponential distributions, which in turn contains the class of regularly varying distributions. As a result, all common heavy-tailed distributions are long-tailed, e.g., the Pareto, the Weibull (with {\alpha<1}), the LogNormal, the Burr, etc.

Though the class of long-tailed distributions is quite broad, it turns out that there is a very clean representation theorem that characterizes the form of all long-tailed distributions.

Theorem 4 (Representation theorem) A random variable {X} with distribution function {F} is long-tailed if and only if {\bar{F}} can be represented as a monotonically decreasing function of the form

\displaystyle \bar{F}(x) = \bar{c}(x) e^{\int_0^x\ \bar{\beta}(t) dt}, \ \ \ \ \ (2)

where {\lim_{x \rightarrow \infty}\bar{c}(x) = c \in(0,\infty)} and {\lim_{x \rightarrow \infty} \bar{\beta}(x)=0.}
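As a concrete instance of the representation, consider (as an assumed example) the Pareto tail {\bar{F}(x) = (1+x)^{-\alpha}}. Taking {\bar{c}(x) = 1} and {\bar{\beta}(t) = -\alpha/(1+t)}, which tends to 0, gives {e^{\int_0^x \bar{\beta}(t)dt} = (1+x)^{-\alpha}}. The sketch below checks this numerically with a simple midpoint rule; the function names and step count are hypothetical choices.

```python
import math

def beta_bar(t, alpha=2.0):
    """Assumed choice of beta_bar for the Pareto tail; tends to 0 as t grows."""
    return -alpha / (1.0 + t)

def tail_from_representation(x, alpha=2.0, steps=100_000):
    """Evaluate c_bar(x) * exp(int_0^x beta_bar(t) dt) with c_bar(x) = 1.

    The integral is approximated with the midpoint rule.
    """
    h = x / steps
    integral = h * sum(beta_bar((k + 0.5) * h, alpha) for k in range(steps))
    return math.exp(integral)

def pareto_tail(x, alpha=2.0):
    return (1.0 + x) ** (-alpha)

for x in [1.0, 10.0, 50.0]:
    print(x, tail_from_representation(x), pareto_tail(x))
```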

This representation theorem for long-tailed distributions is especially useful in many situations. As an example, it highlights very clearly the generality of the class of long-tailed distributions as compared to the class of regularly varying distributions. In particular, one can simply contrast the two representation theorems, which give

\displaystyle \text{Regularly varying with index } \rho: \bar{F}(x) = c(x) \text{exp}\left\{\int_1^x \frac{\beta(t)}{t} dt \right\}

\displaystyle \text{Long-tailed: } \bar{F}(x) = \bar{c}(x) e^{\int_0^x\ \bar{\beta}(t) dt},

where {\lim_{x \rightarrow \infty} c(x) = c \in (0,\infty)}, {\lim_{x \rightarrow \infty} \beta(x) = \rho}, {\lim_{x \rightarrow \infty}\bar{c}(x) = c \in(0,\infty)}, and {\lim_{x \rightarrow \infty} \bar{\beta}(x)=0.}

While the representation theorems above provide a clear contrast between long-tailed and regularly varying distributions, the contrast between subexponential and long-tailed distributions is less clear at this point, as is the contrast between long-tailed distributions and heavy-tailed distributions. However, it turns out that a clear contrast in each of these cases can be obtained using characterizations of long-tailed distributions in terms of the hazard rate.

In particular, to distinguish the long-tailed distributions from heavy-tailed distributions, we can rephrase the definition of heavy-tailed distributions in terms of the cumulative hazard {Q(x) = -\log \bar{F}(x)} as

\displaystyle \liminf_{x\rightarrow\infty} \frac{-\log \bar{F}(x)}{x} = \liminf_{x\rightarrow\infty} \frac{Q(x)}{x} = 0.

In contrast, for long-tailed distributions, we have the following:

Lemma 5 If {F \in \mathcal{L},} then {\lim_{x \rightarrow \infty} \frac{Q(x)}{x} = 0.}

Thus, in order to construct a heavy-tailed distribution that is not long-tailed, all that is necessary is to ensure that the limit of {\frac{Q(x)}{x}} does not exist as {x \rightarrow \infty,} but {\liminf_{x \rightarrow \infty} \frac{Q(x)}{x} = 0.}
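The cumulative hazard criterion is easy to check numerically for standard examples. The sketch below takes {Q(x) = -\log \bar{F}(x)} and evaluates {Q(x)/x} for an assumed Pareto tail, an assumed Weibull tail with shape 1/2, and a unit-rate exponential; these parameter choices are for illustration only.

```python
import math

# Log-tails for three assumed examples.
def log_tail_pareto(x, alpha=2.0):
    return -alpha * math.log1p(x)

def log_tail_weibull(x, shape=0.5):
    return -(x ** shape)

def log_tail_exp(x, rate=1.0):
    return -rate * x

def q_over_x(log_tail, x):
    """Cumulative hazard Q(x) = -log F_bar(x), divided by x."""
    return -log_tail(x) / x

for x in [1e2, 1e4, 1e6, 1e8]:
    print(
        x,
        q_over_x(log_tail_pareto, x),
        q_over_x(log_tail_weibull, x),
        q_over_x(log_tail_exp, x),
    )
```

For the two long-tailed examples {Q(x)/x} decays to 0, consistent with Lemma 5, while for the exponential it stays at the rate, so the exponential is not even heavy-tailed.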

An example: Random extrema

While the above is all well and good, it does not highlight when the class of long-tailed distributions is useful. However, since the class of long-tailed distributions can be defined in terms of the limiting behavior of the hazard rate and the residual life, it is natural that it finds application most readily in the study of extremes, i.e., the maximum and minimum of a set of random variables. Of course, the study of extremes is crucial to a wide variety of areas such as statistics, when identifying outliers; risk management, when determining the likelihood of extreme events; and many more, including applications in both the physical and social sciences.

In many such settings, the core of the analysis relies on understanding a very simple process: the maximum or minimum of a random number of independent and identically distributed random variables. For example, if one wants to predict the size of the maximal earthquake damage in the US in a particular year, then there is a random number of earthquakes in a year, and the damage from each may be assumed to be independent and identically distributed. Of course, either (or both) the distribution of the number of events and the distribution of the damage from each event could be heavy-tailed.

More formally, let us consider the following setting. Suppose {\{X_i\}_{i \geq 1}} is a sequence of independent and identically distributed random variables with mean {\mathop{\mathbb E}[X]}, and the random variable {N} is positive, integer-valued, and independent of {\{X_i\}_{i \geq 1}.} Our goal will be to characterize

\displaystyle M_N = \max(X_1,\ldots,X_N) \text{ and } m_N = \min(X_1,\ldots,X_N).

While you may not have studied the random extrema {M_N} and {m_N} before, you have almost certainly studied the version where the number of random variables is fixed at {n}, i.e., {M_n} and {m_n}. In particular, for each of these, it is simple to characterize the distribution function

\displaystyle \bar{F}_{m_n}(x)=\mathop{\mathbb P}\{X_1>x,\ldots,X_n>x\} = \bar{F}(x)^n

\displaystyle F_{M_n}(x) = \mathop{\mathbb P}\{X_1\leq x,\ldots,X_n\leq x\} = F(x)^n.

Using this, it is not difficult to see that the class of long-tailed distributions is well-behaved with respect to extrema. In particular, the class is closed with respect to max and min.

Lemma 6 For {n \geq 2,} suppose that {X_1,X_2,\cdots,X_n} are independent, long-tailed random variables. Then

  1. {\max(X_1,X_2,\cdots,X_n)} is long-tailed, and
  2. {\min(X_1,X_2,\cdots,X_n)} is long-tailed.
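The closure property can be checked numerically in the i.i.d. case using the formulas for {\bar{F}_{m_n}} and {F_{M_n}} above. The sketch below assumes a Pareto tail {\bar{F}(x) = (1+x)^{-\alpha}} (an illustrative choice) and evaluates the long-tail ratio from Definition 1 for the minimum and the maximum of {n} samples.

```python
import math

def pareto_tail(x, alpha=2.0):
    """Assumed example tail (1 + x)^(-alpha)."""
    return (1.0 + x) ** (-alpha)

def min_tail(x, n, tail=pareto_tail):
    """P(min(X_1..X_n) > x) = F_bar(x)^n for i.i.d. samples."""
    return tail(x) ** n

def max_tail(x, n, tail=pareto_tail):
    """P(max(X_1..X_n) > x) = 1 - (1 - F_bar(x))^n.

    expm1/log1p keep precision when F_bar(x) is tiny; requires x > 0 here
    so that F_bar(x) < 1.
    """
    return -math.expm1(n * math.log1p(-tail(x)))

n, t = 5, 1.0
for x in [1e1, 1e2, 1e3, 1e4]:
    ratio_min = min_tail(x + t, n) / min_tail(x, n)
    ratio_max = max_tail(x + t, n) / max_tail(x, n)
    print(x, ratio_min, ratio_max)
```

Both ratios approach 1 as {x} grows, as the lemma predicts. The minimum converges more slowly because its tail decays like {\bar{F}(x)^n}.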

Given the above, one would expect the same to be true for random extrema. In fact, it is possible to say quite a bit more about the behavior of the tail of random extrema. In particular, it is possible to derive precise characterizations of the tail behavior, in quite general settings with only elementary analytic techniques.

Theorem 7 Consider an infinite i.i.d. sequence of random variables {X_1,X_2,\ldots}, and a random variable {N\in{\mathbb N}} that is independent of {\{X_i\}_{i \geq 1}} and has {\mathop{\mathbb E}[N]<\infty}. Define {n_0} such that {\mathop{\mathbb P}\{N\geq n_0\}=1} and {\mathop{\mathbb P}\{N=n_0\}>0}. Then, as {t \rightarrow \infty,}

\displaystyle \mathop{\mathbb P}\{\max(X_1,X_2,\cdots,X_N) > t\} \sim \mathop{\mathbb E}[N] \mathop{\mathbb P}\{X_1 > t\}


\displaystyle \mathop{\mathbb P}\{\min(X_1,X_2,\cdots,X_N) > t\} \sim \mathop{\mathbb P}\{N=n_0\} \mathop{\mathbb P}\{X_1 > t\}^{n_0}.

Thus, if {X_i} are long-tailed then both {\max(X_1,X_2,\cdots,X_N)} and {\min(X_1,X_2,\cdots,X_N)} are also long-tailed.

To get intuition for this theorem, an interesting special case to consider is when {N\sim\text{Geometric}(p)}. In this case, {\mathop{\mathbb P}\{N=i\}=p(1-p)^{i-1}} for {i\geq1} and {\mathop{\mathbb E}[N]=1/p}. So, in the context of Theorem 7 we have {n_0=1} and {\mathop{\mathbb P}\{N=n_0\}=p}, and thus we obtain

\displaystyle \mathop{\mathbb P}\{\max(X_1,X_2,\cdots,X_N) > t\} \sim \frac{1}{p} \mathop{\mathbb P}\{X_1 > t\}

\displaystyle \mathop{\mathbb P}\{\min(X_1,X_2,\cdots,X_N) > t\} \sim p \mathop{\mathbb P}\{X_1 > t\},

which gives

\displaystyle \lim_{t\rightarrow\infty} \frac{\mathop{\mathbb P}\{\max(X_1,X_2,\cdots,X_N) > t\}}{ \mathop{\mathbb P}\{\min(X_1,X_2,\cdots,X_N) > t\}} = \frac{1}{p^2} = \mathop{\mathbb E}[N]^2.
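The geometric case can be verified exactly, without simulation. Conditioning on {N} and using the probability generating function {\mathop{\mathbb E}[z^N] = pz/(1-(1-p)z)} gives closed forms for both tails, which the sketch below compares against the asymptotics. The Pareto tail {(1+t)^{-\alpha}} and the value {p = 0.2} are assumptions made for the example.

```python
def pareto_tail(t, alpha=2.0):
    """Assumed example tail (1 + t)^(-alpha)."""
    return (1.0 + t) ** (-alpha)

def random_max_tail(t, p, tail=pareto_tail):
    """Exact P(max(X_1..X_N) > t) for N ~ Geometric(p) on {1, 2, ...}.

    Equals 1 - E[F(t)^N], which simplifies to s / (p + (1 - p) s)
    with s = F_bar(t).
    """
    s = tail(t)
    return s / (p + (1.0 - p) * s)

def random_min_tail(t, p, tail=pareto_tail):
    """Exact P(min(X_1..X_N) > t) = E[F_bar(t)^N] = p s / (1 - (1 - p) s)."""
    s = tail(t)
    return p * s / (1.0 - (1.0 - p) * s)

p = 0.2
for t in [1e1, 1e3, 1e5]:
    s = pareto_tail(t)
    print(
        t,
        random_max_tail(t, p) / (s / p),   # should approach 1
        random_min_tail(t, p) / (p * s),   # should approach 1
        random_max_tail(t, p) / random_min_tail(t, p),  # should approach 1/p^2
    )
```

With {p = 0.2}, the last column approaches {1/p^2 = 25}, matching the limit above.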

It is also interesting to note that the form of Theorem 7 for the random maximum exactly parallels the form of the maximum of a deterministic number of samples, i.e., {N=n}. In particular, the tail of the max of {n} random variables satisfies

\displaystyle \mathop{\mathbb P}\{\max(X_1,\ldots,X_n)>t\} \sim n \mathop{\mathbb P}\{X_1>t\}. \ \ \ \ \ (3)

The parallel form of the above to that of Theorem 7 highlights that, with respect to the tail, we can basically ignore the fact that {N} is random when studying random maxima: only its mean {\mathop{\mathbb E}[N]} matters. Interestingly, this is similar to the insight provided by Wald’s equation for random sums.

