This is the second series of posts I’m writing on topics related to what we are covering in our book on heavy-tails (which I discussed in an earlier post). The first was on the catastrophe principle (subexponential distributions) and now we move to one of the most commonly discussed aspects of heavy-tailed distributions: power laws and scale invariance.

**Scale invariance in our daily lives**

In our daily lives, many things that we come across have a typical size, or “scale,” that we associate with them. For example, the ratio of the maximum to minimum heights and weights that we see in a given day is usually less than 3, so none deviates too much from the population average. In contrast, the ratio of the maximum to minimum income of people we see in a particular day may often be 100 or more! This contrast is a consequence of the fact that light-tailed distributions, such as heights and weights, tend to have a “typical scale,” while many heavy-tailed distributions, such as incomes, are “scale invariant,” i.e., regardless of the scale on which you look at them, they look the same.

Upon first encounter, scale invariance is a particularly mysterious aspect of heavy-tailed distributions, since it is natural to think of the average of a distribution as a good predictor of what samples will occur. The fact that this is no longer true for scale invariant distributions leads to counter-intuitive properties. For example, consider the old economics joke: “If Bill Gates walks into a bar, on average, everybody in the bar is a millionaire.”

Though initially mysterious, scale invariance is a beautiful and widely-observed phenomenon that has received attention broadly beyond mathematics and statistics, e.g., in physics, computer science, and economics.

For example, fractals give a beautiful view of scale invariance, but more concretely, it is an important phenomenon in both classical and quantum field theory, as well as statistical mechanics. In fact, it is closely tied to the notion of “universality” in physics, which relates to the fact that widely different systems can be described by the same underlying theory.

In the context of network science, scale invariance has received considerable attention. Widely varying networks have been found to have scale invariant degree distributions (and are thus termed “scale-free networks”), and this observation has had a dramatic impact on our understanding of the structural properties of networks. So, clearly, scale invariance is a broad area, but in these posts, we’ll just focus on scale invariance in the context of probability and statistics.

In particular, in this set of posts, I want to talk about the property of “scale invariance” and its connections with “power law” distributions, a.k.a., Pareto distributions. Note that both “scale invariance” and “power law” are often used synonymously with “heavy-tailed,” and thus, it is important to start by pointing out that not all heavy-tailed distributions are scale invariant or power law (though all scale invariant distributions are heavy-tailed, as are all power law distributions).

The main goal of this set of posts is to describe how to generalize and formalize the notions of scale-invariance and power-law as a class of heavy-tailed distributions termed “regularly varying distributions” that is particularly appealing from a mathematical perspective. Further, in order to illustrate the usefulness of this class, I’ll try to highlight a variety of properties and examples of the class.

**Scale invariance and power laws**

To this point, I have only briefly introduced scale-invariance informally as the property that the distribution looks the “same” regardless of the scale on which it is looked at. A more careful way to say this is that, if the scale (or units) with which the samples from the distribution are measured is changed, then the shape of the distribution is basically unchanged. This is formalized by the following definition.

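Definition 1. A distribution function $F$, with tail $\bar{F}(x) = 1 - F(x)$, is scale-invariant if there exists an $x_0 > 0$ and a continuous function $g$ such that $\bar{F}(\lambda x) = g(\lambda)\bar{F}(x)$ for all $x, \lambda$ satisfying $x, \lambda x \geq x_0$.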

To interpret the definition of scale-invariant, one can think of $\lambda$ as the “change of scale” for the units being used. With this interpretation, the definition says that the shape of the distribution remains unchanged, up to a multiplicative factor $g(\lambda)$, if the measurements are scaled by $\lambda$.

Scale-invariance is a very elegant property, but it is also a fragile one. In particular, it does not hold for most probability distributions, e.g., it holds for the Pareto distribution, but does not hold for the Exponential distribution.

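To see that the Pareto is scale-invariant, recall that a Pareto distribution has $\bar{F}(x) = (x/x_m)^{-\alpha}$ for $x \geq x_m$. Thus,

$$\bar{F}(\lambda x) = \left(\frac{\lambda x}{x_m}\right)^{-\alpha} = \lambda^{-\alpha}\,\bar{F}(x)$$

whenever $x, \lambda x \geq x_m$, so the definition holds with $g(\lambda) = \lambda^{-\alpha}$ and $x_0 = x_m$.

It is also easy to see that the Exponential distribution is not scale-invariant. Recall that an Exponential distribution has $\bar{F}(x) = e^{-\mu x}$ for $x \geq 0$. Therefore,

$$\bar{F}(\lambda x) = e^{-\mu \lambda x} = e^{-\mu(\lambda - 1)x}\,\bar{F}(x).$$

Thus, there is not a choice for $g(\lambda)$ that is independent of $x$.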
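The contrast between the two examples is easy to check numerically. The following is a minimal sketch, not code from the book: the function names and the parameter values ($\alpha = 2$, $x_m = 1$, $\mu = 1$, $\lambda = 2$) are assumptions chosen purely for illustration. It computes the ratio $\bar{F}(\lambda x)/\bar{F}(x)$, where $\bar{F}(x) = P(X > x)$ is the tail, at several values of $x$. For a scale-invariant distribution this ratio should not depend on $x$.

```python
import math

def pareto_tail(x, alpha=2.0, x_m=1.0):
    """Tail P(X > x) of a Pareto distribution, valid for x >= x_m."""
    return (x / x_m) ** (-alpha)

def exponential_tail(x, mu=1.0):
    """Tail P(X > x) of an Exponential distribution with rate mu, for x >= 0."""
    return math.exp(-mu * x)

def tail_ratios(tail, lam, xs):
    """Return tail(lam * x) / tail(x) for each x in xs."""
    return [tail(lam * x) / tail(x) for x in xs]

# Illustrative (assumed) scale factor and evaluation points.
lam = 2.0
xs = [1.0, 5.0, 10.0, 20.0]

pareto_ratios = tail_ratios(pareto_tail, lam, xs)
exponential_ratios = tail_ratios(exponential_tail, lam, xs)

# Pareto: the ratio equals lam ** (-alpha) at every x.
print("Pareto:     ", pareto_ratios)
# Exponential: the ratio equals exp(-mu * (lam - 1) * x), which shrinks with x.
print("Exponential:", exponential_ratios)
```

With these choices the Pareto ratio stays at $\lambda^{-\alpha} = 0.25$ for every $x$ (up to floating-point rounding), while the Exponential ratio equals $e^{-x}$ and so changes with the point at which it is evaluated.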

The previous examples highlight that scale-invariance is an elegant property. But, perhaps surprisingly, it turns out that it is extremely special: distributions with “power-law tails,” i.e., tails that match the Pareto distribution up to a multiplicative constant, are the *only* scale-invariant distributions. That is, “scale-invariance” can be thought of interchangeably with “power-law.”

Theorem 2. A distribution function $F$ is scale-invariant if and only if $F$ has a power-law tail, i.e., there exists $x_0 > 0$, $c \geq 0$, and $\alpha > 0$ such that $\bar{F}(x) = c x^{-\alpha}$ for $x \geq x_0$.

This may seem surprising at first, but the proof below highlights the reason for the equivalence pretty clearly.

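*Proof:* If $\bar{F}(x) = c x^{-\alpha}$ for $x \geq x_0$, then $\bar{F}(\lambda x) = \lambda^{-\alpha}\bar{F}(x)$ whenever $x, \lambda x \geq x_0$, so $F$ is scale-invariant with $g(\lambda) = \lambda^{-\alpha}$. For the other direction, suppose $F$ is scale-invariant. Note that the case where $\bar{F}$ is identically zero over $[x_0, \infty)$ trivially satisfies the conditions of the theorem (this corresponds to the case $c = 0$).

Excluding the above trivial case from consideration hereon, it is easy to see that $\bar{F}(x)$ must be non-zero for all $x \geq x_0$. Indeed, if $\bar{F}(x') = 0$ for some $x' \geq x_0$, then for any $x \geq x_0$,

$$\bar{F}(x) = \bar{F}\left(\frac{x}{x'} \cdot x'\right) = g\left(\frac{x}{x'}\right)\bar{F}(x') = 0.$$

Fix $\lambda_1, \lambda_2 > 0$. We may then pick $x$ large enough such that $x, \lambda_2 x, \lambda_1 \lambda_2 x \geq x_0$. From the scale-free property of $\bar{F}$, $\bar{F}(\lambda_1 \lambda_2 x) = g(\lambda_1 \lambda_2)\bar{F}(x)$. Of course, we may also write $\bar{F}(\lambda_1 \lambda_2 x) = g(\lambda_1)\bar{F}(\lambda_2 x) = g(\lambda_1) g(\lambda_2)\bar{F}(x)$. Since $\bar{F}(x) > 0$, we conclude that the function $g$ satisfies the following property:

$$g(\lambda_1 \lambda_2) = g(\lambda_1)\, g(\lambda_2) \quad \text{for all } \lambda_1, \lambda_2 > 0.$$

It is well known that the only continuous non-zero functions satisfying the above condition are $g(\lambda) = \lambda^{-\alpha}$ for some $\alpha \in \mathbb{R}$. Noting that $\bar{F}(x) = g(x/x_0)\bar{F}(x_0)$ for all $x \geq x_0$, we conclude that $\alpha > 0$ (since $\bar{F}$ must be monotonically decreasing, with $\lim_{x \to \infty} \bar{F}(x) = 0$). Therefore, $\bar{F}(x) = c x^{-\alpha}$ for $x \geq x_0$, for some $c > 0$ (namely $c = \bar{F}(x_0)\, x_0^{\alpha}$).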
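The “well known” step can be sketched as follows. This is a brief outline rather than a full proof, and the auxiliary function $h$ is notation introduced here for illustration, not taken from the post. In the non-trivial case the scaling function satisfies $g(\lambda) = \bar{F}(\lambda x)/\bar{F}(x) > 0$, so we may take logarithms and reduce the multiplicative property $g(\lambda_1 \lambda_2) = g(\lambda_1) g(\lambda_2)$ to Cauchy's additive functional equation:

$$
\begin{aligned}
h(u) &:= \log g(e^{u}), \qquad u \in \mathbb{R}, \\
h(u + v) &= \log\big(g(e^{u})\, g(e^{v})\big) = h(u) + h(v), \\
h(u) &= -\alpha u \quad \text{for some } \alpha \in \mathbb{R} \quad \text{(the continuous additive functions are exactly the linear ones)}, \\
g(\lambda) &= e^{h(\log \lambda)} = \lambda^{-\alpha}.
\end{aligned}
$$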

We have just seen that all scale-invariant distributions are power-law distributions, a.k.a. distributions with tails matching a Pareto distribution up to a multiplicative constant. This makes scale-invariance a very fragile property that one should not expect to see in reality and, in the strictest sense, that is true. It is quite unusual for the distribution of an observed phenomenon to *exactly* match a power-law distribution, and thus be scale-invariant. Instead, what tends to be observed in practice is that the body of a distribution is not scale invariant, and the tail of a distribution is only *approximately* scale-invariant.
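To make “approximately scale-invariant in the tail” concrete, here is a small sketch using a distribution of my own choosing, not one discussed in the post: the Lomax (Pareto Type II) distribution with tail $\bar{F}(x) = (1 + x)^{-\alpha}$ for $x \geq 0$. The parameter values are assumptions for illustration. Its body is not a power law, yet the ratio $\bar{F}(\lambda x)/\bar{F}(x)$ settles toward the power-law factor $\lambda^{-\alpha}$ as $x$ grows.

```python
def lomax_tail(x, alpha=2.0):
    """Tail P(X > x) = (1 + x)^(-alpha) for x >= 0 (Lomax / Pareto Type II)."""
    return (1.0 + x) ** (-alpha)

# Illustrative (assumed) parameters.
alpha = 2.0
lam = 2.0
power_law_factor = lam ** (-alpha)  # the factor an exact power law would give

xs = [1.0, 10.0, 100.0, 1000.0, 10000.0]
ratios = [lomax_tail(lam * x, alpha) / lomax_tail(x, alpha) for x in xs]

# The ratio depends on x in the body but approaches lam ** (-alpha) in the tail.
for x, r in zip(xs, ratios):
    print(f"x = {x:>8.0f}   ratio = {r:.6f}   target = {power_law_factor:.6f}")
```

The ratio here is $\left(\frac{1 + \lambda x}{1 + x}\right)^{-\alpha}$, which visibly depends on $x$ for small $x$ but converges to $\lambda^{-\alpha}$ as $x \to \infty$.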

In the next post in this series, I’ll talk about how to formalize a notion of approximate scale invariance.

Thank you for a good introduction to the concept of scale invariance. However, the way the material is presented might lead to some ambiguity. My concern is about the statement regarding the change of scale (or units) of the measured samples. It is very common practice in science and engineering to work with dimensionless quantities. This is also true in the case of densities. So for example, the simplest of all lifetime distribution models is the exponential distribution, f(t) = µ exp(-µt) for t > 0. In this case 1/µ is the expected lifetime. Note that the exponent of the exponential function is always dimensionless. If the expected lifetime 1/µ is given in seconds then t must also be in seconds. In this respect, the change of units only affects the normalization constant. In effect, using the change of variables formula for densities, it can be shown that in general for any density λ f(λx) = f(x) for all x, λ satisfying x, λx ≥ x0.

Thus, following your definition of scale invariance, all densities are scale invariant. This is not a surprising result, since in decision-making the outcome should be independent of the choice of coordinate system, and hence of the scale in which one measures the samples.

This is Jayakrishnan Nair, one of Adam’s co-authors on the book on heavy tails.

It is true that given any density f(x), λ f(λx) is also a density function. However, it is not true in general that λ f(λx)= f(x) for large enough x. You can easily check that the exponential/Gaussian density does not satisfy this property. In fact, it can be proved that only power-law densities can satisfy this condition.

In fact, the property λ f(λx) = f(x) is very special: it says that even if you view the density with a different scale/unit, the plot looks exactly the same, except for a multiplicative factor.

I am not sure that I correctly understand your arguments. I understand the mathematics in the above text, which is presented in a quite clear and simple manner. However, my objection is to some of the phrases used in the interpretation of the result. If we change the units of measurement, for example use seconds instead of hours in the lifetime measurement, we get a different distribution. This new distribution is related to the previous one by the rule of change of variables for densities. Accordingly, for any distribution, λ f(λx) = f(x). However, the mathematics of the above note is not about this. It is about looking at the same distribution through a different magnifying glass. I think the phrase self-similar is more correct. This means that if one has a description of a neighbourhood of a point, it is sufficient in order to describe the neighbourhood of any other point of the distribution. In short, looking at a self-similar density through a different magnifying glass or scale is not the same as changing the units of measurement, since the latter results in a different density, which can be determined by the rule of change of variables.

Sorry for my delayed reply…and thanks JK for answering in the meantime.

The English surrounding concepts like self similar and scale invariant is always fuzzy and confusing, so let’s just try to agree on the math. It seems that we don’t have confusion about the definition I have used for scale invariant. It’s a standard notion (see Wikipedia for another discussion: https://en.wikipedia.org/wiki/Scale_invariance). As you’ll see in the Wikipedia article, the typical notion of scale-invariance is a stricter notion than self similarity, which only requires the function to be scale invariant for a particular set of scales, i.e., only a discrete set of values of λ in the definition I give in this post.

If those two definitions are clear for you, and it’s clear that only very special f() satisfy them, then it’s just the verbiage that was confusing. The verbiage is just meant to highlight that one interpretation of the math in the definition of self-similarity is that the form of f() does not change if a rescaling by λ is applied. I tend to think of this visually. When looking at a plot of f(), a linear scaling of the axes does not change the visible shape. Hopefully that is clearer? If not, maybe the Wikipedia article will help.

Thank you for the clarification. I tend to visualise scale invariance in the same manner. But in the physics literature the fuzziness of the term “scale-invariance” has caused a lot of confusion. To avoid that confusion, I think one should be careful in talking about “measurement units”. Although in a mathematical sense it does no harm, it makes things look quite different for people outside the field of mathematics.