For the past year, Bert Zwart, JK Nair and I have been working on a new book on heavy-tails. The working title is “The fundamentals of heavy-tails: Properties, Emergence, and Identification.”
I’ll probably make a handful of posts about the book over the coming months, both on the topics we’re including and the process of writing, but I figure I’ll start by just giving an overview of why we’re even bothering to write such a book.
Why write a book on heavy-tails?
The short answer is that writing the book was my present to myself for getting tenure!
Still, you may wonder about the topic, since within CS, heavy-tails are starting to feel like an “old” area… They were “discovered” (at least within CS) 15+ years ago at this point, and in the meantime, they’ve been found in nearly all aspects of computer systems and networks. They seemingly pop up everywhere, from degree distributions in the internet and social networks to file sizes and inter-arrival times of workloads. But, even with 15+ years of work in the field, they are still treated as mysterious, surprising, and even controversial… And it is not just CS where this happens: across physics, geology, economics, ecology, biology, etc., the story is similar. Heavy-tails are a continual source of excitement, confusion, and controversy as they are repeatedly “discovered.” And, unfortunately, the confusion and controversy don’t go away for a long time after the discovery…
That is why we decided to write a book. Our feeling is that heavy-tails are not really surprising or mysterious, and that they shouldn’t cause confusion and controversy. Those reactions are just a consequence of the fact that we all learn probability in the context of light-tailed distributions, and heavy-tailed distributions are simply different. (Often, they’re actually easier to reason about!)
So, the goal of the book is to demystify heavy-tailed distributions by showing how to reason formally about their seemingly counter-intuitive properties; to highlight that their emergence should be expected (not surprising) by showing that a wide variety of general processes lead to heavy-tailed distributions; and to show that most of the controversy surrounding heavy-tails is the result of bad statistics, and can be avoided by using the proper tools.
The challenge of writing such a book is that heavy-tailed distributions require covering mathematically deep concepts such as the generalized central limit theorem, extreme value theory, and regular variation. That is likely why all the books to this point on heavy-tails are either general audience pop science or grad-level mathematics books. However, we have spent a lot of time coming up with expositions of these topics that use only elementary mathematical tools in the hopes of making these topics accessible to anyone who has had an introductory probability course.
For more info on the topics we plan to cover, see my website.
At this point, we’ve been working on the book for about a year, and we’ve actually managed to get a bit more than 2/3 of the way through the writing. This was mainly a function of trying to write to keep up with a course I taught on the topic last spring… and since then, progress has been very slow. However, the hope is to finish up the book this spring, at which point we’ll be looking for volunteer readers to give us feedback. Please send me a note if you’d be interested in helping!
I’ll also likely post snippets from the book on this blog as we work to finish off things, so stay tuned…