I recently returned from a workshop on Big Data and Differential Privacy, hosted by the Simons Institute for the Theory of Computing at Berkeley.
Differential privacy is a rigorous notion of database privacy intended to give meaningful guarantees to individuals whose personal data are used in computations, where “computations” is quite broadly understood—statistical analyses, model fitting, policy decisions, release of “anonymized” datasets,…
Privacy is easy to get wrong, even when data-use decisions are being made by well-intentioned, smart people. There are just so many subtleties, and it is impossible to fully anticipate the range of attacks and outside information an adversary might use to compromise the information you choose to publish. Thus, much of the power of differential privacy comes from the fact that it gives guarantees that hold up without making any assumptions about the attacks the adversary might use, her computational power, or any outside information she might acquire. It also has elegant composition properties (helping us understand how privacy losses accumulate over multiple computations).
In part I of this post, I described the conspiracy and catastrophe principles informally. However, as I mentioned, these principles can be made rigorous, and can serve as powerful analytic tools when studying heavy-tailed and light-tailed distributions.
It is important to note that there is not really one catastrophe principle and one conspiracy principle. Instead, there are many variations of these principles that can be defined and used, each with varying strengths and generality. In this post, I’ll introduce the simplest statements of each in order to highlight how these properties can be formalized. You can see the book for other versions…
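To give a flavor of how the formalization goes (the book's exact statements may differ), the simplest version of the catastrophe principle is the defining property of the class of subexponential distributions: for i.i.d. nonnegative samples $X_1, \ldots, X_n$,

```latex
\Pr\left(X_1 + \cdots + X_n > x\right)
  \;\sim\; \Pr\left(\max(X_1, \ldots, X_n) > x\right)
  \quad \text{as } x \to \infty .
```

That is, when the sum is unexpectedly large, it is (asymptotically) because a single sample is huge — a catastrophe — rather than because many samples conspired to each be moderately large.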
James Hamilton has had two interesting recent posts about local renewable generation for data centers that are definitely worth a read for folks interested in “sustainable data centers”: Solar at Scale and Datacenter Renewable Power Done Right.
Sustainable data centers
It’s always interesting to hear perspectives on “sustainable data centers” from industry, because there is still a great deal of diversity in how companies are making their data centers sustainable. Some companies are taking a “local” approach, where renewable generation (in a variety of forms) is integrated on-site, while others are taking a more global approach, where renewables are placed somewhere else on the grid (often nearby, but not always). An example of the former is Apple, and an example of the latter is Google.
As Steven said in his recent posts — smart grid and energy are certainly in style right now. In our group at Caltech, a large fraction of the students are working on something related to energy, whether it be optimal power flow, demand response, sustainable data center design, or electricity markets. There are lots of important, challenging problems in all of these areas…
But, a challenge for people doing work in this area is publishing: The publishing style of the traditional power systems community is very different from that of CS or OR, and people from all of these areas are starting to mix and mingle. As a result of this mixture of areas and the surge of interest, lots of new publication venues are emerging, and it’s sometimes hard to tell where to send one’s work so that it gets attention/recognition/etc. To that end, I wanted to use this post to plug one venue that is emerging as a strong outlet for work in this area: ACM e-Energy. (This year’s CFP has been floating around recently, which is what prompted this post.)
This is the first of what will likely be many posts related to topics we are covering in our book on heavy-tails (which I discussed in an earlier post).
I figure I’ll start with a topic on heavy-tails that is near and dear to my heart — the catastrophe principle. This is, in my mind, a crucial and defining property of heavy-tailed distributions that far too many people aren’t aware of…
A thought experiment
Suppose you are in a class with 50 other students, and the professor does an experiment. She records the height and the number of Twitter followers of every student in the class. Interestingly, it turns out that both the sum of the heights and the sum of the follower counts are unexpectedly large, meaning that they are significantly larger than they would have been if each person had the average height and number of followers. The question the professor then asks the class is: “What led to the unexpectedly large sums?”
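A quick simulation hints at the two very different answers (my own sketch; the distributions and parameters are made-up stand-ins — roughly normal heights for a light tail, Pareto follower counts for a heavy tail):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # students in the class

# Heights: light-tailed (roughly normal). Units are cm; parameters are made up.
heights = rng.normal(loc=170.0, scale=8.0, size=n)

# Follower counts: heavy-tailed (classical Pareto, support >= 100).
# A shape parameter close to 1 gives a very heavy tail; again, made up.
followers = (rng.pareto(a=1.1, size=n) + 1) * 100

# What fraction of each sum is contributed by the single largest value?
height_share = heights.max() / heights.sum()
follower_share = followers.max() / followers.sum()

print(f"largest height contributes {height_share:.1%} of the height sum")
print(f"largest follower count contributes {follower_share:.1%} of the follower sum")
```

For the light-tailed heights, the tallest student contributes only a sliver of the sum, so a large sum requires many students to each be a bit tall (a conspiracy). For the heavy-tailed follower counts, the single largest count typically accounts for a big chunk of the sum all by itself (a catastrophe).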
I’m happy to announce that Caltech will be looking for faculty applicants in Electrical Engineering this year. Go here to see the official ad and to apply. For various reasons the announcement is going up late, so I hope that this will help to spread the word quickly…
The search will be looking for strong applicants from anywhere in EE systems, since Caltech likes to hire based on impressive and high-impact work, rather than a preconceived notion of what area is interesting at the moment. But, I think it’s fair to say we have a particular need on the networking / wireless side of things and the machine learning side of things. Also, there is a growing interest in power systems and energy issues. So, if you’re from any of those areas and you’re on the fence about applying, please do! But, the search is certainly not limited to those areas. Quoting from the ad: “research areas of interest include, but are not restricted to, signal processing, communications and information theory, control systems, networks, optimization, machine learning, large data systems, power systems, robotics and autonomous systems and cyber-physical systems.”
Also, if you have a degree from a department other than EE, don’t let that stop you from applying — Caltech has a very fuzzy interpretation of the boundaries between disciplines. While I’m not officially in EE, there are strong ties between EE systems and CS, as well as between EE systems and Applied & Computational Math (ACM) and Control and Dynamical Systems (CDS). Actually, the first two students that I graduated were EE students co-advised with EE faculty, and about half of the students in our RSRG group today are EE students. So, this is the next best thing to CS having a search, which unfortunately we don’t have this year (grumble, grumble).
In part I of this post, we have seen how a layered architecture has transformed the communication network. What is so difficult about a layered architecture for the power network? Let’s again look first at its role in the communication network.
DARPA started a packet network in 1969 with four nodes — at UCLA, UCSB, SRI (Stanford Research Institute), and the University of Utah — that grew into today’s Internet. The early-to-mid 1990s were when the world at large discovered the Internet. The release of the Mosaic browser in 1993, by the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, probably played the most visible role in triggering this transition. But the 1990s were also the time when multiple technologies and infrastructures came together to ready the Internet for prime time. What exactly was the role of layering?