Privacy Workshop Summary

Two weeks ago, on April 24, thirty-some people gathered at Caltech for a workshop on privacy. At least from my (biased) perspective, it was a great time!

The basic format was that each speaker was paired with a discussant who doesn’t work on privacy, but who could provide motivation, discussion, context, or critique for the talk.

Moritz Hardt gave a talk on privacy + adaptive data analysis, and he shared the session with Antonio Rangel, a Caltech neuroeconomist. They held an exciting and informative conversation on the the risks of false discovery in data analysis, and the potential of new, differential privacy-based tools to mitigate those risks.

Kobbi Nissim gave a talk on privacy + learning, and he shared the session with Pietro Perona, a Caltech electrical engineer who studies vision. One interesting thing that has come out of recent work on private learning is that the central challenge for many tasks is to privately learn the underlying “scale’’ of the data, and there has been interesting progress on private scale-finding in the past few years.

Aaron Roth spoke on privacy as a toolkit for mechanism design in large games. He shared the session with Leeat Yariv, a Caltech economist. Together, they helped us understand the potential of the privacy toolkit to provide robustness guarantees that could be appealing in many settings where individuals don’t have huge influence on others (e.g., traffic, large matchings).
The program is available here:

and there are links with slides and video!

On Saturday morning after the workshop, we went for a “privacy hike” on nearby Verdugo Mountain. The signs promised mountain lions, but the lions decided to keep to themselves.
Privacy HikeThanks, everyone, for a fun workshop!

Some thoughts on broad privacy research strategy

Let me begin by saying where I think the interesting privacy research question does not lie. The interesting question is not how do people and organizations currently behave with respect to private information. Current behaviors are a reflection of culture, legislation, and policy, and all of these have proven themselves to be quite malleable, in our current environment. So the interesting question when it comes to private information is—how could and should people and organizations behave, and what options could or should they even have? This is a fundamental and part-normative question, and one that we cannot address without a substantial research effort. Despite being part-normative, this question can be useful in suggesting directions for even quite mathematical and applied research.

The first thing I’d like to ask is, What do we need to understand better in order to decide how to address this question? I see three relevant types of research that are largely missing:
1. We need a better understanding of the utility and harm that individuals, organizations, and society can potentially incur from the use of potentially sensitive data.
2. We need a better understanding of what the options for behavior could look like—which means we need to be open to a complete reinvention of the means by which we store, share, buy, sell, track, compute on, and draw conclusions from potentially sensitive data. Thus, we need a research agenda that helps us understand the realm of possibilities, and the consequences such possibilities would have.
3. It is, of course, important to remember the cultural, legislative, and policy context. It’s not enough to understand what people want and what is feasible. If we care about actual implementation, we must consider this broader context.

The first two of these points can and must be addressed with mathematical rigor, incorporating the perspectives of a wide variety of disciplines. Mathematical rigor is essential for a number of reasons, but the clearest one is that privacy is not an area where we can afford to deploy heuristic solutions and then cross our fingers. While inaccurate computations can later be redone for higher accuracy, and slow systems can later be optimized for better performance, privacy, once lost, cannot be “taken back.”

The second point offers the widest and richest array of research challenges. The primary work to address them will involve the development of new theoretical foundations for the technologies that would support these various interactions on potentially sensitive data.

For concreteness, let me give a few example research questions that fall under the umbrella of this second point:
1. What must be revealed about an individual’s medical data in order for her to benefit from and contribute to advances in medicine? How can we optimize the tradeoff of these benefits against potential privacy losses and help individuals make the relevant decisions?
2. When an offer of insurance is based on an individual’s history, how can this be made transparent to the individual? Would such transparency introduce incentives to “game” the system by withholding information, changing behaviors, or fabricating one’s history? What would be the impact of such incentives for misbehavior, and how should we deal with them?
3. How could we track the flow of “value” and “harm” through systems that transport large amounts of personal data (for example, the system of companies that buy and sell information on individuals’ online behavior)? How does this suggest that such systems might be redesigned?

Finding Any Nontrivial Coarse Correlated Equilibrium Is Hard

One of the things one might hope to get from an equilibrium concept in a game is that it might plausibly describe or predict behavior in the real world. That’s one reason negative results suggesting that certain equilibria can’t be computed efficiently are so interesting—they call those equilibrium concepts into question. Of course, there’s much more to it than that, but the existence of efficient algorithms (or, even better, simple, distributed dynamics) for computing equilibria is a big selling point for a solution concept.

In that sense, correlated equilibria (CE) and coarse correlated equilibria (CCE) are much more promising than Nash equilibria (NE). Even in games with many players, there exist a number of natural dynamics that quickly converge to these solution concepts. In particular, these dynamics induce efficient computation of approximate CE and CCE in multiplayer games; by contrast, computation of approximate Nash equilibria is computationally hard in multiplayer games.

If you aren’t so familiar with CEs and CCEs, you can think of them as very similar to Nash equilibria, in that they also consist of a distribution over players’ action profiles at which no player can benefit by unilateral deviation, and hence represent stable choices of distributions over player actions. The big difference is that, whereas a Nash equilibrium is defined to be a product of independent distributions over actions (one for each player); correlated and coarse correlated equilibria are general (joint) probability distributions—CEs and CCEs allow players to correlate their choices of actions. For the purposes of this blog post, let’s not worry too much about the distinction between CEs and CCEs. Notice that CE and CCE are more general than NE—every NE is a CE (CCE), but the converse is not true. So it makes sense that one might be able to compute a CE efficiently, even in settings where computing a NE is hard.

Beyond computation of equilibria, another significant thread of research in algorithmic game theory has been the study of the quality of equilibria, often as measured by the social welfare of the equilibrium or its ratio to the social welfare of the socially optimal outcome (c.f. the extensive literature on the price of anarchy (PoA)). Given that we know it is possible to efficiently compute CE and CCE, it is natural to ask how good are the equilibria we can efficiently compute? For example, do existing efficient dynamics find the best such equilibria, or at least ones that approximately optimize the social welfare? Since the gap between the worst and the best equilibria (CE or CCE), in terms of social welfare, can be large in natural games, it is interesting to understand if there exist efficient dynamics or algorithms that avoid—at least to some extent—the bad outcomes.

In their notable work, Papadimitriou and Roughgarden [PR] show that determining a socially optimal CE is NP-hard, in a number of succinct multiplayer games*. (What’s socially optimal? The equilibrium that maximizes the total (or average) welfare of the players.) This result intuitively follows from the fact that determining an action profile with maximum welfare—i.e., solving the problem of welfare optimization even without equilibrium constraints—is NP-hard in general. The hardness result of [PR]  leaves open the question of computing near-optimal CE/CCE, i.e., whether there exist efficient algorithms that compute CE/CCE with welfare at least, say, alpha times the optimal, for a nontrivial approximation ratio alpha < 1.

In recent work with Sid Barman, a postdoc at Caltech, we consider exactly this question. We establish that, unless P=NP, there does not exist any efficient algorithm that computes a CCE with welfare better than the worst possible CCE, in succinct multiplayer games*. We also establish similar hardness results for computing equilibria under the egalitarian objective or Pareto-optimality.

Analogous hardness results hold for CE. A classical interpretation of a CE is in terms of a mediator who has access to the players’ payoff functions and who draws outcomes from a correlated equilibrium’s joint distribution over player actions and privately recommends the corresponding actions to each player. The equilibrium conditions ensure that no player can benefit in expectation by unilaterally deviating from the recommended actions. Therefore, the problem we study here is exactly the computational complexity of the problem that a mediator faces if she wishes to maximize social welfare.

We also extend the hardness result to approximate CE and CCE. Therefore, while one can efficiently compute an approximate CE/CCE in succinct multiplayer games, one cannot provide any nontrivial welfare guarantees for the resulting equilibrium (unless P=NP).

In addition, we show that this hardness result also holds specifically for potential games (generally considered to be a very tractable class of games), and persists even in settings where the gap between the best and worst equilibrium is large.

We also have some positive results, but I think the negative ones are the most interesting—this says that those appealing dynamics that compute CE/CCE can’t be guaranteed to be finding anything other than the worst equilibrium. To me, that’s some surprising bad news. One silver lining is that this provides new motivation for studying the price of anarchy (the quality of the worst equilibrium) for CE/CCE, since generally that’s the best thing we can hope to compute.

*Technical Aside (succinct games): In general multiplayer games the size of the normal form representation, $N$, is exponentially large in the number of players; one can compute a CE/CCE that optimizes a linear objective by solving a linear program of size polynomial in $N$, and hence the computational complexity of equilibrium computation is not interesting for general games. However, most games of interest—such as graphical games, polymatrix games, congestion games, local effect games, network design games, anonymous games, and scheduling games—admit a succinct representation (wherein the above-mentioned linear program can be exponentially large in the size of the representation), and hence it is such succinctly representable games that we (and previous works) study.

[PR] PAPADIMITRIOU, C. H. AND ROUGHGARDEN, T. 2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM) 55, 3, 14.

(Research) Heroes

Greetings from Manhattan, where I’m visiting for STOC and for the Women in Theory (WIT) workshop.

I attended the first WIT workshop as a PhD student, back in 2008. That workshop marked the moment when I first started to feel that I belonged in the CS theory community. I realized I was finally at the point where I could get a lot out of attending technical talks, and that I could even sometimes ask a good question. And there I was at the workshop, schmoozing with some of my biggest research heroes—Dorit Aharonov, Shuchi Chawla, Julia Chuzhoy, Irit Dinur, Cynthia Dwork, Joan Feigenbaum, Shafi Goldwasser, Tal Malkin, Eva Tardos, Tiffani Williams, Lisa Zhang—what a lineup! It was so inspiring to talk with the speakers at the workshop and to hear not only about their work, but about their lives, their families, and the challenges they’d faced in their careers. There was something very special about sitting in technical talks and looking around the room to see a sea of female faces, when I’d grown accustomed to being the only woman or one of only a few in the room. Then, to top it off, I had a conversation with Eva Tardos where she suggested coming to Cornell for a postdoc, which was my dream.

Continue reading

Simons Workshop on Big Data and Differential Privacy

I recently returned from a workshop on Big Data and Differential Privacy, hosted by the Simons Institute for the Theory of Computing, at Berkeley.

Differential privacy is a rigorous notion of database privacy intended to give meaningful guarantees to individuals whose personal data are used in computations, where “computations” is quite broadly understood—statistical analyses, model fitting, policy decisions, release of “anonymized” datasets,…

Privacy is easy to get wrong, even when data-use decisions are being made by well-intentioned, smart people. There are just so many subtleties, and it is impossible to fully anticipate the range of attacks and outside information an adversary might use to compromise the information you choose to publish. Thus, much of the power of differential privacy comes from the fact that it gives guarantees that hold up without making any assumptions about the attacks the adversary might use, her computational power, or any outside information she might acquire. It also has elegant composition properties (helping us understand how privacy losses accumulate over multiple computations).

Continue reading