Privacy as plausible deniability?

As I was flying to the NSDI PC meeting this week, I was catching up on reading and came across an article on privacy in The Atlantic that (to my surprise) pushed nearly the same perspective on privacy that we studied in a paper a year or so ago: privacy as plausible deniability.

The idea is that hacks, breaches, behavioral monitoring, etc. are so common and hard to avoid that relying on tools from cryptography or differential privacy isn’t really enough. Instead, if someone really cares about privacy, they probably need to take that into account in their actions. For example, you can assume that Google, Facebook, etc. are observing your behavior online and that this is impacting the prices, advertisements, etc. that you see. Tools from privacy, encryption, etc. can’t really help with this. However, tools that add “fake” traffic can. If an observer knows that you are using such a tool, then you always have plausible deniability about any observed behavior, and if the fake traffic is chosen carefully, it can counter the impact of personalized ads, pricing, etc. There are now companies, such as “Plausible Deniability LLC”, that do exactly this!
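
To make the “fake traffic” idea concrete, here is a minimal, hypothetical sketch (not from the article or our paper, and far simpler than what a real tool or a company like Plausible Deniability LLC would actually deploy): interleave each genuine request with randomly chosen decoys so that no single observed request can be attributed to a true interest.

```python
import random
import time

# Hypothetical decoy pool -- a real tool would draw from a large, realistic
# corpus so that decoys are statistically indistinguishable from genuine interests.
DECOY_QUERIES = [
    "best hiking trails near me",
    "used pickup trucks for sale",
    "sourdough starter troubleshooting",
    "cheap flights to denver",
    "how to fix a leaky faucet",
]

def issue_query(query):
    # Placeholder: a real tool would send this to a search engine or fetch pages.
    print(f"sending: {query}")

def browse_with_cover(real_queries, decoys_per_real=3):
    """Interleave each real query with randomly chosen decoys, in random order,
    so an observer cannot attribute any single request to a true preference."""
    for real in real_queries:
        batch = random.sample(DECOY_QUERIES, k=decoys_per_real) + [real]
        random.shuffle(batch)
        for q in batch:
            issue_query(q)
            time.sleep(random.uniform(1, 5))  # jitter timing so decoys have no telltale pattern

if __name__ == "__main__":
    browse_with_cover(["flights to austin in june"])
```

Even this toy version gives the user a story for any single observed query (“that could have been a decoy”), which is exactly the plausible deniability the article is after.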

On the research front, we looked at this in the context of the following question: if a consumer knows that their behavior is being observed and cares about privacy, can the observer infer the true preferences of the consumer? Our work gives a resounding “no”. Using tools from revealed preference theory, we show not only that the observer cannot learn the consumer’s preferences, but that every set of observed choices can be “explained” as consistent with any underlying utility function for the consumer. Thus, the consumer can always maintain plausible deniability.
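
For readers who haven’t seen revealed preference theory, the standard notion of rationalizability is sketched below. This is background only, stated in the textbook form rather than the paper’s richer privacy-aware model.

```latex
% Background: textbook rationalizability from revealed preference theory.
% Observations are price/choice pairs (p_t, x_t), t = 1, ..., T. A utility
% function u rationalizes the data if each observed choice is optimal within
% its budget set:
\[
  u(x_t) \;\ge\; u(x)
  \quad \text{for all } x \text{ with } p_t \cdot x \le p_t \cdot x_t,
  \qquad t = 1, \dots, T .
\]
% The result described above says that once the consumer chooses strategically,
% knowing she is being observed, the observed data can be made consistent with
% any underlying utility function u.
```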

If you want to see the details, check it out here!   And, note that the lead author (Rachel Cummings) is on the job market this year!

P.S. The NSDI PC meeting was really stimulating!  It’s been a while since I had the pleasure of being on a “pure systems” PC, and it was great to see quite a few rigorous/mathematical papers be discussed and valued.  Also, it was quite impressive to see how fair and thorough the discussions were.  Congrats to Aditya and Jon on running a great meeting!

Some thoughts on broad privacy research strategy

Let me begin by saying where I think the interesting privacy research question does not lie. The interesting question is not how people and organizations currently behave with respect to private information. Current behaviors are a reflection of culture, legislation, and policy, and all of these have proven themselves to be quite malleable in our current environment. So the interesting question when it comes to private information is—how could and should people and organizations behave, and what options could or should they even have? This is a fundamental and part-normative question, and one that we cannot address without a substantial research effort. Despite being part-normative, this question can be useful in suggesting directions for even quite mathematical and applied research.

The first thing I’d like to ask is, What do we need to understand better in order to decide how to address this question? I see three relevant types of research that are largely missing:
1. We need a better understanding of the utility and harm that individuals, organizations, and society can potentially incur from the use of potentially sensitive data.
2. We need a better understanding of what the options for behavior could look like—which means we need to be open to a complete reinvention of the means by which we store, share, buy, sell, track, compute on, and draw conclusions from potentially sensitive data. Thus, we need a research agenda that helps us understand the realm of possibilities, and the consequences such possibilities would have.
3. It is, of course, important to remember the cultural, legislative, and policy context. It’s not enough to understand what people want and what is feasible. If we care about actual implementation, we must consider this broader context.

The first two of these points can and must be addressed with mathematical rigor, incorporating the perspectives of a wide variety of disciplines. Mathematical rigor is essential for a number of reasons, but the clearest one is that privacy is not an area where we can afford to deploy heuristic solutions and then cross our fingers. While inaccurate computations can later be redone for higher accuracy, and slow systems can later be optimized for better performance, privacy, once lost, cannot be “taken back.”

The second point offers the widest and richest array of research challenges. The primary work to address them will involve the development of new theoretical foundations for the technologies that would support these various interactions on potentially sensitive data.

For concreteness, let me give a few example research questions that fall under the umbrella of this second point:
1. What must be revealed about an individual’s medical data in order for her to benefit from and contribute to advances in medicine? How can we optimize the tradeoff of these benefits against potential privacy losses and help individuals make the relevant decisions?
2. When an offer of insurance is based on an individual’s history, how can this be made transparent to the individual? Would such transparency introduce incentives to “game” the system by withholding information, changing behaviors, or fabricating one’s history? What would be the impact of such incentives for misbehavior, and how should we deal with them?
3. How could we track the flow of “value” and “harm” through systems that transport large amounts of personal data (for example, the system of companies that buy and sell information on individuals’ online behavior)? How does this suggest that such systems might be redesigned?

Data, Privacy, and Markets

We’ve posted in the past about some of the work going on in our group related to privacy, and of course there are always lots of news articles popping up. But today I came across a recent animation by Jorge Cham of PHD Comics fame that does a very nice job of summarizing one of the interesting directions these days — managing the interaction of personal data with data marketplaces.

Though I haven’t posted about it here yet, this is one of the new directions RSRG is moving in — how does one design a data marketplace that allows the transition from data as a commodity to data as a service? We have already seen computing go from commodity to service with the emergence of cloud infrastructure providers like Amazon EC2 and Microsoft Azure, and I think it won’t be long until data makes the same transition. But, in setting up these data marketplaces, how does one manage issues such as privacy? And how does one place a value on pieces of data, which have many different uses?

In any case, enjoy the animation!

A report from (two days of) Sigmetrics

Well, June is conference season for me, so despite a new baby at home I went off on another trip this week — sorry honey! This time it was ACM Sigmetrics in Austin, where I helped to organize the GreenMetrics workshop, and then presented one of our group’s three papers on the first day of the main conference.


A report from (one day of) EC

This past week, a large part of our group attended ACM EC up in Palo Alto.  EC is the top Algorithmic Game Theory conference, and has been getting stronger and stronger each year.  I was on the PC this year, and I definitely saw very strong papers not making the cut (to my dismay)… In fact, one of the big discussions at the business meeting of the conference was how to handle the growth of the community.

Having found out about the increasingly difficult acceptance standards, I was even happier that our group was so well represented. We had four papers on a variety of topics, from privacy to scheduling to equilibrium computation. I’ll give them a little plug here before talking about some of my highlights from the conference…


Simons Workshop on Big Data and Differential Privacy

I recently returned from a workshop on Big Data and Differential Privacy, hosted by the Simons Institute for the Theory of Computing, at Berkeley.

Differential privacy is a rigorous notion of database privacy intended to give meaningful guarantees to individuals whose personal data are used in computations, where “computations” is quite broadly understood—statistical analyses, model fitting, policy decisions, release of “anonymized” datasets,…

Privacy is easy to get wrong, even when data-use decisions are being made by well-intentioned, smart people. There are just so many subtleties, and it is impossible to fully anticipate the range of attacks and outside information an adversary might use to compromise the information you choose to publish. Thus, much of the power of differential privacy comes from the fact that it gives guarantees that hold up without making any assumptions about the attacks the adversary might use, her computational power, or any outside information she might acquire. It also has elegant composition properties (helping us understand how privacy losses accumulate over multiple computations).
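
For reference, the standard definition and the basic composition property read as follows (this is the usual textbook statement, not anything specific to the workshop).

```latex
% Epsilon-differential privacy: a randomized mechanism M is
% epsilon-differentially private if, for all pairs of datasets D, D'
% differing in a single individual's record and all sets of outputs S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]
% Basic (sequential) composition: running an epsilon_1-DP computation and an
% epsilon_2-DP computation on the same data is (epsilon_1 + epsilon_2)-DP,
% which is what lets us account for privacy loss across multiple analyses.
\[
  M_1 \ \varepsilon_1\text{-DP},\;\; M_2 \ \varepsilon_2\text{-DP}
  \;\Longrightarrow\; (M_1, M_2) \ (\varepsilon_1 + \varepsilon_2)\text{-DP}.
\]
```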
