Let me begin by saying where I think the interesting privacy research question does not lie. The interesting question is not how people and organizations currently behave with respect to private information. Current behaviors are a reflection of culture, legislation, and policy, and all of these have proven quite malleable in our current environment. So the interesting question when it comes to private information is this: how could and should people and organizations behave, and what options could or should they even have? This is a fundamental and partly normative question, and one that we cannot address without a substantial research effort. Despite being partly normative, this question can usefully suggest directions for even quite mathematical and applied research.
The first thing I’d like to ask is this: what do we need to understand better in order to decide how to address this question? I see three relevant types of research that are largely missing:
1. We need a better understanding of the benefits that individuals, organizations, and society can derive, and the harms they can incur, from the use of potentially sensitive data.
2. We need a better understanding of what the options for behavior could look like—which means we need to be open to a complete reinvention of the means by which we store, share, buy, sell, track, compute on, and draw conclusions from potentially sensitive data. Thus, we need a research agenda that helps us understand the realm of possibilities, and the consequences such possibilities would have.
3. It is, of course, important to remember the cultural, legislative, and policy context. It’s not enough to understand what people want and what is feasible. If we care about actual implementation, we must consider this broader context.
The first two of these points can and must be addressed with mathematical rigor, incorporating the perspectives of a wide variety of disciplines. Mathematical rigor is essential for a number of reasons, but the clearest one is that privacy is not an area where we can afford to deploy heuristic solutions and then cross our fingers. While inaccurate computations can later be redone for higher accuracy, and slow systems can later be optimized for better performance, privacy, once lost, cannot be “taken back.”
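One concrete example of such a rigorous framework is differential privacy, which quantifies the privacy loss of a computation by a parameter epsilon and guarantees a bound on it up front, before any data is released. The sketch below is a minimal illustration, not a production mechanism; the clamping bounds, the epsilon values, and the use of a mean query are all assumptions chosen for simplicity.

```python
import math
import random

def private_mean(values, lower, upper, epsilon):
    """Release the mean of `values` with epsilon-differential privacy
    via the Laplace mechanism. Each value is clamped to [lower, upper]
    so that one person's data changes the mean by at most
    (upper - lower) / n, the query's sensitivity."""
    n = len(values)
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / n
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) noise by inverse-CDF from a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_mean + noise

# Illustrative data: smaller epsilon means stronger privacy and more noise.
ages = [34, 29, 41, 55, 38]
noisy = private_mean(ages, lower=0, upper=100, epsilon=1.0)
```

The key property is that the guarantee is proved in advance and does not depend on crossing one's fingers: whatever an adversary learns from the output, the epsilon bound on privacy loss holds.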
The second point offers the widest and richest array of research challenges. The primary work to address them will involve the development of new theoretical foundations for the technologies that would support these various interactions on potentially sensitive data.
For concreteness, let me give a few example research questions that fall under the umbrella of this second point:
1. What must be revealed about an individual’s medical data in order for her to benefit from and contribute to advances in medicine? How can we optimize the tradeoff of these benefits against potential privacy losses and help individuals make the relevant decisions?
2. When an offer of insurance is based on an individual’s history, how can this be made transparent to the individual? Would such transparency introduce incentives to “game” the system by withholding information, changing behaviors, or fabricating one’s history? What would be the impact of such incentives for misbehavior, and how should we deal with them?
3. How could we track the flow of “value” and “harm” through systems that transport large amounts of personal data (for example, the system of companies that buy and sell information on individuals’ online behavior)? How does this suggest that such systems might be redesigned?
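To make the third question concrete, one could imagine modeling such a system as a directed graph of data transfers and attaching estimated "value" and "harm" weights to each edge. The toy sketch below is purely illustrative: the party names, the numeric weights, and the simple additive accounting are all assumptions, not a proposal for how such estimates would actually be obtained.

```python
from collections import defaultdict

# Each transfer of personal data between parties carries an estimated
# "value" (to the recipient) and "harm" (to the data subject).
# All names and numbers here are hypothetical.
transfers = [
    ("user", "ad_network", {"value": 5.0, "harm": 1.0}),
    ("ad_network", "data_broker", {"value": 3.0, "harm": 2.0}),
    ("data_broker", "insurer", {"value": 8.0, "harm": 4.0}),
]

def tally_flows(transfers):
    """Sum the value received by each party and the harm generated
    by each party's onward transfers of the data."""
    value_in = defaultdict(float)
    harm_caused = defaultdict(float)
    for sender, recipient, weights in transfers:
        value_in[recipient] += weights["value"]
        harm_caused[sender] += weights["harm"]
    return dict(value_in), dict(harm_caused)
```

Even this crude accounting suggests redesign questions: for example, whether parties whose transfers generate disproportionate harm relative to the value they pass along should face different obligations.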