Introducing DOLCIT

At long last, we have gotten together and created a “Caltech-style” machine learning / big data / optimization group, and it’s called DOLCIT: Decision, Optimization, and Learning at the California Institute of Technology.  The goal of the group is to take a broad and integrated view of research in data-driven intelligent systems. On the one hand, statistical machine learning is required to extract knowledge in the form of data-driven models. On the other hand, statistical decision theory is required to intelligently plan and make decisions given imperfect knowledge. Supporting both thrusts is optimization.  DOLCIT envisions a world where intelligent systems seamlessly integrate learning and planning, as well as automatically balance computational and statistical tradeoffs in the underlying optimization problems.

In the Caltech style, research in DOLCIT spans traditional areas from applied math (e.g., statistics and optimization) to computer science (e.g., machine learning and distributed systems) to electrical engineering (e.g., signal processing and information theory). Further, we will look broadly at applications, ranging from information and communication systems to the physical sciences (neuroscience and biology) to social systems (economic markets and personalized medicine).

In some sense, the only thing that’s new is the name, since we’ve been doing all these things for years already.  However, with the new name will come new activities like seminars, workshops, etc.  It’ll be exciting to see how it morphs in the future!

(And, don’t worry, RSRG is still going strong — RSRG and DOLCIT should be complementary: similar research styles, but differing focuses with respect to tools and applications.)


Simons Workshop on Big Data and Differential Privacy

I recently returned from a workshop on Big Data and Differential Privacy, hosted by the Simons Institute for the Theory of Computing, at Berkeley.

Differential privacy is a rigorous notion of database privacy intended to give meaningful guarantees to individuals whose personal data are used in computations, where “computations” is quite broadly understood—statistical analyses, model fitting, policy decisions, release of “anonymized” datasets,…
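For readers who want the precise statement, the standard definition (due to Dwork, McSherry, Nissim, and Smith) is the following: a randomized algorithm $M$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ differing in a single individual's record, and for every set $S$ of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].$$

Intuitively, no single person's data can change the distribution of outcomes by more than a factor of $e^{\varepsilon}$, so the output reveals little about any one individual.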

Privacy is easy to get wrong, even when data-use decisions are being made by well-intentioned, smart people. There are just so many subtleties, and it is impossible to fully anticipate the range of attacks and outside information an adversary might use to compromise the information you choose to publish. Thus, much of the power of differential privacy comes from the fact that it gives guarantees that hold up without making any assumptions about the attacks the adversary might use, her computational power, or any outside information she might acquire. It also has elegant composition properties (helping us understand how privacy losses accumulate over multiple computations).
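As a concrete illustration of how such a guarantee is typically achieved, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. The function names and parameters here are my own for illustration, not from the workshop; a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so Laplace noise of scale 1/ε suffices for ε-differential privacy.

```python
import math
import random


def laplace_noise(scale):
    """Sample a Laplace(0, scale) variate via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    The counting query sum(predicate(r) for r in records) has
    sensitivity 1, so adding Laplace noise with scale 1/epsilon
    gives an epsilon-differentially private answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)


# Example: a noisy count of records with value >= 50, at epsilon = 0.5.
noisy = private_count(list(range(100)), lambda r: r >= 50, epsilon=0.5)
```

Note the statistical/utility tradeoff this makes explicit: smaller ε means stronger privacy but noisier answers, and the composition properties mentioned above tell us that answering k such queries at privacy level ε each costs kε in total.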
