# CS+X at Caltech: Disrupting science and engineering with computational thinking

It seems like I’ve been waiting forever to make this post!  Back in the fall, I helped to organize an Alumni College at Caltech centered around the theme of “CS+X”.  Now, I’m very excited to announce that the videos from the event are up!

What is an alumni college, you ask?  Well, instead of a homecoming game or something like that, we get alumni back to Caltech by promising a day of research talks (really, talks more in the style of TED talks)!  So, Alumni College focuses on a different theme each year, and then does a day of provocative talks on that topic.  This year the theme was “Disrupting Science and Engineering with Computational Thinking”, i.e., the disruptive power of CS+X.

As I’ve written about before, we view “CS+X” as what makes Caltech’s approach to computer science so distinctive compared to other schools.  We pride ourselves on inventing fields and then leaving them for others once they’re popular so that we can invent the next field.  So, “seeding fields and then ceding them”…

In any case, the alumni college was a day filled with talks on CS+X from researchers at Caltech who are leading new fields…  We covered CS+Astronomy, CS+Physics, CS+Biology, CS+Economics, CS+Chemistry, CS+Energy, and so on…

# Beyond worst-case analysis

Well, last week was the final workshop in the semester-long “Algorithms & Uncertainty” program that I helped to organize at the Simons Institute.  There are a few weeks of reading groups / seminars / etc. left, but the workshop marks the last “big” event.  It’s sad to see the program wrapping up, though I’m definitely happy that I will be spending a lot less time on planes pretty soon — I’ve been flying back and forth every week!

The last workshop was on “Beyond worst-case analysis” and the organizers (Avrim Blum, Nir Ailon, Nina Balcan, Ravi Kumar, Kevin Leyton-Brown, Tim Roughgarden) did a great job of putting together a really unique program.  Since there’s not “one way” to go beyond worst-case, the week included a huge array of interesting ways to get results beyond what is possible from worst-case analysis, whether it be learning commonalities of different types of instances, semi-random / smoothed analysis models, or extracting features that enable learning what algorithmic approach to use.

If you’re interested in getting a broad overview of the approaches, I highly recommend watching Avrim’s introductory talk, which gave a survey of a ton of different approaches for going “beyond worst-case” that have received attention recently.  Tim also has a great course on the topic that is worth checking out…

My own contribution was the final talk of the week, which gave an overview of our work on how to use properties of prediction noise to help design better online optimization algorithms.  The idea is that predictions are crucial to almost all online problems, but we don’t really understand how properties of prediction error should influence algorithm design.  Over the last few years we’ve developed some models and results that allow initial insights into, e.g., how correlation structure in prediction errors changes the “optimal” algorithm form.  Check out the video for more details!

# Algorithms in the Field

One of the great new NSF programs in recent years is the introduction of the “Algorithms in the Field” program, which is a joint initiative from the CCF, CNS, and IIS divisions in CISE.  Its goal is almost a direct match with what I try to do with my research:  it “encourages closer collaboration between (i) theoretical computer science researchers [..] and (ii) other computing and information researchers [..] very broadly construed”.  The projects it funds are meant to push the boundaries of theoretical tools and apply them in an application domain.

Of course this is perfectly suited to what we do in RSRG at Caltech!  We missed the first year of the call due to bad timing, but we submitted this year and I’m happy to say it was funded (over the summer when I wasn’t blogging)!

The project is joint with Steven Low, Venkat Chandrasekaran, and Yisong Yue and has the (somewhat generic) title “Algorithmic Challenges in Smart Grids: Control, Optimization, and Learning.”

For those who are curious, here’s the quick-and-dirty summary of the goal, taken directly from the proposal.

# Data Markets in the Cloud

Over the last year, while I haven’t been blogging, one of the new directions that we’ve started to look at in RSRG is “data markets”.

“Data Markets” is one of those phrases that means lots of different things to lots of different people.  At its simplest, the idea is that data is a commodity these days — data is bought and sold constantly. The challenge is that we don’t actually understand too much about data as an economic good.  In fact, it’s a very strange economic good and traditional economic theory doesn’t apply…

# Finding Any Nontrivial Coarse Correlated Equilibrium Is Hard

One of the things one might hope to get from an equilibrium concept in a game is that it might plausibly describe or predict behavior in the real world. That’s one reason negative results suggesting that certain equilibria can’t be computed efficiently are so interesting—they call those equilibrium concepts into question. Of course, there’s much more to it than that, but the existence of efficient algorithms (or, even better, simple, distributed dynamics) for computing equilibria is a big selling point for a solution concept.

In that sense, correlated equilibria (CE) and coarse correlated equilibria (CCE) are much more promising than Nash equilibria (NE). Even in games with many players, there exist a number of natural dynamics that quickly converge to these solution concepts. In particular, these dynamics induce efficient computation of approximate CE and CCE in multiplayer games; by contrast, computation of approximate Nash equilibria is computationally hard in multiplayer games.

If you aren’t so familiar with CEs and CCEs, you can think of them as very similar to Nash equilibria, in that they also consist of a distribution over players’ action profiles at which no player can benefit by unilateral deviation, and hence represent stable choices of distributions over player actions. The big difference is that, whereas a Nash equilibrium is defined to be a product of independent distributions over actions (one for each player), correlated and coarse correlated equilibria are general (joint) probability distributions—CEs and CCEs allow players to correlate their choices of actions. For the purposes of this blog post, let’s not worry too much about the distinction between CEs and CCEs. Notice that CE and CCE are more general than NE—every NE is a CE (CCE), but the converse is not true. So it makes sense that one might be able to compute a CE efficiently, even in settings where computing a NE is hard.
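To make the definitions concrete, here is a minimal sketch that checks the CCE conditions for a toy two-player “intersection” game. The game, its payoffs, and the “traffic light” distribution are my own illustrative choices, not from the post:

```python
# A toy two-player "intersection" game: each player either Goes or Stops.
# Illustrative payoffs: crashing is very bad; going while the other stops is best.
ACTIONS = ("G", "S")  # Go, Stop

u1 = {("G", "G"): -10, ("G", "S"): 1, ("S", "G"): 0, ("S", "S"): 0}
u2 = {(a, b): u1[(b, a)] for a in ACTIONS for b in ACTIONS}  # symmetric game

def is_cce(mu, u1, u2, eps=1e-9):
    """Check the CCE conditions: no player can gain in expectation by
    unilaterally committing to a fixed action, against the joint play mu."""
    base1 = sum(p * u1[a] for a, p in mu.items())
    base2 = sum(p * u2[a] for a, p in mu.items())
    for d in ACTIONS:  # candidate fixed deviations
        dev1 = sum(p * u1[(d, a[1])] for a, p in mu.items())
        dev2 = sum(p * u2[(a[0], d)] for a, p in mu.items())
        if dev1 > base1 + eps or dev2 > base2 + eps:
            return False
    return True

# A "traffic light" that tells exactly one player to go is a CCE, even though
# it is not a product distribution (so it is not a Nash equilibrium):
light = {("G", "S"): 0.5, ("S", "G"): 0.5}
print(is_cce(light, u1, u2))  # expected: True

# Both players always going is not a CCE: each would rather commit to stopping.
crash = {("G", "G"): 1.0}
print(is_cce(crash, u1, u2))  # expected: False
```

The `light` distribution perfectly correlates the players’ actions, which is exactly the extra freedom CE/CCE allow over NE.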

Beyond computation of equilibria, another significant thread of research in algorithmic game theory has been the study of the quality of equilibria, often as measured by the social welfare of the equilibrium or its ratio to the social welfare of the socially optimal outcome (c.f. the extensive literature on the price of anarchy (PoA)). Given that we know it is possible to efficiently compute CE and CCE, it is natural to ask: how good are the equilibria we can efficiently compute? For example, do existing efficient dynamics find the best such equilibria, or at least ones that approximately optimize the social welfare? Since the gap between the worst and the best equilibria (CE or CCE), in terms of social welfare, can be large in natural games, it is interesting to understand if there exist efficient dynamics or algorithms that avoid—at least to some extent—the bad outcomes.

In their notable work, Papadimitriou and Roughgarden [PR] show that determining a socially optimal CE is NP-hard, in a number of succinct multiplayer games*. (What’s socially optimal? The equilibrium that maximizes the total (or average) welfare of the players.) This result intuitively follows from the fact that determining an action profile with maximum welfare—i.e., solving the problem of welfare optimization even without equilibrium constraints—is NP-hard in general. The hardness result of [PR]  leaves open the question of computing near-optimal CE/CCE, i.e., whether there exist efficient algorithms that compute CE/CCE with welfare at least, say, alpha times the optimal, for a nontrivial approximation ratio alpha < 1.

In recent work with Sid Barman, a postdoc at Caltech, we consider exactly this question. We establish that, unless P=NP, there does not exist any efficient algorithm that computes a CCE with welfare better than the worst possible CCE, in succinct multiplayer games*. We also establish similar hardness results for computing equilibria under the egalitarian objective or Pareto-optimality.

Analogous hardness results hold for CE. A classical interpretation of a CE is in terms of a mediator who has access to the players’ payoff functions and who draws outcomes from a correlated equilibrium’s joint distribution over player actions and privately recommends the corresponding actions to each player. The equilibrium conditions ensure that no player can benefit in expectation by unilaterally deviating from the recommended actions. Therefore, the problem we study here is exactly the computational complexity of the problem that a mediator faces if she wishes to maximize social welfare.

We also extend the hardness result to approximate CE and CCE. Therefore, while one can efficiently compute an approximate CE/CCE in succinct multiplayer games, one cannot provide any nontrivial welfare guarantees for the resulting equilibrium (unless P=NP).

In addition, we show that this hardness result also holds specifically for potential games (generally considered to be a very tractable class of games), and persists even in settings where the gap between the best and worst equilibrium is large.

We also have some positive results, but I think the negative ones are the most interesting—this says that those appealing dynamics that compute CE/CCE can’t be guaranteed to be finding anything other than the worst equilibrium. To me, that’s some surprising bad news. One silver lining is that this provides new motivation for studying the price of anarchy (the quality of the worst equilibrium) for CE/CCE, since generally that’s the best thing we can hope to compute.

*Technical Aside (succinct games): In general multiplayer games the size of the normal form representation, $N$, is exponentially large in the number of players; one can compute a CE/CCE that optimizes a linear objective by solving a linear program of size polynomial in $N$, and hence the computational complexity of equilibrium computation is not interesting for general games. However, most games of interest—such as graphical games, polymatrix games, congestion games, local effect games, network design games, anonymous games, and scheduling games—admit a succinct representation (wherein the above-mentioned linear program can be exponentially large in the size of the representation), and hence it is such succinctly representable games that we (and previous works) study.
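To illustrate what “optimizing a linear objective over the set of CCE” means in the small-game regime where it is easy, the sketch below grid-searches joint distributions over the outcomes of a toy two-player game and keeps the welfare-maximizing one satisfying the CCE constraints. The game and the grid resolution are my own illustrative choices; for real instances one would solve the linear program directly rather than grid-search:

```python
from itertools import product

ACTIONS = ("G", "S")
OUTCOMES = [(a, b) for a in ACTIONS for b in ACTIONS]
# Illustrative payoffs for a toy "intersection" game (crashing is very bad).
u1 = {("G", "G"): -10, ("G", "S"): 1, ("S", "G"): 0, ("S", "S"): 0}
u2 = {(a, b): u1[(b, a)] for a, b in OUTCOMES}  # symmetric game

def is_cce(mu, eps=1e-9):
    """No player gains by committing to a fixed action against mu."""
    base1 = sum(mu[a] * u1[a] for a in OUTCOMES)
    base2 = sum(mu[a] * u2[a] for a in OUTCOMES)
    return all(
        sum(mu[a] * u1[(d, a[1])] for a in OUTCOMES) <= base1 + eps
        and sum(mu[a] * u2[(a[0], d)] for a in OUTCOMES) <= base2 + eps
        for d in ACTIONS
    )

def welfare(mu):
    return sum(mu[a] * (u1[a] + u2[a]) for a in OUTCOMES)

# Grid-search distributions over the four outcomes (step 0.25) and keep the
# welfare-maximizing one that satisfies the CCE constraints.
step = 0.25
grid = [i * step for i in range(int(1 / step) + 1)]
best = None
for p in product(grid, repeat=len(OUTCOMES)):
    if abs(sum(p) - 1.0) > 1e-9:
        continue
    mu = dict(zip(OUTCOMES, p))
    if is_cce(mu) and (best is None or welfare(mu) > welfare(best)):
        best = mu
print(welfare(best))  # best achievable welfare on this grid
```

For succinct multiplayer games the analogous LP is exponentially large in the representation, which is exactly why the hardness results above have bite.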

[PR] C. H. Papadimitriou and T. Roughgarden. 2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM) 55(3), Article 14.

# Solution to puzzle: produce or learn?

This post is a solution to the puzzle in the last post.

The optimal strategy has a very simple form: there is a time ${t^* \in \{1, \dots, T\}}$ such that (${^*}$ denotes optimal quantities)

• only learn (${l^*(t)=1}$) before time ${t^*}$;
• only produce (${p^*(t)=1}$) from time ${t^*}$ on.
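This threshold structure is easy to explore numerically. Here is a quick sketch (the horizon T and weight m are arbitrary illustrative values) that evaluates each switch time under the model from the puzzle post and picks the best one:

```python
def threshold_value(t_star, T, m):
    """Total value of the threshold strategy: learn (l=1) for t < t_star,
    produce (p=1) for t >= t_star, under the model L(t) = sum_{s<t} l(s),
    P(t) = sum_{s<t} p(s), value = sum_t p(t) * (P(t) + m * L(t))."""
    L = P = total = 0.0
    for t in range(1, T + 1):
        l = 1.0 if t < t_star else 0.0
        p = 1.0 - l
        total += p * (P + m * L)  # capabilities built strictly before t
        L += l
        P += p
    return total

T, m = 10, 2.0  # illustrative parameter choices
values = {t: threshold_value(t, T, m) for t in range(1, T + 1)}
t_best = max(values, key=values.get)
print(t_best, values[t_best])  # prints 4 63.0
```

Note the shape of the tradeoff: switching too early wastes the multiplier m on the fundamentals, while switching too late leaves too little time to cash in the accumulated capability.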

# Another puzzle: produce or learn?

When our kids were small, they played on sports teams (basketball, baseball, soccer, …).  Their teams would focus on drills early in the season, and tournaments late in the season.  In violin, one studies techniques (scales, etudes, theory, etc.) as well as musicality (interpretation, performance, etc.).   In (engineering) research, we spend a lot of time learning the fundamentals (coursework, mathematical tools, analysis/systems/experimental skills, etc.) as well as solving problems in specific applications (research). What is the optimal allocation of one’s effort between these two kinds of activities?

This is a complex and domain-dependent problem.  I suppose there is a lot of serious empirical and modeling research done in social sciences (I’d appreciate pointers if you know any).  But let’s formulate a ridiculously simple model to make a fun puzzle.

1. Consider a finite horizon t = 1, 2, …, T.   The time period t can be a day or a year.  The horizon T can be a project duration or a career.
2. Suppose there are only two kinds of activities, and let’s call them production and learning.  Our task is to decide for each t, the amount of effort we devote to produce and to learn.  Call these amounts p(t) and l(t) respectively.
3. These activities build two kinds of capabilities.  The fundamental capability L(t) at time t depends on the amount of learning we have done up to time t-1, L(t) := L(l(s), s=1, …, t-1).  The production capability P(t) at time t depends on the amount of effort we have devoted to production up to time t-1, P(t) := P(p(s), s=1, …, t-1).   We assume the functions L(l(s), s=1, …, t-1) and P(p(s), s=1, …, t-1) are increasing and time invariant (i.e., they depend only on the amount of effort already devoted, but not on time t).
4. The value/output we create in each period t is proportional to the time p(t) we spend on production multiplied by our overall capability at time t.   Our overall capability is a weighted sum P(t) + mL(t) of fundamental and production capabilities, with m>1.

Goal: choose nonnegative (p(t), l(t), t=1, …, T) so as to maximize the total value ${\sum_{t=1}^T\ p(t) (P(t) + m L(t))}$ subject to ${p(t) + l(t) \leq 1}$ for all t=1, …, T.

The assumption m>1 means that the fundamentals (quality) are more important than mere quantity of production.  The constraint ${p(t) + l(t) \leq 1}$ says that in each period t, we only have a finite amount of energy (assume a total of 1 unit) that can be devoted to produce and learn.  On the one hand, we want to choose a large p(t) because it not only produces value, but also increases future production capabilities P(s), s=t+1, …, T.  On the other hand, since m>1, choosing a large l(t) increases our overall capability more rapidly, enhancing value.  What is the optimal tradeoff?

We pause to comment on our assumptions, some of which can be addressed without complicating our model too much.

Caveats.  At the outset, our model assumes every activity can be cleanly classified as building either the fundamental capability or the production capability.  In reality, many activities contribute to both.  Moreover, the interaction between these two activities is completely ignored, except that they sum to no more than 1 unit.  For example, production (games, performance, research and publication, etc.) often provides important incentives and contexts for learning and strongly influences the effectiveness of learning, but our function L is independent of p(s).  The time invariance assumption in 3 above implies that we retain our capabilities forever after they are built; in reality, we may lose some of them if we don’t continue to practice.  If we think of P(t)+mL(t) as a measure of quality, then our objective function assumes that there is always positive value in production, regardless of its quality.  In reality, production of poor quality may incur negative value, or even be fatal.

A puzzle

A simple puzzle is the special case where the capabilities are simply the total amounts of effort devoted, i.e.,

${L(t)\ := \ \sum_{s=1}^{t-1} l(s), \ \ \ P(t) \ :=\ \sum_{s=1}^{t-1} p(s) }$

Despite its nonconvexity, the problem can be explicitly solved and the optimal strategy turns out to have a very simple structure.  I will explain the solution in the next post and discuss whether it agrees, to first order, with our intuition and how some of the disagreements can be traced back to our simplifying assumptions.
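If you want to experiment before reading the solution, here is a brute-force sketch that searches over discretized effort profiles for a small horizon. The grid, horizon, and weight m are my own arbitrary illustrative choices, and the sketch assumes the full unit of effort is always used (l(t) = 1 − p(t)), since idle effort can never help here:

```python
from itertools import product

def total_value(p_seq, m):
    """Value of a production profile p_seq, with l(t) = 1 - p(t), under
    L(t) = sum_{s<t} l(s), P(t) = sum_{s<t} p(s)."""
    L = P = val = 0.0
    for p in p_seq:
        val += p * (P + m * L)  # capabilities built strictly before t
        P += p
        L += 1.0 - p
    return val

T, m = 6, 2.0  # small illustrative horizon so exhaustive search is feasible
# Exhaustively search production levels in {0, 1/2, 1} for each period.
best = max(product((0.0, 0.5, 1.0), repeat=T), key=lambda s: total_value(s, m))
print(best, total_value(best, m))
```

Running this for a few values of m is a good way to build intuition about how the switch between learning and producing moves with the weight on fundamentals.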