Throughout our work on data center workload management over the past few years, one common theme that emerged was that of online convex optimization. Whether we were looking at load shifting, dynamic right-sizing, geographical load balancing, or data center demand response, we consistently ended up with an online convex optimization problem that we needed to solve. So, I wanted to do a short post here highlighting something particularly surprising (at least to us) that we uncovered last year.

**Online convex optimization**

*Online convex optimization (OCO)* is a rich theoretical area with a wide range of important applications. In OCO problems, a learner/agent interacts with an uncertain environment in a sequence of rounds. During each round : (i) the learner must choose an action from a convex decision space ; then, (ii) the environment reveals a convex cost function ; and finally, (iii) the learner experiences cost . The goal is to provide an algorithm that minimizes the cost experienced over a (long) horizon of rounds.

Historically, in computer science, OCO is perhaps most associated with the so-called -experts problem, a discrete-action version of online optimization wherein at each round the learning algorithm must choose one of possible actions, which can be viewed as following the advice of one of “experts”. However, there are also a number of other areas where OCO has a long history, e.g., portfolio management. Today, OCO is studied and applied within machine learning, online algorithms, computer systems, communication networks, control theory, economics, and operations research, to name a few.

**Different communities, differing perspectives**

Given the wide array of communities interested in OCO, it is not at all surprising that there is also a wide array of variations of OCO-like formulations across the communities. Even within computer science, this is true. The two CS communities with perhaps the largest literatures on OCO-type problems are the machine learning and online algorithms communities, and the formulations studied in each community are quite different.

The machine learning community has primarily focused on OCO formulations similar to that described above. The goal within that community is most typically to develop algorithms with good performance with respect to **regret**, i.e., the difference between the cost of the algorithm and the cost of the offline optimal static solution. In this context, a number of algorithms have been shown to provide sub-linear regret (a.k.a. no regret) in the worst-case. For example, online gradient descent can achieve -regret. Regret-minimizing algorithms have been known since the 1950’s (though not specifically for the OCO setting), and employ a wide variety of algorithmic techniques.

In contrast, the online algorithms community has focused on a variation of OCO termed *metrical task systems (MTSs)*. A MTS differs in that the learner is allowed to see the next cost function before selecting an action (effectively swapping steps (i) and (ii) above), and algorithms are penalized for a *switching cost* when changing actions from one round to the next, i.e., they pay an additional cost each round for some metric . Further, the typical performance metric is different in this context. Here, the goal is to design algorithms that minimize the **competitive ratio**, which is the worst-case ratio between the cost of the algorithm and the cost of the optimal offline (dynamic) solution. In this setting, most results tend to be “negative,” e.g., for any metric space, the competitive ratio of an MTS with states chosen from that space grows without bound as the number of states grows. However, recently, results have shown that positive results can be found in the case when the cost functions and decision space are convex. For example, in the case when the action space is one-dimensional, we have developed an algorithm that we call Lazy Capacity Provisioning (LCP), which is 3-competitive.

Note that the key difference between the metrics in the communities is whether they compare to the offline optimal dynamic solution, as in the competitive ratio, or the offline optimal static action, as in regret — that is, whether the metric measures how well a dynamic concept is learned or how well a static concept is learned. Clearly, it is desirable for an algorithm to be able to learn both dynamic and static concepts. (The addition of a switching cost in the MTS community turns out to not be a significant change, since the algorithms that perform well in OCO problems without switching costs also perform well in OCO problems with switching costs.)

**The incompatibility of competitive ratio and regret**

Interestingly, to this point, despite the wide variety of algorithms developed and analyzed in the two literatures, there are no algorithms that can guarantee good performance with respect to both the dynamic optimal and the static optimal solutions.

The closest work to this, until recently, is the work of Blum & Burch in 2000, which provides some connections between the machine learning and online algorithms literatures in the special case where there are switching costs that are a fixed constant. In this context, they show how to translate bounds on regret into bounds on the competitive ratio, and vice versa. However, beyond this result, there has been almost no success at connecting the various OCO literatures. (It is worth noting that Blum, Chawla, and Kalai have also showed how to achieve simultaneous guarantees with respect to the static and dynamic optimal solutions in other settings, e.g., decision making on lists and trees.)

I think our result from COLT last year sheds some light as to why this gap exists. In particular, there seems to be a fundamental incompatibility between learning a dynamic concept and learning a static concept, i.e., doing well with respect to both regret and the competitive ratio. Specifically, we have proven that **no algorithm can maintain both sublinear regret in OCO problems and a constant competitive ratio in MTS problems.** Importantly, this “incompatibility result” is not for a pathological example; it holds even for the simple case when is linear and is scalar, and it holds for both deterministic and randomized algorithms.

**Where to go from here?**

I’m taking the time to highlight this results here because I think it motivates a lot of interesting open questions for algorithm design in OCOs. For example, what if one considers variations on the definition of regret that focus on comparisons with optimal policies that allow limited dynamic behavior? Does the incompatibility result extend? What if the algorithms are allowed to use predictions — how much prediction is necessary for an algorithm to do well on both measures? What if a stochastic model is considered instead of an adversarial model? And so on…

We’ve been working hard on the task of understanding the value of predictions, since predictions are of crucial importance in all of the applications that we’re working on. So, I hope to have an answer there soon… But, on many of the other generalizations, I still have very little to say!