A report from CDS 20 & CMS 0

All of the slides (and some videos) are now up from the Control and Dynamical Systems (CDS) 20th anniversary workshop that we held here at Caltech earlier this month.

The workshop was a huge success, and it was thrilling to see so many alums from the PhD program coming back.  It’s really amazing to see how successful the program has been, and how varied the research is that they are doing now.  CDS alums have become professors at all of the top 15 universities in the world (according to the London Times ranking), and they hold positions in a huge variety of departments: CS, EE, Control, Mech E, Math, and Bio.  As a result, the applications covered during the CDS20 workshop were extremely broad: from bio and physiology, to communication networks and the smart grid, to machine learning and privacy.

And, not only was the workshop the 20th anniversary of CDS, it marked the kickoff of our new Computing and Mathematical Sciences (CMS) PhD program.  We used the last session to highlight the evolution of CDS, which has resulted in the emergence of CMS.   This new program is really modeled after the tenets of CDS that have proven so successful: seek rigor and relevance, and ensure that research is student-centric. I’ve written about this new program in a previous post, but I put together a slide deck for CDS20 that I think does a pretty good job of introducing the program.  We can only hope that in 20 years we’ll have had as much impact as CDS…

Residual Lives, Hazard Rates, and Long Tails (Part II)

This is part II in a series on residual life, hazard rates, and long-tailed distributions. If you haven’t read part I yet, read that first! The previous post in this series highlighted that one must be careful in connecting “heavy-tailed” with the concepts of “increasing mean residual life” and “decreasing hazard rate.”
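
As a quick refresher: for a nonnegative random variable $X$ with density $f$, distribution $F$, and tail $\bar{F} = 1 - F$, the two objects in play are

```latex
% Mean residual life and hazard rate of a nonnegative random variable X
m(x) = \mathbb{E}[X - x \mid X > x] = \frac{\int_x^\infty \bar{F}(t)\,dt}{\bar{F}(x)},
\qquad
h(x) = \frac{f(x)}{\bar{F}(x)}.
```

A distribution is IMRL if $m(x)$ is nondecreasing in $x$, and DHR if $h(x)$ is nonincreasing in $x$.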

In particular, there are many examples of light-tailed distributions that are IMRL and DHR. However, if we think again about the informal examples that we discussed in the previous post, it becomes clear that IMRL and DHR are too “precise” to capture the phenomena that we were describing. For example, if we return to the case of waiting for a response to an email, it is not that we expect our remaining waiting time to be monotonically increasing as we wait. In fact, we are very likely to get a response quickly, so the expected waiting time should drop initially (and the hazard rate should increase initially). It is only after we have waited a “long” time already, in this case a few days, that we expect to see a dramatic increase in our residual life. Further, in the extreme, if we have not received a response in a month, we can reasonably expect that we may never receive a response, and so the mean residual life is, in some sense, growing unboundedly, or equivalently, the hazard rate is decreasing to zero. The example of waiting for a subway train highlights the same issues. Initially, we expect the mean residual life to decrease, because if the train is on schedule, things are very predictable. However, once we have waited a long time beyond when the train was supposed to arrive, it likely means something went wrong, and could mean the train has had some sort of mechanical problem and will never arrive.
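
To make this concrete, here’s a minimal simulation sketch. The model and all of its parameters are invented purely for illustration: most emails are answered within a few hours, but a small fraction fall into a heavy tail.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical email-response-time model (units: hours); all parameters
# are invented for illustration. Most replies arrive within a few hours
# (a Gamma component, whose hazard rate is increasing), but 1% of emails
# fall into a heavy Pareto tail and may effectively never be answered.
fast = rng.gamma(shape=4.0, scale=0.5, size=n)     # mean 2 hours
slow = 24.0 * (rng.pareto(1.5, size=n) + 1.0)      # Pareto(alpha=1.5), min 24 hours
x = np.where(rng.random(n) < 0.01, slow, fast)

# Empirical mean residual life: m(t) = E[X - t | X > t]
for t in [0, 1, 2, 5, 24, 100]:
    tail = x[x > t]
    print(f"t = {t:>3} h   m(t) = {tail.mean() - t:7.1f} h")
```

The empirical mean residual life dips over the first hour or so (i.e., the hazard rate initially increases), and then grows without bound once the Pareto tail dominates, matching the email story above.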


Resnick Sustainability Institute Post-Doctoral Fellowship

I’d like to announce a postdoc opportunity at Caltech for those on the energy side of things.  The program is run by the Resnick Institute, which is the overarching center for energy research of all forms on campus.  It includes the things that we (Steven, Mani, me, etc.) do in power systems, as well as lots of other activities across materials, chemistry, physics, aeronautics, etc.  So, it’s a great place for interdisciplinary work.

Here’s the blurb about the postdoc fellowship:

About the Resnick Sustainability Institute Post-Doctoral Fellowship: The Resnick fellows will have support for up to two years to work on creative, cross-catalytic research that complements the existing work of the Caltech faculty, or that creates new research directions within the mission areas of the Resnick Sustainability Institute. Eligible candidates will have completed their PhD within five years of the start of the appointment, and should have secured a commitment from one or more Caltech faculty members to serve as a mentor and provide office/lab space for the length of the fellowship. Candidates can come from any country, provided they are proficient in English. Applications consisting of a research proposal, cover letter, recommendations and CV can be submitted through our website: http://resnick.caltech.edu/fellowships-apply.php. The fellowship will provide an annual salary of $65,000 plus benefits, $6,000/year in research budget, and relocation allowance of $3,000. Any questions can be directed to rpd@caltech.edu.

Note that this is not the only postdoc program available for folks who want to join RSRG.  We also look for postdocs through the CMI program, and that call will come out later in the fall.  Applications for CMI tend to be due in December.

Autoscale, a.k.a. “Dynamic right-sizing”, at Facebook

A bit of news on the data center front, for those who may have missed it:  Facebook recently announced the deployment of a new power-efficient load balancer called “Autoscale.”  Here’s their blog post about it.

Basically, the quick and dirty summary of the design is to adapt the number of active servers so that it’s proportional to the workload, and to adjust the load balancing to keep servers “busy enough,” so that you avoid situations where lots of servers sit very lightly loaded.
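
Facebook’s post doesn’t give pseudocode, so the following is just a toy sketch of the control loop as I read it; the constants and names are hypothetical.

```python
import math

# A toy sketch of the dynamic right-sizing idea (not Facebook's actual
# algorithm; the constants are invented): keep just enough servers active
# to serve the current load at a target utilization, and let the load
# balancer route requests only to that active pool.

TARGET_UTIL = 0.75       # hypothetical "busy enough" operating point
SERVER_CAPACITY = 100.0  # requests/sec one server can handle at full load

def active_servers_needed(load_rps: float, total_servers: int) -> int:
    """Servers to keep active so each runs near TARGET_UTIL."""
    needed = math.ceil(load_rps / (SERVER_CAPACITY * TARGET_UTIL))
    return max(1, min(total_servers, needed))

# At midnight the load drops, so most of the fleet can sit idle
# or be used for batch workloads.
for load in [60_000, 20_000, 3_000]:
    k = active_servers_needed(load, total_servers=1_000)
    print(f"{load:>6} req/s -> {k:>4} active servers")
```

In a real deployment the target utilization would be chosen to leave headroom for traffic spikes, and servers would be activated and deactivated gradually to avoid oscillation.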

So, the ideas are very related to what’s been going on in academia over the last few years.  Some of the ideas are likely inspired by the work of Anshul Gandhi, Mor Harchol-Balter, et al. (who have been chatting with Facebook over the past few years), and the architecture is actually quite similar to the “Net Zero Data Center Architecture” developed by HP (which incorporated some of our work, e.g., these papers, joint with Minghong Lin, who now works with the infrastructure team at Facebook).

While Facebook isn’t the first tech company to release something like this, it’s always nice to see it happen.   And, it will give me more ammo to use when chatting with people about the feasibility of this sort of design.  It is amazing to me that I still get comments from folks about how “data center operators don’t care about energy”…  So, to counter that view, here are some highlights from the post:

“Improving energy efficiency and reducing environmental impact as we scale is a top priority for our data center teams.”

“during low-workload hours, especially around midnight, overall CPU utilization is not as efficient as we’d like. […] If the overall workload is low (like at around midnight), the load balancer will use only a subset of servers. Other servers can be left running idle or be used for batch-processing workloads.”

Anyway, congrats to Facebook for taking the plunge.  I hope that I hear about many other companies doing the same in the coming years!

Residual Lives, Hazard Rates, and Long Tails (Part I)

This is the third series of posts I’m writing on topics related to what we are covering in our book on heavy tails (which I discussed in an earlier post). The first two were on the catastrophe principle (subexponential distributions) and power laws (regularly varying distributions). This time I’ll focus on connections between residual life, hazard rate, and long-tailed distributions.

Residual life in our daily lives

Over the course of our days we spend a lot of our time waiting for things — we wait for a table at restaurants, we wait for a subway train to show up, we wait for people to respond to our emails, etc. In such scenarios, we hold on to the belief that, as we wait, the likely amount of remaining time we will need to wait is getting smaller. For example, we believe that, if we have waited ten minutes for a table at a restaurant, the expected time we have left to wait should be smaller than it was when we arrived and that, if we have waited five minutes for the subway, then our expected remaining wait time should be less than it was when we arrived.

In many cases this belief holds true. For example, as other diners finish eating, our expected waiting time for a table at a restaurant drops. Similarly, subway trains follow a schedule with (nearly) deterministic gaps between trains and thus, as long as the train is on schedule, our expected remaining waiting time decreases as we wait. However, a startling aspect of heavy-tailed distributions is that this is not always true. For example, if you have waited a very long time past the scheduled arrival time for a subway train, then it is very likely that there was some failure and the train may take an extremely long time to arrive, and so your expected remaining waiting time has actually increased while you waited. Similarly, if you are waiting for a response to an email and have not heard for a few days, it is likely to be a very long time until a response comes (if it ever does).
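
The contrast is easy to see in closed form. For an exponential distribution (memoryless, a decent first model for “unpredictable” arrivals) the mean residual life $m(x) = \mathbb{E}[X - x \mid X > x]$ is constant, while for a Pareto distribution it grows linearly in the time already waited:

```latex
\text{Exponential}(\lambda):\quad m(x) = \frac{1}{\lambda} \quad \text{for all } x \ge 0,
\qquad
\text{Pareto, } \bar{F}(x) = (x_m/x)^{\alpha},\ \alpha > 1:\quad m(x) = \frac{x}{\alpha - 1} \quad \text{for } x \ge x_m.
```

So, conditioned on having already waited past $x$, the expected remaining wait under a Pareto is proportional to $x$ itself, which is exactly the broken-subway intuition.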


Data centers & Energy: Did we get it backwards?

The typical story surrounding data centers and energy is an extremely negative one: Data centers are energy hogs.  This message is pervasive in the media, and it certainly rings true.  However, we have come a long way in the last decade, and though we certainly still need to “get our house in order” by improving things further, the most advanced data centers are quite energy-efficient at this point.  (Note that we’ve done a lot of work in this area at Caltech and, thanks to HP, we are certainly glad to see it moving into industry deployments.)

But, the view of data centers as energy hogs is too simplistic.  Yes, they use a lot of energy, but energy usage is not a bad thing in and of itself.  In the case of data centers, energy usage typically leads to energy savings.  In particular, moving things to the cloud is most often a big win in terms of energy usage…

More importantly, though, the goal of this post is to highlight that, in fact, data centers can be a huge benefit in terms of integrating renewable energy into the grid, and thus play a crucial role in improving the sustainability of our energy landscape.

In particular, in my mind, a powerful alternative view is that data centers are batteries.  That is, a key consequence of energy efficiency improvements in data centers is that their electricity demands are very flexible.  They can shed 10%, 20%, even 30% of their electricity usage in as little as 10 minutes by doing things such as precooling, adjusting the temperature, demand shifting, quality degradation, geographical load balancing, etc.  These techniques have all been tested at this point in industry data centers, and can be done with almost no performance impact for interactive workloads!
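
To give a flavor of how geographical load balancing turns this flexibility into battery-like behavior, here’s a deliberately toy sketch; all of the sites, capacities, and numbers are invented.

```python
# A toy sketch of geographical load balancing (all sites, capacities,
# and numbers here are invented): route deferrable load toward data
# centers that currently have spare renewable generation, so the fleet
# soaks up renewables much like a battery would.

datacenters = {
    # name: (IT capacity in MW, current renewable generation in MW)
    "west":    (10.0, 8.0),
    "midwest": (10.0, 2.0),
    "east":    (10.0, 5.0),
}

def balance(total_load_mw: float) -> dict:
    """Greedy split: fill renewable headroom first, then spill over."""
    alloc = {name: 0.0 for name in datacenters}
    remaining = total_load_mw
    # Pass 1: send load to sites with renewable supply, greenest first.
    for name, (cap, green) in sorted(datacenters.items(),
                                     key=lambda kv: -kv[1][1]):
        take = min(remaining, cap, green)
        alloc[name] += take
        remaining -= take
    # Pass 2: spill any remaining load into leftover capacity.
    for name, (cap, _) in datacenters.items():
        take = min(remaining, cap - alloc[name])
        alloc[name] += take
        remaining -= take
    return alloc

print(balance(18.0))  # the "west" site absorbs its full 8 MW of renewables
```

A real implementation would pose this as an optimization with network delay and service-level constraints, but the greedy version captures the idea: the workload follows the renewables, rather than the other way around.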
