CMS Faculty Search is Live — Apply today!

I’m very happy to announce that our CMS department faculty search is live.  As in previous years, we’re searching broadly — truly broadly.  We’re looking across both applied math and computer science and expect to be able to make multiple offers.  We’re interested in candidates in a variety of core areas, from distributed systems and machine learning to statistics and optimization (and lots of other areas).  But, more generally, we look for impressive, high-impact work rather than enforcing preconceived notions of what is hot at the moment.  Beyond the core areas of applied math and computer science, we are hoping to see strong applications in areas on the periphery of computing and applied math too — candidates at the interface of EE, mechanical engineering, economics, privacy, biology, physics, etc. are definitely encouraged to apply!  As I said in my recent post, inventing new CS+X fields is something that Caltech excels at — it’s our brand.

Also, I want to highlight that we adopted a unique way of running our searches last year, and it worked out so wonderfully that we’ll be doing it again this year — we’re organizing our interviews as hiring symposiums.

In my opinion, the traditional mode of faculty hiring does a disservice to both candidates and departments.  Candidates come and give talks to audiences that are exhausted from attending dozens of job talks, and they often end up scheduled on days when many faculty are out of town (sometimes the most important faculty).  Then, they have to wait for feedback/offers until after all the other candidates go through — all the while learning very little about how they did and what their status is.  On the other hand, departments often end up making decisions by comparing candidates who did not see the same set of faculty (due to people being out of town) and who came through over a span of months.  So the hazy memories and varying faculty in attendance lead to poor decision making.

Our approach, which we adopted from the Biology community, is to organize our interviews in two “symposiums”, where ~5 candidates come in for each symposium.  The first day of each symposium is all research talks, and candidates can choose whether to attend each other’s talks or visit around campus.  The second day is one-on-one interviews.  Because the dates of the symposiums are picked well in advance, every faculty member can be present — even faculty from other related departments.  As a result, we end up with audiences of 100+ people, which is huge for a small place like Caltech.  Further, the symposiums are scheduled close together, so that within a couple of weeks of the interviews the faculty can finish discussions and all candidates can know exactly where they stand.  This also means that the discussions can be much more fair and balanced, since they happen when the talks are all equally fresh and all the faculty were present for all the talks.  An added bonus is that the candidates get quick feedback with which to improve their later interviews!

As I said, the initial test of this model last year was a resounding success from all accounts (both from our faculty and the candidates we interviewed) and so I’m excited to announce that we’ll be doing it again this year!

Making Sigmetrics a “jourference”

The CFP for this year’s Sigmetrics is now being widely circulated, and it includes something very new — it takes Sigmetrics a step towards the hybrid journal/conference (a.k.a. “jourference”) model.  This represents the culmination of more than two years of discussions and work by the Sigmetrics board (of which I’m a part), so I’m pretty excited to see how the experiment plays out!

Why go to the jourference model? 

For those who have somehow managed to avoid all the debates about the pluses and minuses of the conference model in CS, I won’t rehash them here.  You can find in-depth discussions here, here, here, and many other places…

In the case of Sigmetrics, there are additional factors at play beyond the typical ones.  Sigmetrics has always been a sort of “in between” conference.  It’s been a place where theoretical work, measurement work, and systems work play nicely together but maintain a sort of fragile balance.  This means that the PC and the paper submissions are diverse, which often leads to very difficult discussions on the PC — discussions where people from different subcommunities have very different biases, styles, and opinions.  As a result, papers that fall in between communities (e.g., papers that mix systems and theory) often get reviewers who are looking for very different things, and thus have a very hard time keeping everyone happy.  In a review process that doesn’t allow for revisions, this leads to a tougher path to acceptance for such papers.  Yet, these are exactly the papers the community wants to support!

So, one of the key things we’re hoping to accomplish with this shift is a move away from “comparing papers to one another” and looking for “reasons to reject” toward a system where reviewers ask whether a paper is “interesting and high quality” and look for “reasons to accept”.  Adding revisions, and moving away from a model where the goal is to fill a program with 30-35 papers toward one where we accept as many papers as are interesting without trying to put them in an ordered list, should lead to a program that is much more diverse and provocative!

Another issue the Sigmetrics community faces is that many authors care about journal publications, since they overlap with the EE or OR communities.  Converting Sigmetrics papers (which are longer than most journal page limits) into journal papers is difficult and frustrating…so providing a clear, top-tier ACM journal publication for all Sigmetrics papers will hopefully help attract more papers from these interdisciplinary researchers (a group in which I include myself)!

How it will work…

To make this work for Sigmetrics, we’ll be moving toward a system with three deadlines per year.  Papers accepted at any point during the year will be presented at the Sigmetrics conference, and will appear in a “Proceedings of the ACM” journal immediately after acceptance (possibly before the conference).  The review process will combine the best of journal and conference reviewing.  Papers will be either accepted, rejected, or offered a chance at revision.  Only one round of revision will be allowed, in order to ensure quick decisions on papers.  Also, we will have face-to-face PC meetings (as always) in order to ensure reviewers have a chance to discuss papers in detail (and are held accountable for their interpretations of the papers).

For this year, we’ll only have time for two submission deadlines before the conference (Oct 18 and Jan 17), so be sure not to miss them!  Next year will begin the full cycle of three submission deadlines…

I hope to see lots of papers!

 

(Nearly) A year later

It’s been one year since I started as executive officer (Caltech’s name for department chair) for our CMS department…and, not coincidentally, it’s been almost that long since my last blog post!  But now, a year in, I’ve got my administrative legs under me and I think I can get back to posting at least semi-regularly.

As always, the first post back after a long gap is a news-filled one, so here goes!

Caltech had an amazing faculty recruitment year last year!  Caltech’s claim to fame in computer science has always been pioneering disruptive new fields at the interface of computing — quantum computing, DNA computing, sparsity and compressed sensing, algorithmic game theory, … Well, this year we began an institute-wide initiative to redouble our efforts on this front, and it yielded big rewards.  We hired six new mid-career faculty at the interface of computer science!  That is an enormous number for Caltech, where the whole place only has 300 faculty…

The hires include:

We’re extremely excited to have all of them join us at Caltech!

On the more personal side, I’m in the middle of an exciting semester… I’m an organizer of the program at the Simons Institute this term on Algorithms and Uncertainty.  So, I’m spending two days every week up at Berkeley.  It sounds painful, but it’s actually working out great so far.  It’s about 2.5 hours from my house in Pasadena to my office at Berkeley, so I can have breakfast with the kids one day and be home for dinner with them the next!

We’re just wrapping up the first workshop of the program, on “Optimization and Decision-making under Uncertainty”.  All the talks are videoed, so I highly recommend watching them!

Also, I was neglectful in not posting during the boot camp last month, but there were some great talks there too.  I gave two tutorials there: one with Nikhil Bansal on “Online scheduling meets Queueing” (part I, part II) and one with Eilyan Bitar on “Energy and Uncertainty” (part I, part II).  The videos for both are available at the links.

That’s enough for a first post, but I’ll try to follow this up with more regular posts in the coming weeks/months!

The Forgotten Data Centers

Data centers are where the Internet and cloud services live, and so they have been getting lots of public attention in recent years. In technology news and research papers, it’s not uncommon to see IT giants like Google and Facebook publicly discuss and share the designs of the mega-scale data centers they operate. But another important type of data center — the multi-tenant data center, commonly called a “colocation” or “colo” facility — has been largely hidden from the public and is rarely discussed (at least in research papers), even though it’s very common in practice and located almost everywhere, from Silicon Valley to the gambling capital, Las Vegas.

Unlike a Google-type data center, where the operator manages both the IT equipment and the facility, a multi-tenant data center is a shared facility in which multiple tenants house their own servers in shared space and the data center operator is mainly responsible for facility support (power, cooling, and space). Although the boundary is blurring, multi-tenant data centers can generally be classified as either wholesale or retail: wholesale data centers (like Digital Realty) primarily serve large tenants, each with a power demand of 500 kW or more, while retail data centers (like Equinix) mostly target tenants with smaller demands.

Multi-tenant data centers serve almost all industry sectors, including finance, energy, major web services, and content delivery networks. Even some IT giants lease multi-tenant data centers to complement their own data center infrastructure. For example, Google, Microsoft, Amazon, and eBay are all large tenants in a hyper-scale multi-tenant data center in Las Vegas, NV, and Facebook leases a large data center in Singapore to serve its users in Asia.

Multi-tenant data centers and clouds are also closely tied. Many public cloud providers, like Salesforce, which don’t want to or can’t build their own massive data centers, lease capacity from multi-tenant data center providers. Even the largest public cloud players, like Amazon, use multi-tenant data centers to quickly expand their services, especially in regions outside the U.S. In addition, with the emergence of hybrid cloud as the most popular option, many companies house their private clouds entirely in multi-tenant data centers, while large public cloud providers are forging partnerships with multi-tenant data center providers to help tenants use public clouds to complement their private deployments.

Today, the U.S. alone has over 1,400 large multi-tenant data centers, which consume nearly five times as much energy as all Google-type data centers combined (37.3% versus 7.8% of total data center energy usage, excluding tiny server closets). Driven by surging demand for web services, cloud computing, and the Internet of Things, the multi-tenant data center industry is expected to continue its rapid growth. While public attention mostly goes to the IT giants that are continuously expanding their data center infrastructure, multi-tenant data center providers are also building out their own facilities, at an even faster pace.

Despite their prominence, multi-tenant data centers have been much less studied by the research community than Google-type data centers. While these two types of data centers share many of the same high-level goals (energy efficiency, utilization, renewable integration), many of the existing approaches proposed for Google-type data centers don’t apply to multi-tenant data centers, which face additional challenges due to the operator’s lack of control over tenants’ servers. Even worse, individual tenants manage their own servers with little coordination with others (in fact, tenants typically don’t even know whom they’re sharing the data center with).

As a concrete example, consider the problem we studied in our recent HPCA’16 paper. Keeping the servers’ aggregate power usage below the data center’s capacity at all times is extremely important for ensuring data center uptime.  When the aggregate power occasionally exceeds the capacity (called an emergency, which can occur because power is oversubscribed), a common technique in Google-type data centers is to carefully lower the servers’ power consumption so as to meet multi-level power capacity constraints while minimizing performance degradation. In a multi-tenant data center, the operator can’t do this: the tenants themselves control the servers. Further, who should reduce power, and by how much, must be carefully decided to minimize the performance loss, but these decisions require the operator to know tenants’ private information (e.g., what workloads are running, and what the performance loss would be if certain servers’ power usage were lowered).  So, these challenges can’t be addressed by existing technological approaches alone; instead, they require novel market designs along with new advances in data center architecture, with the goal of providing mechanisms that are “win-win” for both data center operators and tenants.
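To make the coordination challenge concrete, here is a minimal sketch of one way an operator might procure power reductions from tenants during an emergency. To be clear, this is a hypothetical illustration under simplified assumptions, not the mechanism from the HPCA’16 paper: each tenant reports a cost per kW of load shedding (a stand-in for its private performance-loss information), and the operator greedily buys the cheapest reductions until the overload is covered.

```python
# Hypothetical sketch of emergency handling in a multi-tenant data center.
# Assumption: tenants report a (truthful) cost per kW of shedding load, and the
# operator buys the cheapest reductions first. This is NOT the HPCA'16 design.

from dataclasses import dataclass


@dataclass
class Bid:
    tenant: str
    max_reduction_kw: float  # how much load the tenant is willing to shed
    cost_per_kw: float       # reported cost (proxy for performance loss) per kW shed


def resolve_emergency(bids, overload_kw):
    """Greedily buy reductions, cheapest first, until the overload is covered.

    Returns a list of (tenant, kW shed, payment) tuples, or None if the bids
    cannot cover the overload and the operator must fall back on other measures
    (e.g., drawing down UPS batteries).
    """
    remaining = overload_kw
    allocation = []
    for bid in sorted(bids, key=lambda b: b.cost_per_kw):
        if remaining <= 0:
            break
        shed = min(bid.max_reduction_kw, remaining)
        allocation.append((bid.tenant, shed, shed * bid.cost_per_kw))
        remaining -= shed
    return allocation if remaining <= 0 else None


# Example: a 60 kW overload shared among three (hypothetical) tenants.
bids = [
    Bid("tenant_A", max_reduction_kw=40, cost_per_kw=0.5),
    Bid("tenant_B", max_reduction_kw=30, cost_per_kw=0.2),
    Bid("tenant_C", max_reduction_kw=50, cost_per_kw=0.9),
]
print(resolve_emergency(bids, overload_kw=60))
# Buys 30 kW from tenant_B, then 30 kW from tenant_A.
```

Of course, the hard part that this sketch sweeps under the rug is that tenants have no reason to report their costs truthfully — which is exactly where the market design comes in.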

The above was one example of the added complexity due to the multi-tenant setting, but there are many others.  We have a few papers on this topic (HPCA’16, HPCA’15, Performance’15, …), but we will be looking into it much more in the coming years. We hope others do as well!

Reporting from SoCal NEGT

Last week, USC hosted our annual Southern California Network Economics and Game Theory (NEGT) workshop.  (Thanks to David Kempe and Shaddin Dughmi for all the organization this year!)  It’s always a very fun workshop, and it really does a great job of sustaining a multidisciplinary community across CS, EE, and Econ in the LA area.  We’ve been doing it for so long now that the faculty & students really know each other well at this point…

As always, there were lots of great talks.  In particular, we had a great set of keynotes again this year.

The first keynote was from Ashish Goel, who gave an inspiring talk about his work on “Crowdsourced democracy”, in which he has managed to do something incredible.  He has built a system for participatory budgeting (the process where a community votes on particular social works projects and the outcome of the vote actually determines budget priorities).  His system has now been used in a wide variety of cities, and each time it’s used he gets to run experiments, beyond the official voting itself, that gather data about the effectiveness of a variety of platform designs for participatory voting.  This in turn has motivated some deep theoretical work on the efficiency of different platform designs, which looks like it has the potential to impact real practice in the coming years!  It is truly an exciting place where theory and practice are intertwined — and where a researcher is really attacking a problem of crucial societal importance.

The second keynote came from Kevin Leyton-Brown, who also gave a truly ambitious talk, this one about work he’s pursuing on the foundations of game-theoretic models — questioning the standard models of game theory.  Kevin’s work typically takes a hard look at the interplay of theoretical and practical issues in algorithmic game theory, and this work does the same.  It questions the typical theoretical abstractions about agent behavior and strives to build better models of how people actually behave in strategic settings.  It is great to see someone from the computer science side of algorithmic game theory getting engaged with behavioral economics — an area of economics that is, to this point, fairly untouched by computer scientists.

Of course, there were lots of interesting talks from the “locals” too, but I’ll stop here.  We’ve now been doing this for seven years, and I’m so glad that it’s going strong — I’m looking forward to trucking over to UCLA for next year’s incarnation!

Introducing DOLCIT

At long last, we have gotten together and created a “Caltech-style” machine learning / big data / optimization group, and it’s called DOLCIT: Decision, Optimization, and Learning at the California Institute of Technology.  The goal of the group is to take a broad and integrated view of research in data-driven intelligent systems. On the one hand, statistical machine learning is required to extract knowledge in the form of data-driven models. On the other hand, statistical decision theory is required to intelligently plan and make decisions given imperfect knowledge. Supporting both thrusts is optimization.  DOLCIT envisions a world where intelligent systems seamlessly integrate learning and planning, as well as automatically balance computational and statistical tradeoffs in the underlying optimization problems.

In the Caltech style, research in DOLCIT spans traditional areas from applied math (e.g., statistics and optimization) to computer science (e.g., machine learning and distributed systems) to electrical engineering (e.g., signal processing and information theory). Further, we will look broadly at applications, spanning from information and communication systems to the physical sciences (neuroscience and biology) to social systems (economic markets and personalized medicine).

In some sense, the only thing that’s new is the name, since we’ve been doing all these things for years already.  However, with the new name will come new activities like seminars, workshops, etc.  It’ll be exciting to see how it morphs in the future!

(And, don’t worry, RSRG is still going strong — RSRG and DOLCIT should be complementary with their similar research style but differing focuses with respect to tools and applications.)

Caltech CMS is hiring

I’m happy to announce that the Computing and Mathematical Sciences (CMS) department will be continuing to grow this year.  The ad for our faculty search is now up, so spread the word!

As you’ll see, the ad is intentionally broad.  We are looking for strong applicants across computer science and applied math.  We look for impressive, high-impact work rather than enforcing preconceived notions of what is hot at the moment, and we definitely welcome areas on the periphery of computing and applied math too — candidates at the interface of EE, mechanical engineering, economics, biology, physics, etc. are encouraged to apply!  One of the strengths of our department (and Caltech in general) is the permeability of the boundaries between fields.