Data Markets in the Cloud

Over the last year, while I haven’t been blogging, one of the new directions that we’ve started to look at in RSRG is “data markets”.

“Data Markets” is one of those phrases that means lots of different things to lots of different people.  At its simplest, the idea is that data is a commodity these days — data is bought and sold constantly. The challenge is that we don’t actually understand very much about data as an economic good.  In fact, it’s a very strange economic good, and traditional economic theory doesn’t apply…

Continue reading

QUESTA Special Issue on Cloud Computing

I’ve been meaning to post about this for a while, but better late than never I guess!  Javad Ghaderi, Sanjay Shakkottai, Sasha Stolyar, and I are editing a special issue of QUESTA on Cloud Computing.  The issue is devoted to modeling and theoretical analysis of algorithm design, market issues, and performance challenges in cloud systems. So, the scope is quite broad but, of course, being QUESTA, we are interested in papers that develop new analytic tools and techniques for this domain, e.g., in areas such as stochastic processes and scheduling theory.

The deadline for papers is April 1st, so I apologize for posting this so late.  But, I hope to see lots of great submissions!

You can find the full details for submission, formatting, etc., here.

Autoscale, a.k.a. “Dynamic right-sizing”, at Facebook

A bit of news on the data center front, for those who may have missed it:  Facebook recently announced the deployment of a new power-efficient load balancer called “Autoscale.”  Here’s their blog post about it.

Basically, the quick and dirty summary of the design is this: adapt the number of active servers so that it’s roughly proportional to the workload, and adjust the load balancing to keep the active servers “busy enough,” avoiding situations where lots of servers sit around very lightly loaded.
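
To make that concrete, here’s a minimal sketch of the dynamic right-sizing idea.  This is not Facebook’s actual Autoscale implementation; the target utilization, class names, and numbers are purely illustrative assumptions.

```python
import math
from dataclasses import dataclass

# Minimal sketch of "dynamic right-sizing": keep the active pool roughly
# proportional to load, and concentrate traffic so active servers stay
# "busy enough". Not Facebook's Autoscale code; all names and the target
# utilization below are hypothetical.

TARGET_UTILIZATION = 0.75  # assumed utilization to aim for on active servers

@dataclass
class Server:
    load: float = 0.0  # requests/s currently being handled

def active_pool_size(request_rate: float, per_server_capacity: float) -> int:
    """How many servers are needed so each active one runs near the target."""
    return max(1, math.ceil(request_rate / (per_server_capacity * TARGET_UTILIZATION)))

def dispatch(active_pool: list, extra_load: float) -> None:
    """Send new load to the least-loaded *active* server, so traffic stays
    concentrated instead of spreading thinly across the whole fleet."""
    min(active_pool, key=lambda s: s.load).load += extra_load

# Example: 900 req/s with servers rated at ~100 req/s
# => ceil(900 / 75) = 12 active servers; the rest can idle or run batch jobs.
fleet = [Server() for _ in range(40)]
active = fleet[:active_pool_size(request_rate=900, per_server_capacity=100)]
dispatch(active, extra_load=30)
```

The “busy enough” dispatch rule is the key: rather than spreading light load evenly across everything, traffic is packed onto a subset of machines so the rest can idle or run batch work.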

So, the ideas are very related to what’s been going on in academia over the last few years.  Some of the ideas are likely inspired by the work of Anshul Gandhi, Mor Harchol-Balter, et al. (who have been chatting with Facebook over the past few years), and the architecture is actually quite similar to the “Net Zero Data Center Architecture” developed by HP (which incorporated some of our work, e.g., these papers, joint with Minghong Lin, who now works with the infrastructure team at Facebook).

While Facebook isn’t the first tech company to release something like this, it’s always nice to see it happen.  And, it will give me more ammo to use when chatting with people about the feasibility of this sort of design.  It is amazing to me that I still get comments from folks about how “data center operators don’t care about energy”…  So, to counter that view, here are some highlights from the post:

“Improving energy efficiency and reducing environmental impact as we scale is a top priority for our data center teams.”

“during low-workload hours, especially around midnight, overall CPU utilization is not as efficient as we’d like. […] If the overall workload is low (like at around midnight), the load balancer will use only a subset of servers. Other servers can be left running idle or be used for batch-processing workloads.”

Anyway, congrats to Facebook for taking the plunge.  I hope that I hear about many other companies doing the same in the coming years!

Data centers & Energy: Did we get it backwards?

The typical story surrounding data centers and energy is an extremely negative one: Data centers are energy hogs.  This message is pervasive in the media, and it certainly rings true.  However, we have come a long way in the last decade, and though we still need to “get our house in order” by improving things further, the most advanced data centers are quite energy-efficient at this point.  (Note that we’ve done a lot of work in this area at Caltech and, thanks to HP, we are glad to see it moving into industry deployments.)

But, the view of data centers as energy hogs is too simplistic.  Yes, they use a lot of energy, but energy usage is not a bad thing in and of itself.  In the case of data centers, energy usage typically leads to energy savings elsewhere.  In particular, moving things to the cloud is most often a big win in terms of energy usage…

More importantly, though, the goal of this post is to highlight that, in fact, data centers can be a huge benefit in terms of integrating renewable energy into the grid, and thus play a crucial role in improving the sustainability of our energy landscape.

In particular, in my mind, a powerful alternative view is that data centers are batteries.  That is, a key consequence of energy efficiency improvements in data centers is that their electricity demands are very flexible.  They can shed 10%, 20%, even 30% of their electricity usage in as little as 10 minutes by doing things such as precooling, adjusting the temperature, demand shifting, quality degradation, geographical load balancing, etc.  These techniques have all been tested at this point in industry data centers, and can be done with almost no performance impact for interactive workloads!
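
To give a flavor of just one of these knobs, here is a toy sketch of geographical load balancing.  The greedy rule and the site numbers are purely illustrative assumptions, not a real deployment or our actual algorithms.

```python
# Toy illustration of geographical load balancing: serve interactive load
# where renewable generation happens to be available right now, spilling
# over to other sites only as needed. Site capacities and renewable levels
# below are made up for illustration.

def geo_balance(total_load, sites):
    """Greedily assign load to the sites with the most renewable energy.

    sites: list of dicts with 'capacity' (max load the site can serve) and
    'renewable' (renewable generation currently available there).
    Returns the load assigned to each site, in the original order.
    """
    remaining = total_load
    assignment = [0] * len(sites)
    for i in sorted(range(len(sites)), key=lambda i: -sites[i]["renewable"]):
        assignment[i] = min(remaining, sites[i]["capacity"])
        remaining -= assignment[i]
    return assignment

sites = [
    {"capacity": 60, "renewable": 50},  # say, a windy site at the moment
    {"capacity": 60, "renewable": 10},
    {"capacity": 60, "renewable": 0},
]
print(geo_balance(100, sites))  # [60, 40, 0]: demand follows the renewables
```

The same flexibility shows up whichever knob you turn (precooling, temperature setpoints, quality degradation): the workload can move, in space or in time, in a way that most electricity demand simply cannot.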

Continue reading

The Community Seismic Network: Citizen Science and the Cloud

We almost missed the chance to highlight that the cover story of the July 2014 issue of the Communications of the ACM (CACM) is a paper by a Caltech group on the Community Seismic Network (CSN). This note is about CSN as an example of a system at a growing, important nexus: citizen science, inexpensive sensors, and cloud computing.

CSN uses inexpensive MEMS accelerometers or accelerometers in phones to detect shaking from earthquakes. The CSN project builds accelerometer “boxes” that contain an accelerometer, a Sheevaplug, and cables. A citizen scientist merely affixes the small box to the floor with double-sided sticky tape, and connects cables from the box to power and to a router. Installation takes minutes.

Analytics running in the Sheevaplug, or some other computer connected to the accelerometer, analyzes the raw data streaming in from the sensor. This analytics engine detects local anomalous acceleration. Local anomalies could be due to somebody banging on a door, a big dog jumping off the couch (a frequent occurrence in my house), or an earthquake. The plug computer or phone sends messages to the cloud when it detects a local anomaly. An accelerometer may measure at 200 samples per second, but messages get sent to the cloud at rates that range from one per minute to one every 20 minutes. The local anomaly message includes the sensor id, location (because phones move), and magnitude.
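
To make the flow concrete, here is a generic sketch of this kind of on-device detector.  It is not the CSN project’s actual algorithm; the window lengths, trigger ratio, and message fields are assumptions for illustration.

```python
from collections import deque

# Generic sketch of an on-device anomaly detector: compare recent shaking
# to the recent background level and, when it spikes, send only a tiny
# summary message to the cloud rather than the raw 200 Hz stream.
# Thresholds, window lengths, and field names are illustrative assumptions.

SAMPLE_RATE_HZ = 200                  # sampling rate mentioned in the post
SHORT_WINDOW = 1 * SAMPLE_RATE_HZ     # ~1 s of recent acceleration
LONG_WINDOW = 30 * SAMPLE_RATE_HZ     # ~30 s of background acceleration
TRIGGER_RATIO = 4.0                   # assumed spike ratio that counts as anomalous

short_buf = deque(maxlen=SHORT_WINDOW)
long_buf = deque(maxlen=LONG_WINDOW)

def on_sample(acceleration, sensor_id, location, send_to_cloud):
    """Process one accelerometer sample; emit a small pick message on anomalies."""
    short_buf.append(abs(acceleration))
    long_buf.append(abs(acceleration))
    if len(long_buf) < LONG_WINDOW:
        return  # still learning the background level
    short_avg = sum(short_buf) / len(short_buf)
    long_avg = max(sum(long_buf) / len(long_buf), 1e-9)
    if short_avg / long_avg > TRIGGER_RATIO:
        send_to_cloud({
            "sensor_id": sensor_id,
            "location": location,   # included because phones move
            "magnitude": short_avg,
        })
```

A real deployment would also rate-limit these messages (the rates above range from one per minute to one every 20 minutes), but the basic structure of summarizing locally and reporting rarely is the point here.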

There are four critical differences between community networks and traditional seismic networks:

  • Community sensor fidelity is much poorer than that of expensive instruments.
  • The quality of deployment of community sensors by ordinary citizens is much more varied than that of sensors deployed by professional organizations.
  • Community sensors can be deployed more densely than expensive sensors. Think about the relative density of phones versus seismometers in earthquake-prone regions of the world such as Peru, India, China, Pakistan, Iran and Indonesia.
  • Community sensors are deployed where communities are located, and these locations may not be the most valuable for scientists.

Research questions investigated by the Caltech CSN team include: Are community sensor networks useful? Do the lower fidelity, varied installation practices, and relatively random deployment result in networks that don’t provide value to the community and don’t provide value to science? Can community networks add value to other networks operated by government agencies and companies? Can inexpensive cloud computing services be used to fuse data from hundreds of sensors to detect earthquakes within seconds?

Continue reading

“How clean is your cloud” two years later

Two years ago, Greenpeace put out a report titled “How clean is your cloud,” taking many of the IT giants to task for their lack of commitment to sustainability in their data centers.  Now, two years later, Greenpeace is still at it and has been pushing hard with a mixture of yearly public praise and shaming (or maybe they’d prefer the term “public education”) about the commitment and progress companies are making toward a sustainable cloud.

Reading the most recent report, “Clicking clean,” it is really quite amazing how far the industry has come.  While there is still room for improvement, even the companies Greenpeace critiques are light-years ahead of where the industry was five years ago.  Apple, which was the black sheep of the initial report, has now committed to 100% renewable energy for its cloud, while Amazon, which was ahead of the curve in the initial report, is hit hard this time.

Continue reading