The Forgotten Data Centers

Data centers are where the Internet and cloud services live, so they have attracted a lot of public attention in recent years. If we read technology news or research papers, it’s not uncommon to see IT giants, like Google and Facebook, publicly discuss and share the designs of the mega-scale data centers they operate. But another important type of data center, the multi-tenant data center (commonly called “colocation” or “colo”), has been largely hidden from the public and rarely discussed (at least in research papers), although it’s very common in practice and located almost everywhere, from Silicon Valley to the gambling capital, Las Vegas.

Unlike a Google-type data center, where the operator manages both the IT equipment and the facility, a multi-tenant data center is a shared facility where multiple tenants house their own servers in shared space and the data center operator is mainly responsible for facility support (like power, cooling, and space). Although the boundary is blurring, multi-tenant data centers can generally be classified as either wholesale or retail: wholesale data centers (like Digital Realty) primarily serve large tenants, each having a power demand of 500kW or more, while retail data centers (like Equinix) mostly target tenants with smaller demands.

Multi-tenant data centers serve almost all industry sectors, including finance, energy, major web service providers, and content delivery network providers. Even some IT giants lease space in multi-tenant data centers to complement their own data center infrastructure. For example, Google, Microsoft, Amazon, and eBay are all large tenants in a hyper-scale multi-tenant data center in Las Vegas, NV, and Facebook leases a large data center in Singapore to serve its users in Asia.

Multi-tenant data centers and clouds are also closely tied. Many public cloud providers, like Salesforce, that don’t want to or can’t build their own massive data centers lease capacity from multi-tenant data center providers. Even the largest players in the public cloud, like Amazon, use multi-tenant data centers to quickly expand their services, especially in regions outside the U.S. In addition, with the emergence of hybrid cloud as the most popular option, many companies are housing the entirety of their private clouds in multi-tenant data centers, while large public cloud providers are forging partnerships with multi-tenant data center providers to help tenants use public clouds to complement their private clouds.

Today, the U.S. alone has over 1,400 large multi-tenant data centers, which consumed nearly five times as much energy as all Google-type data centers combined (37.3% versus 7.8% of all data center energy usage, excluding tiny server closets). Driven by the surging demand for web services, cloud computing, and the Internet of Things, the multi-tenant data center industry is expected to continue its rapid growth. While public attention mostly goes to the IT giants that continuously expand their data center infrastructure, multi-tenant data center providers are also building their own, at an even faster pace.

Despite their prominence, multi-tenant data centers have been much less studied than Google-type data centers by the research community. While these two types of data centers share many of the same high-level goals, like energy efficiency, utilization, and renewable integration, many of the existing approaches proposed for Google-type data centers don’t apply to multi-tenant data centers, which face additional challenges due to the operator’s lack of control over tenants’ servers. Even worse, individual tenants manage their own servers with little coordination with others (in fact, tenants typically don’t even know whom they’re sharing the data center with).

As a concrete example, consider the problem we studied in our recent HPCA’16 paper. Keeping the servers’ aggregate power usage below the data center capacity at all times is extremely important for ensuring data center uptime. When the aggregate power occasionally exceeds the capacity (called an emergency, and caused by power oversubscription), a common technique used in Google-type data centers is to carefully lower the servers’ power consumption so as to meet multi-level power capacity constraints while minimizing the performance degradation. In a multi-tenant data center, the operator can’t do this, because the tenants themselves control the servers. Further, who should reduce power and by how much must be carefully decided to minimize the performance loss, but these decisions require the operator to know tenants’ private information (e.g., what workloads are running and how much performance is lost if certain servers’ power usage is lowered). So, these challenges can’t be addressed by the existing technological approaches alone; instead, they require novel market designs along with new advances in data center architecture, with the goal of providing mechanisms that are “win-win” for both data center operators and tenants.
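
To make the "who should reduce power, and by how much" question concrete, here is a minimal sketch of one possible market-style approach. It is an illustration only, not the mechanism from the HPCA’16 paper. It assumes a single capacity level (not the multi-level constraints mentioned above), that each tenant’s performance loss can be summarized as a linear asking price per kW, and that tenants report this price truthfully. The names Bid, clear_emergency, max_reduction_kw, and price_per_kw are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    """A tenant's (hypothetical) offer to shed power during an emergency."""
    tenant: str
    max_reduction_kw: float  # most power the tenant is willing to shed
    price_per_kw: float      # compensation asked per kW shed (assumed linear)


def clear_emergency(bids, aggregate_kw, capacity_kw):
    """Accept the cheapest bids until the overload is covered.

    Returns a dict mapping tenant name -> kW to shed. With linear per-kW
    prices, this greedy order minimizes the operator's total payment.
    """
    needed = max(0.0, aggregate_kw - capacity_kw)
    allocation = {}
    for bid in sorted(bids, key=lambda b: b.price_per_kw):
        if needed <= 0:
            break
        cut = min(bid.max_reduction_kw, needed)
        allocation[bid.tenant] = cut
        needed -= cut
    if needed > 1e-9:
        # The offered reductions are not enough to get back under capacity.
        raise ValueError("bids cannot cover the overload")
    return allocation
```

For instance, with an aggregate demand of 1060kW against a 1000kW capacity, the operator needs 60kW of reduction and would take it from the tenants with the lowest asking prices first. The operator never sees the tenants’ workloads; it sees only the bids, which is the point of moving the decision into a market.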

The above is just one example of the added complexity of the multi-tenant setting; there are many others. We have a few papers on this topic (HPCA’16, HPCA’15, Performance’15, …), and we will be looking into it much more in the coming years. We hope others do as well!