Data Markets in the Cloud

Over the last year, while I haven’t been blogging, one of the new directions that we’ve started to look at in RSRG is “data markets”.

“Data Markets” is one of those phrases that means lots of different things to lots of different people.  At its simplest, the idea is that data is a commodity these days: it is bought and sold constantly.  The challenge is that we don’t actually understand much about data as an economic good.  In fact, it’s a very strange economic good, and traditional economic theory doesn’t apply…

  • Data has zero marginal cost.  Once you have it, you can sell it as many times as you want without additional cost.  This makes it a so-called “digital good”.
  • Data has strange complementarities and externalities.  The value of a piece of data depends on who else knows it and what other data the buyer has.
  • Data is extremely hard to value.  How do you value every possible query or combination of data?
  • Data can’t be “used up”.  Once it is sold, the buyer still holds a copy, so selling creates competitors!
  • Releasing data can have unexpected social consequences.  Leakage can have huge privacy consequences!

We have more questions than answers at this point, but our vision, which I outlined recently in a talk at an MSR workshop on “System design for cloud services” (see the 15-minute mark of this video), is that within 5-10 years data will move from a commodity to a service in the cloud.  This transition is similar to what happened as computing infrastructure moved from a commodity to a service rented from the cloud.

There are lots of challenges before cloud data markets can become a reality, though.  All the things that make data a strange economic good make it extremely hard to price and value in an automated way.  The transition has already started via companies like Factual, which seeks to be a clearinghouse for data.  But there’s a lot of research that needs to be done before we can truly have automated cloud markets for data…


