# Data, Privacy, and Markets

We’ve posted in the past about some of the work going on in our group related to privacy, and of course there are always lots of news articles popping up.  But, today I came across a recent animation by Jorge Cham of PhD comics fame that does a very nice job of summarizing one of the interesting directions these days — managing the interaction of personal data with data marketplaces.

Though I haven’t posted about it here yet, this is one of the new directions RSRG is moving in: how does one design a data marketplace that allows the transition from data as a commodity to data as a service? We have already seen computing go from commodity to service with the emergence of cloud infrastructure providers like Amazon EC2 and Microsoft Azure, and I think it won’t be long until data makes the same transition. But, in setting up these data marketplaces, how does one manage issues such as privacy? And how does one place a value on pieces of data, which have many different uses?

In any case, enjoy the animation!

# I’m Adam Wierman and this is how I work

I’m a big fan of the “I’m XX and this is how I work” series on Lifehacker. It’s quite interesting to see how successful folks manage their time / work-spaces / etc. The contrasts are interesting, e.g., between Ira Glass (of This American Life) and Nathan Blecharczyk (of Airbnb).

In any case, I figured that since I’m not getting asked by Lifehacker anytime soon, I’d do one of my own with a bit of an academic slant…

What are some apps/software/tools you can’t live without?

Probably the biggest no-brainer is Dropbox, which I use for everything (both work and personal). It’s crucial for keeping projects synced with collaborators/students/TAs/etc. Sometimes we run an SVN repository inside it or use GitHub to get tighter management of code, but often Dropbox is enough.

For teaching, I’m a big fan of Piazza.  I really can’t imagine running a large class without it at this point.  Typically questions posted by students get responses within 10 minutes (either from TAs or other students) and then I have an archive of these questions that I can use to improve the course in future years.

For travel, my assistant got me using TripIt, and I’m a big fan now.  It keeps everything in one place and syncs it so I have all the details I need even when offline.  Saves a lot of time on the road…

For presentations, most people complain about PowerPoint, but I think it’s a great tool. The newer versions especially are much improved. As an academic, once you create a keyboard shortcut to start a TeX equation in a text box (Alt-1 for me) then it’s smooth sailing (at least on Windows).

Of course email is a large part of my job… I use Gmail for everything and am a big believer in Inbox Zero. So, the only things I have in my inbox are the tasks I want to accomplish in a given day. The key things I use to manage that are filters in Gmail (it’s amazing how freeing it is to set up an “AcademicSpam” filter; mine is a little more complicated than the one suggested by PhD comics) and the Boomerang plugin. I just started using Boomerang recently, but I already can’t imagine not being able to send messages at specific times and have them leave the inbox only to pop up as new at a later, scheduled time, i.e., boomerang them.

Setting up meetings can be a pain, but it is much easier with when2meet. I don’t know why anyone still uses Doodle…

For writing, I use WinEdt with SumatraPDF (which integrates much more tightly than Adobe, e.g., you can click on the PDF and WinEdt takes you to that part of your code). And, for blogging, I use latex2wp to convert TeX to WordPress format.

# Videos on universal laws and architectures

I’ve posted the beginnings of what I hope will become an extensive library of videos, papers, notes, and slides exploring in more detail both illustrative case studies and theoretical foundations for the universal laws and architectures that I superficially referred to in my previous blog posts.  For the moment, these are simply posted on dropbox, so be sure to download them, since looking at them in a browser may only give a preview…

I’m eager to get feedback on any aspects of the material, and all the sources are available for reuse.

In addition to the introductory and overview material, of particular interest might be a recent paper on heart rate variability, one of the most persistent mysteries in all of medicine and biology, which we resolve in a new but accessible way.  There are tutorial videos in addition to the paper for download.

# Solution to puzzle: produce or learn?

This post is a solution to the puzzle in the last post.

The optimal strategy has a very simple form: there is a time ${t^* \in \{1, \dots, T\}}$ such that (${^*}$ denotes optimal quantities)

• only learn (${l^*(t)=1}$) before time ${t^*}$;
• only produce (${p^*(t)=1}$) from time ${t^*}$ on.
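To get a feel for where ${t^*}$ lands, here is a minimal Python sketch. It assumes the special case of the puzzle in which capabilities are simply cumulative effort, so that learning for the first ${k = t^*-1}$ periods and producing for the remaining ${n = T-k}$ periods yields a total value of ${n(n-1)/2 + mkn}$. The function names and the values T = 10, m = 3 are illustrative choices, not part of the original post, and the scan only compares strategies of the switching form above; it does not by itself prove that form is optimal.

```python
def switch_value(T, m, t_star):
    """Total value when we only learn before t_star and only produce from t_star on.

    Assumes capabilities equal cumulative effort (the special case of the puzzle).
    """
    k = t_star - 1   # number of learning periods, so L = k once production starts
    n = T - k        # number of production periods
    # In the j-th production period (j = 0, ..., n-1) we have P = j and L = k,
    # so the period's value is j + m*k; summing over j gives the closed form.
    return n * (n - 1) / 2 + m * k * n


def best_switch_time(T, m):
    """Switching time in {1, ..., T} with the largest total value (first one on ties)."""
    return max(range(1, T + 1), key=lambda t_star: switch_value(T, m, t_star))


if __name__ == "__main__":
    T, m = 10, 3  # hypothetical horizon and weight, chosen only for illustration
    for t_star in range(1, T + 1):
        print(t_star, switch_value(T, m, t_star))
    print("best switching time:", best_switch_time(T, m))
```

Changing m or T in the last few lines shows how the switching time moves as fundamentals become more or less valuable relative to production experience.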

# Another puzzle: produce or learn?

When our kids were small, they were on sports teams (basketball, baseball, soccer, …). Their teams would focus on drills early in the season, and tournaments late in the season. In violin, one studies techniques (scales, etudes, theory, etc.) as well as musicality (interpretation, performance, etc.). In (engineering) research, we spend a lot of time learning the fundamentals (coursework, mathematical tools, analysis/systems/experimental skills, etc.) as well as solving problems in specific applications (research). What is the optimal allocation of one’s effort in these two kinds of activities?

This is a complex and domain-dependent problem. I suppose there is a lot of serious empirical and modeling research on it in the social sciences (I’d appreciate pointers if you know of any). But let’s formulate a ridiculously simple model to make a fun puzzle.

1. Consider a finite horizon t = 1, 2, …, T.   The time period t can be a day or a year.  The horizon T can be a project duration or a career.
2. Suppose there are only two kinds of activities, and let’s call them production and learning.  Our task is to decide for each t, the amount of effort we devote to produce and to learn.  Call these amounts p(t) and l(t) respectively.
3. These activities build two kinds of capabilities.  The fundamental capability L(t) at time t depends on the amount of learning we have done up to time t-1, L(t) := L(l(s), s=1, …, t-1).  The production capability P(t) at time t depends on the amount of effort we have devoted to production up to time t-1, P(t) := P(p(s), s=1, …, t-1).   We assume the functions L(l(s), s=1, …, t-1) and P(p(s), s=1, …, t-1) are increasing and time invariant (i.e., they depend only on the amount of effort already devoted, but not on time t).
4. The value/output we create in each period t is proportional to the time p(t) we spend on production multiplied by our overall capability at time t.   Our overall capability is a weighted sum P(t) + mL(t) of fundamental and production capabilities, with m>1.

Goal: choose nonnegative (p(t), l(t), t=1, …, T) so as to maximize the total value ${\sum_{t=1}^T\ p(t) (P(t) + m L(t))}$ subject to ${p(t) + l(t) \leq 1}$ for all t=1, …, T.

The assumption m>1 means that the fundamentals (quality) are more important than mere quantity of production.  The constraint ${p(t) + l(t) \leq 1}$ says that in each period t, we only have a finite amount of energy (assume a total of 1 unit) that can be devoted to produce and learn.  On the one hand, we want to choose a large p(t) because it not only produces value, but also increases future production capabilities P(s), s=t+1, …, T.  On the other hand, since m>1, choosing a large l(t) increases our overall capability more rapidly, enhancing value.  What is the optimal tradeoff?

We pause to comment on our assumptions, some of which can be addressed without complicating our model too much.

Caveats. At the outset, our model assumes every activity can be cleanly classified as building either the fundamental capability or the production capability. In reality, many activities contribute to both. Moreover, the interaction between these two activities is completely ignored, except that they sum to no more than 1 unit. For example, production (games, performance, research and publication, etc.) often provides important incentives and contexts for learning and strongly influences the effectiveness of learning, but our function L is independent of p(s). The time invariance assumption in 3 above implies that we retain our capabilities forever after they are built; in reality, we may lose some of them if we don’t continue to practice. If we think of P(t)+mL(t) as a measure of quality, then our objective function assumes that there is always positive value in production, regardless of its quality. In reality, production of poor quality may incur negative value, or even be fatal.

A puzzle

A simple puzzle is the special case where the capabilities depend on (are) the total amounts of effort devoted, i.e.,

${L(t)\ := \ \sum_{s=1}^{t-1} l(s), \ \ \ P(t) \ :=\ \sum_{s=1}^{t-1} p(s) }$

Despite its nonconvexity, the problem can be explicitly solved and the optimal strategy turns out to have a very simple structure.  I will explain the solution in the next post and discuss whether it agrees, to first order, with our intuition and how some of the disagreements can be traced back to our simplifying assumptions.
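For readers who want to experiment before seeing the solution, here is a minimal Python sketch that evaluates the objective for any candidate schedule in this special case, taking L(t) and P(t) to be the cumulative learning and production effort up to time t-1. The function name `total_value` and the example schedules (T = 4, m = 2) are hypothetical, chosen only for illustration.

```python
def total_value(p, l, m):
    """Return sum over t of p(t) * (P(t) + m * L(t)) for effort schedules p and l.

    P(t) and L(t) are the cumulative production and learning effort before period t.
    Raises ValueError if the schedule violates p(t), l(t) >= 0 or p(t) + l(t) <= 1.
    """
    if len(p) != len(l):
        raise ValueError("p and l must cover the same periods")
    P = 0.0      # production capability built so far
    L = 0.0      # fundamental capability built so far
    value = 0.0
    for pt, lt in zip(p, l):
        if pt < 0 or lt < 0 or pt + lt > 1 + 1e-12:
            raise ValueError("each period needs p, l >= 0 and p + l <= 1")
        value += pt * (P + m * L)  # output uses capabilities from earlier periods only
        P += pt
        L += lt
    return value


if __name__ == "__main__":
    m = 2  # hypothetical weight on fundamentals
    print(total_value([1, 1, 1, 1], [0, 0, 0, 0], m))          # produce all the time
    print(total_value([0.5] * 4, [0.5] * 4, m))                # split every period evenly
    print(total_value([0, 0, 1, 1], [1, 1, 0, 0], m))          # learn first, then produce
```

Trying a few schedules by hand like this is a quick way to build intuition about the tradeoff between learning early and producing early.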

# A holiday puzzle: solution

I now discuss two solutions to the puzzle described in the last post — one for the special case of a linear grid, and the other for the general 2D grid.  I thank Johan Ugander and Shiva Navabi for very useful pointers (see the Comment in the last post, and a funny nerd snipe comic) — I will return to them below.  But first, here is a simple heuristic solution.

# A holiday puzzle

I am afraid of gifts, both receiving and giving. Luckily, I have been largely spared having to confront this challenge. I am often (rightly) criticized that on the rare occasions when I give, the gifts are often what I like, not what the receivers would. People say gifting is an art, so no wonder I’m bad at it. It is therefore a pleasant surprise that I received a holiday gift a few days ago, and it is a fun puzzle.

Consider an infinite grid where each branch (solid blue line segment) has a resistance of 1 ohm, as shown in the figure below.

What is the equivalent resistance between any pair of adjacent nodes? In other words, take an arbitrary pair of adjacent nodes, labeled + and − in the figure, and apply a 1-volt voltage source to the pair (the dotted line connecting the voltage source to the grid is idealized and has zero resistance). Denote the current through the voltage source by I_0. What is the value of the equivalent resistance R := 1/I_0?
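One way to play with the setup numerically is to truncate the infinite grid to a finite patch and solve the circuit there. The Python sketch below does this under two assumptions that are not from the post: the grid is the square lattice (each interior node has four neighbors), and the patch extends n nodes beyond the + and − pair in every direction. It pins + at 1 volt and − at 0 volts, applies Kirchhoff's current law at every other node, and reads off R = 1/I_0. The names `grid_resistance` and `solve_linear` are illustrative, and the result is only an approximation to the infinite-grid value.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def grid_resistance(n):
    """Equivalent resistance between (0,0) and (1,0) on a finite patch of 1-ohm branches."""
    nodes = {(x, y) for x in range(-n, n + 2) for y in range(-n, n + 1)}
    plus, minus = (0, 0), (1, 0)
    fixed = {plus: 1.0, minus: 0.0}  # the 1-volt source pins these two voltages
    unknown = sorted(v for v in nodes if v not in fixed)
    index = {v: i for i, v in enumerate(unknown)}

    def neighbors(v):
        x, y = v
        return [w for w in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if w in nodes]

    # Kirchhoff's current law at each free node: degree * V(v) - sum of neighbor voltages = 0.
    A = [[0.0] * len(unknown) for _ in unknown]
    b = [0.0] * len(unknown)
    for v, i in index.items():
        for w in neighbors(v):
            A[i][i] += 1.0
            if w in fixed:
                b[i] += fixed[w]
            else:
                A[i][index[w]] -= 1.0

    volts = dict(fixed)
    volts.update(zip(unknown, solve_linear(A, b)))
    # Current leaving the + node through its 1-ohm branches equals the source current I_0.
    current = sum(volts[plus] - volts[w] for w in neighbors(plus))
    return 1.0 / current


if __name__ == "__main__":
    for n in (0, 1, 2, 4):
        print(n, grid_resistance(n))
```

With n = 0 the patch is a single 1-ohm branch, so the function returns exactly 1; larger patches add parallel paths and the value decreases toward the infinite-grid answer. The dense solver is fine for small n but gets slow quickly, so a sparse or iterative solver would be the natural next step for bigger patches.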

Chances are such an interesting problem must have been solved. But instead of researching prior work and its history, why not have some fun with it? We don’t have to worry about (nor claim any) credit or novelty with a holiday puzzle!

…. But I would appreciate any pointer to its history or solution methods if you do know.  Even a random guess of the answer will be welcome.

In the next post, I’ll describe two methods: one is a simple symmetry argument for a special case, and the other is a numerical solution for the general case. Meanwhile, have fun and happy holidays!