# Solution to puzzle: produce or learn?

This post is a solution to the puzzle in the last post.

The optimal strategy has a very simple form (a superscript ${^*}$ denotes optimal quantities): there is a time ${t^* \in \{1, \dots, T\}}$ such that

• only learn (${l^*(t)=1}$) before time ${t^*}$;
• only produce (${p^*(t)=1}$) from time ${t^*}$ on.

The optimal strategy has the following interpretation. Let ${Q(t) := P(t) + mL(t)}$ denote the quality at time ${t}$. The optimal strategy is to learn at full speed, building up quality rapidly, as long as the present quality ${Q(t)}$ is below a “future capacity,” and then to switch to producing at full speed once ${Q(t)}$ exceeds that future capacity. In particular, except possibly at time ${t^*}$, it is never optimal to split time between learning and producing (${l^*(t), p^*(t) \in (0, 1)}$). This simple structure is a consequence of the modeling assumption that learning ${l(t)}$ builds quality ${Q(t)}$ more rapidly than production ${p(t)}$ does (${m>1}$).
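To make the role of ${m>1}$ concrete, here is a minimal sketch that traces ${Q(t)}$ along a threshold strategy. It assumes (a hypothetical reading, not restated in this post) that ${P(t)}$ and ${L(t)}$ are the cumulative amounts of production and learning through time ${t}$; the function name `quality_path` is likewise made up for illustration.

```python
def quality_path(T, m, t_star):
    """Q(t) = P(t) + m*L(t) for t = 1..T under a threshold strategy:
    learn (l = 1) before t_star, produce (p = 1) from t_star on.

    Hypothetical assumption: P(t) and L(t) are cumulative sums of
    p(s) and l(s) over s <= t.
    """
    P = L = 0.0
    path = []
    for t in range(1, T + 1):
        if t < t_star:
            L += 1.0  # full-speed learning: quality rises by m
        else:
            P += 1.0  # full-speed production: quality rises by 1
        path.append(P + m * L)
    return path


print(quality_path(5, 3.0, 3))
```

For example, `quality_path(5, 3.0, 3)` gives `[3.0, 6.0, 7.0, 8.0, 9.0]`: quality climbs with slope ${m = 3}$ while learning and only slope 1 once production starts.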

The derivation of the optimal strategy and its precise interpretation is here.

The proof also determines the switching time ${t^*}$ explicitly. As ${m}$ ranges from 1 to ∞, ${t^*}$ moves from 1 to ${(T+1)/2}$. Hence, as quality becomes more important (larger ${m}$), one starts to produce later, first building up a higher quality (for up to roughly half the horizon ${T}$) before starting to produce.
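The trend in ${t^*}$ can be illustrated numerically, but only under an assumed payoff, because the objective is not restated in this post. The sketch below assumes a hypothetical total output ${J = \sum_t p(t) Q(t)}$, with ${P(t)}$ and ${L(t)}$ cumulative through time ${t}$, and brute-forces integer switching times over threshold strategies (no split step). The names `payoff` and `best_switch` are made up. With this timing convention the large-${m}$ optimum lands near ${T/2 + 1}$ rather than exactly ${(T+1)/2}$, so treat it as an illustration of the trend, not a check of the formula.

```python
def payoff(T, m, t_star):
    """Hypothetical payoff J = sum_t p(t) * Q(t), Q(t) = P(t) + m*L(t),
    for the threshold strategy: learn before t_star, produce from t_star on.
    P and L are assumed to be cumulative through time t (an assumption)."""
    P = L = 0
    total = 0.0
    for t in range(1, T + 1):
        if t < t_star:
            L += 1  # learning step: no output, quality grows by m
        else:
            P += 1  # production step: p(t) = 1, so it earns Q(t)
            total += P + m * L
    return total


def best_switch(T, m):
    """Brute-force the switching time over 1..T (ties go to the earliest)."""
    return max(range(1, T + 1), key=lambda s: payoff(T, m, s))


for m in (1, 2, 100):
    print(m, best_switch(10, m))
```

With ${T = 10}$ this gives ${t^* = 1}$ for ${m = 1}$, ${t^* = 4}$ for ${m = 2}$, and ${t^* = 6}$ for ${m = 100}$: the switch moves from the very start toward the middle of the horizon as ${m}$ grows.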

Remark. The simplicity of the optimal strategy is striking. It matches our intuition that one should first focus on learning before switching to producing, and that more and more people stay longer and longer in school; see the examples at the beginning of the last post. In reality, however, we usually don’t completely stop learning after ${t^*}$. This can be because new knowledge or techniques that can boost our capability become available only after ${t^*}$, or because our capabilities decay over time if we don’t continually practice. These factors are not captured in our simple model. One way to capture the second factor is to include a decay term in ${Q(t)}$:

${Q(t) \ := \ P(t) + m L(t) - \alpha t}$

where ${\alpha > 0}$ is the rate at which capability decays.
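To see what the decay term does, here is a minimal sketch that evaluates the modified quality along a threshold strategy, again assuming (hypothetically) that ${P(t)}$ and ${L(t)}$ are cumulative sums; `decayed_quality` is an illustrative name. Quality now grows at rate ${m - \alpha}$ while learning and only ${1 - \alpha}$ while producing, so any decay rate with ${1 < \alpha < m}$ makes quality fall as soon as one stops learning, which is the pressure toward continued practice described in the remark.

```python
def decayed_quality(T, m, alpha, t_star):
    """Q(t) = P(t) + m*L(t) - alpha*t for t = 1..T under a threshold strategy
    (learn before t_star, produce from t_star on).

    Hypothetical assumption: P(t) and L(t) are cumulative through time t.
    """
    P = L = 0.0
    path = []
    for t in range(1, T + 1):
        if t < t_star:
            L += 1.0  # learning: net change per step is m - alpha
        else:
            P += 1.0  # producing: net change per step is 1 - alpha
        path.append(P + m * L - alpha * t)
    return path


print(decayed_quality(6, 3.0, 1.5, 4))
```

With ${T = 6}$, ${m = 3}$, ${\alpha = 1.5}$ and a switch at ${t = 4}$, this prints `[1.5, 3.0, 4.5, 4.0, 3.5, 3.0]`: quality peaks at the switch and then erodes while only producing.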

Finally, another factor behind the clean separation of learning and production in the optimal strategy is that the model treats them as independent. In reality, production (games, performance, research and publication, etc.) often provides important incentives and contexts for learning, and strongly influences how effective learning is.