Source: Nested Learning — Behrouz et al., Google Research
One of the more technically dense papers I worked through this semester was Google Research’s Nested Learning. It’s one I keep coming back to, because it offers a genuinely new way of thinking about what large language models are doing when they learn.
The core idea: rather than treating a neural network as a single monolithic optimisation problem (minimise the loss, adjust the weights), Nested Learning represents a model as a set of nested, multi-level optimisation problems, each with its own “context flow.” Think of it as layers of learning that can influence each other: not just bottom-up signal propagation, but structured interdependence between levels.
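To make the nested picture concrete for myself, I find a toy sketch helpful. The Python below is my own illustration, not code or an algorithm from the paper. It assumes just two levels: a fast weight that is reset and re-fitted on each new context, and a slow weight that gets one small update per context. The names (grad, adapt_fast, train, w_slow, w_fast), the 1-D linear task, and the learning rates are all hypothetical choices made for illustration.

```python
import random


def grad(w, data):
    # Gradient of mean squared error for the 1-D model y_hat = w * x.
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def adapt_fast(w_slow, context, steps=20, lr=0.1):
    # Inner (fast) level: starts from zero for every new context and
    # only ever sees that context's examples. w_slow is held fixed here.
    w_fast = 0.0
    for _ in range(steps):
        w_fast -= lr * grad(w_slow + w_fast, context)
    return w_fast


def train(num_tasks=200, slow_lr=0.05, seed=0):
    # Outer (slow) level: one small update per task, so it changes far
    # less often than the fast weight does.
    rng = random.Random(seed)
    w_slow = 0.0
    for _ in range(num_tasks):
        # Each toy task is a line y = a * x with a slope near 3.
        a = 3.0 + rng.uniform(-0.5, 0.5)
        context = [(x, a * x) for x in (-1.0, 0.5, 1.0, 2.0)]
        w_fast = adapt_fast(w_slow, context)
        # Assumed update rule: nudge the slow weight toward the solution
        # the fast level found (w_slow + w_fast) for this context.
        w_slow += slow_lr * w_fast
    return w_slow


if __name__ == "__main__":
    w_slow = train()
    new_context = [(x, 3.4 * x) for x in (-1.0, 0.5, 1.0, 2.0)]
    w_fast = adapt_fast(w_slow, new_context)
    print("slow weight:", w_slow)
    print("adapted slope on new context:", w_slow + w_fast)
```

In this toy, the slow weight settles near the average slope across tasks, while the fast weight picks up whatever is specific to the context in front of it. Two levels updating at different rates, each learning from its own stream of data, is the only part of the Nested Learning picture this sketch is meant to show.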
The payoff of this framing is that it explains something that has always seemed almost magical: in-context learning. When you show a large model a few examples in the prompt and it suddenly generalises to new cases, why does that work? The usual explanations are vague about the mechanism. Nested Learning gives a more mechanistic account: in-context learning emerges as a natural consequence of higher-order optimisation loops that develop as models scale.
The practical implication, if this theory is right, is that there’s a meaningful path to higher-order in-context learning by designing architectures with more explicit “levels.” That’s a research direction worth watching.