Source: AgentArk — Luo, Jin et al., CMU/Amazon/Georgia Tech, arXiv 2602.03955
The multi-agent systems dilemma is one I find genuinely interesting as a practical problem. Multi-agent systems — where multiple AI models debate, critique each other, and converge on answers — produce significantly better reasoning on complex tasks than any single model. The problem: they’re dramatically more expensive. Computation grows roughly quadratically with the number of agents, latency is high, and errors can propagate and amplify across the network.
AgentArk proposes an elegant solution: instead of running multi-agent systems at inference time, distil their reasoning dynamics into a single model’s weights at training time. Three distillation strategies are proposed: reasoning-enhanced fine-tuning, trajectory-based augmentation, and process-aware distillation.
The results are meaningful: distilled models preserve the efficiency of a single agent while showing reasoning and self-correction performance closer to multi-agent systems. They also show improved robustness and generalisation across diverse reasoning tasks.
The practical implication: if you need strong reasoning in a latency-constrained environment, AgentArk-style distillation is a more realistic path than deploying live multi-agent systems. The intelligence of the crowd, baked in at training time rather than reconstructed at every inference.