Research Journey

Notes, findings, and honest reflections from my DBAI at PolyU Hong Kong.

Research
The missing layer — why AI agents need an operating system
The agent harness is the layer between the raw model and the application — managing memory, tools, and context over time. Without it, even the best models fail on long tasks.
Research
The deployment dilemma — choosing between speed and intelligence in AI agents
Fast and cheap means single agents. Smart and reliable means multi-agent systems. The cost and latency gap between them is the central engineering challenge of AI deployment in 2026.
Research
Can you distill multi-agent intelligence into a single model? AgentArk says yes.
Multi-agent systems reason better but cost far more to run. AgentArk proposes a third path: bake the multi-agent reasoning into a single model's weights during training.
Research
We finally have a rigorous definition of AGI. Here's what it reveals.
GPT-4 scores 27%. GPT-5 scores 57%. Progress is real — but the gaps it exposes are just as interesting as the gains.
Research
GPT-4.5 was judged more human than actual humans. And most researchers still said AGI wasn't close.
In March 2025, a model passed the Turing Test. 76% of researchers still believed scaling wouldn't get us to AGI. The cognitive dissonance is remarkable.
Research
The coordination curse — why AI teams get worse, not better
Human collaboration is 1+1=3. AI agent collaboration is currently 1+1<1. Understanding why is more important than pretending the problem doesn't exist.
Research
The bottleneck theory of AI — why every breakthrough follows the same pattern
Every major AI leap follows the removal of a specific constraint. Understanding the pattern helps you see what's coming next.
Research
A complete history of AI — from logic machines to the GPT revolution
Understanding where AI has been is the best way to reason about where it's going. The cycles of enthusiasm, winter, and revival follow a surprisingly consistent pattern.
Research
A 397 billion parameter model on a consumer laptop — the memory wall has been broken
By treating SSD flash as extended DRAM and combining it with autonomous self-improving pipelines, researchers ran a frontier-scale model on consumer hardware at 5.5 tokens per second.
Research
Two failure modes that break every AI agent — and how to fix them
Context Anxiety and Self-Evaluation Leniency appear so consistently across AI agent deployments that they deserve names. The fix requires separating Generator and Evaluator agents.
Research
Anthropic built their most powerful AI — then decided not to release it
Claude Mythos Preview showed such a capability leap that Anthropic quietly decided the world wasn't ready. This is what responsible AI governance actually looks like in practice.
Research
Context engineering — the skill that actually makes AI work
Most AI failures aren't model failures. They're context failures — the information supply chain is broken. Here's the three-part framework that changes how you build with AI.
Research
I thought more AI agents meant better results. The research says otherwise.
CooperBench ran 600+ collaborative AI coding tasks. State-of-the-art agents working together performed 30% worse than working alone. The curse of coordination is real.
Research
DeepMind's AGI bet — what it means to go all-in on general intelligence
How should a company balance exploiting current AI models versus investing in the paradigms that might actually reach AGI? DeepMind's strategy is instructive.
Research
The architecture of 1-million-token AI — a DeepSeek V4 deep dive
V4 isn't just incrementally better than V3. It's a ground-up reconstruction aimed at one specific goal — making 1-million-token context commercially viable.
Research
DeepSeek V4 — 1 million tokens at 10% of the cost
V4 supports a 1-million-token context window with dramatically less memory than its predecessor. The architectural innovations behind this efficiency are worth understanding.
Research
March 7, 2026 — the day a virtual fly changed the conversation
Eon Systems released a 60-second video of a virtual fruit fly navigating, foraging, and grooming. No code. No scripts. No training. A biological brain driving a digital shell.
Research
Zero-shot wetware — what happens when you boot a biological brain in silicon
Map the exact synaptic topology. Run electricity through it. Watch behaviour emerge — with no training data, no scripts, no reinforcement learning. This is a genuinely new paradigm.
Research
The first complete brain simulation — 125,000 neurons, 50 million synapses
A leaky integrate-and-fire model of the entire Drosophila brain predicts which neurons drive feeding and grooming — then validates the predictions experimentally.
Research
DeepSeek's Engram — the architecture that changes how LLMs remember
Current LLMs waste early-layer depth computing things that should just be looked up. Engram fixes this — and the performance gains are bigger than expected.
Research
The Engram Paradigm — why current LLMs waste their own intelligence
Language has two kinds of content. Forcing neural networks to handle both with the same mechanism is inefficient. The Engram Paradigm makes this case more clearly than any paper I've read.
Research
Google's prompting framework — from a single instruction to autonomous agents
Google's distilled prompting masterclass reveals a clear progression that mirrors the shift from chatbot to agentic AI. The same core principles apply at every level.
Research
Are we at the intelligence explosion? What the inflection point actually means
The intelligence explosion hypothesis has moved from theoretical concern to active research programme. Here's what it actually means and why 2026 feels different.
Research
Intelligence is not a volume knob — a 10-tier framework for thinking about AI
We treat intelligence as a single dimension from 0 to 100. The research suggests it's a nested hierarchy — and conflating the levels is making our AI debates useless.
Research
Apple's breakthrough — running large language models 20x faster from flash memory
The memory wall — the gap between model size and available DRAM — has been a hard constraint on AI deployment. Apple researchers found a way around it.
Research
Nested Learning — a new theory of how AI actually learns
Google Research proposes that deep learning is really a sequence of nested optimisation problems. It reframes in-context learning in a way that actually makes intuitive sense.
Research
What a fruit fly's brain taught researchers about building better AI
NeuroMechFly v2 simulates a complete Drosophila nervous system and uses it to study how brains and bodies work together. The implications for AI are deeper than they first appear.
Research
OpenClaw — the autonomous AI operating system that goes beyond chatbots
While everyone builds chatbots, OpenClaw is building the architecture for AI that wakes up, checks your channels, manages your pipelines, and acts — without being asked.
Research
Why the future of AI compute might be in space
Terrestrial data centres are hitting thermal, water, and energy ceilings. Space offers near-continuous solar power, a cold background to radiate heat into, and laser links that run at vacuum light speed.
Research
Embodied AI's GPT-3 moment is coming. Open source will be there first.
OpenVLA, a 7B open-source model, beat a 55B closed-source model by 16.5% in absolute task success rate across 29 robotic tasks. The ecosystem war for embodied AI is already underway.
Research
Strategic chaos — how adversarial actors could destabilise AI agent ecosystems
The most dangerous AI agent threat isn't the dramatic failure that crashes a system. It's the subtle failure that looks correct while systematically producing harm.
Research
System 2 AI — we've moved from prediction to deliberation
The shift from next-token prediction to test-time search and verification is as significant as the shift from rule-based AI to neural networks.
Research
Why LLMs are stuck — the case for world models
Moravec's paradox explains why AI can pass the bar exam but fail at basic physical reasoning. The fix requires moving beyond statistical text correlation to grounded world models.