Blog

Engineering notes on AI agents, automation, and the infrastructure behind them.

Build the Harness Once With Your Best Model. Run It on a Cheap One.

June 3, 2026 · 4 min read
Agents forget and good ones cost. The fix is not a better model. Put the goal in deterministic scripts and run a cheap model against them.
Most of Your AI Skills Will Rot. Here's Which Ones Compound.

June 3, 2026 · 4 min read
A skill's lifespan is set by what it couples to, not how good the prompt is. Why most AI skills rot, which parts compound, and how to tell.
Claude Code Stops Following Your CLAUDE.md: Read-Once Rules and Hooks

June 2, 2026 · 4 min read
Claude Code reads your CLAUDE.md once at startup, so rules decay as the session fills up. Move the ones that must never break into hooks.
Claude Opus 4.8 Is Out. The Number I Care About Isn't on the Benchmark Chart.

May 29, 2026 · 3 min read
Opus 4.8 shipped May 28. For unattended cron agents, the upgrades that matter are not the benchmark scores. A use-case breakdown from real builds.
Your 50th Skill Makes the First 49 Less Reliable

May 27, 2026 · 4 min read
Past a token-budget threshold, each new skill silently lowers reliability of the rest. Where the work actually lives is below the skill layer.
Self-Hosted Voice AI: Why GDPR Is the Wrong Test (NIS2 Is the Real One)

May 21, 2026 · 4 min read
A GDPR tick isn't a NIS2 test. What you really need to verify with hosted voice AI vendors before NIS2 puts the board on the hook personally.
Splitting Grounding from Reasoning in Browser-Agent Stacks

May 19, 2026 · 4 min read
Browser-agent stacks bundle grounding and reasoning. A local 2B parser splits them, beats GPT-4o on ScreenSpot-v2 by 2.5x, costs $4 to train.
Context Engineering Is Just File Naming

May 12, 2026 · 4 min read
Context engineering sounds new. It is the file-naming hygiene developers always had, load-bearing now because LLMs read what you point them at.
Your AI Workflow Doesn't Need Better Prompts. It Needs Less AI.

May 5, 2026 · 9 min read
Prompting is discovery. Skills are repetition. Gates are how AI workflows become reliable.
Agentic Knowledge Base — Karpathy's LLM wiki, with adapters

May 2, 2026 · 8 min read
A framework that turns whatever task or note app you use into a Karpathy-style LLM wiki. Pluggable adapters, parallel retrieval with RRF.
What Anthropic's April 23 Postmortem Reveals About Your Agent Harness

April 30, 2026 · 3 min read
Three bugs over two months, one usage-limit reset for every Pro subscriber. The postmortem reads like a free audit checklist for any production agent harness.
Voice AI in Production: From RunPod to Hosted Kubernetes

April 23, 2026 · 4 min read
One pod serves one user at a time. Production serves thousands. Here's what that gap actually costs, and why voice AI companies keep asking for hosted Kubernetes.