Tagged: LLM

12 posts

Your AI Agent Makes Four Bad Decisions a Smarter Model Won't Fix

June 25, 2026 · 3 min read · blog

AI agents fail decisions for the same four reasons people do. The WRAP framework, encoded as a system-prompt gate, fixes what a bigger model can't.

AI Agent Best Practices: Trust Your Own Results Before Google

June 16, 2026 · 3 min read · blog

Your AI agent reaches for googled best practices before your own proven fixes. Wire a trust order into your CLAUDE.md and agent loop instead.

Build the Harness Once With Your Best Model. Run It on a Cheap One.

June 3, 2026 · 4 min read · blog

Agents forget, and good ones are expensive. The fix is not a better model. Put the goal in deterministic scripts and run a cheap model against them.

Claude Opus 4.8 Is Out. The Number I Care About Isn't on the Benchmark Chart.

May 29, 2026 · 3 min read · blog

Opus 4.8 shipped May 28. For unattended cron agents, the upgrades that matter are not the benchmark scores. A use-case breakdown from real builds.

Context Engineering Is Just File Naming

May 12, 2026 · 4 min read · blog

Context engineering sounds new. It is the file-naming hygiene developers always had, load-bearing now because LLMs read what you point them at.

Enterprise AI PII Redaction System for Sensitive Documents

April 21, 2026 · 5 min read · case-studies

Self-hosted German PII redactor for SAP prod→dev copies. Plugs in after TDMS/Delphix/Informatica to cover free-text NOTES columns, unclassified Z-tables, and OCR'd attachments. GDPR-compliant (DSGVO), Apache 2.0, runs on a single consumer GPU.

How to Choose an LLM for Production: 7 Criteria That Matter

April 17, 2026 · 13 min read · guides

How to choose an LLM for production workloads. 7 selection criteria, a decision tree, an evaluation process, and a requirements checklist from real deployments. Download the free AI Automation Checklist.

Self-Hosted LLM vs API Cost: Break-Even Analysis (2026)

April 16, 2026 · 15 min read · guides

Self-hosted LLM vs API cost analysis with break-even math. When to self-host, when to stay on Claude, and the hybrid pattern most production teams actually use.

LLM API Comparison 2026: Claude, OpenAI, Gemini for Production

April 15, 2026 · 18 min read · guides

Feature matrix, pricing, reliability and EU hosting across major LLM APIs. Where Anthropic, OpenAI and Google win, and what to pick for production.

LLM API Cost Comparison 2026: Framework, Not a Stale Table

April 11, 2026 · 12 min read · guides

LLM API cost comparison for 2026. Model your real workload costs with prompt caching, output tokens, reasoning, and batch API factored in.

Self-Hosted LLM on Kubernetes: A Production vLLM Deployment

April 5, 2026 · 16 min read · blog

Complete self-hosted LLM Kubernetes guide. Deploy vLLM on GPU nodes with manifests, HPA, monitoring, and cost modeling. Practitioner notes included.

RAG Pipeline Tutorial: Build a Production Document Q&A System with Qdrant and Claude

April 1, 2026 · 16 min read · blog

End-to-end RAG pipeline tutorial. Qdrant + Claude Sonnet 4.6 + local embeddings. Real code for chunking, retrieval, augmentation, and citation-grounded answers.