Key Takeaways
- Context engineering is filling the context window with the right information for the task — not writing a better prompt
- It’s the layer below prompting: prompting tunes the question, context engineering decides everything the model already knows when it reads it
- It draws on four sources: the always-on system prompt / AGENTS.md, scoped rules and just-in-time retrieval, living documentation, and tool results & memory
- “Context rot” — recall degrading as the window fills with junk — is the failure mode it exists to prevent
- It is not RAG; RAG is one retrieval technique that lives inside context engineering, not the whole of it
An AI coding agent fails most often not because it can’t write the code, but because it doesn’t know your codebase when it starts. It reinvents a utility you already have, picks an error pattern you abandoned six months ago, or queries the database from a route handler, a pattern your team banned. The model is capable. The context is missing. Context engineering is the discipline of fixing that: deciding what the model knows before it writes a line.
This is a different job from prompt engineering, and a more durable one. A great prompt helps for one message. A well-engineered context helps every message, every session, every agent that touches the repo. If your AI tools produce inconsistent results, the lever is usually here, not in your phrasing.
If you’ve been vibe coding and getting code that looks right but ignores your conventions, this guide explains the layer you’re missing and how to build it.
What Is Context Engineering?
Context engineering is the practice of assembling everything an AI model sees before it generates output: the system instructions, the project rules, the relevant files, the retrieved data, the conversation history, and the results of any tools it has called. The model’s answer is only as good as that assembled context. Engineer the context well and a mid-tier model behaves like an expert on your codebase; engineer it badly and the best model on the market guesses.
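As a rough illustration of "assembling everything the model sees," here is a minimal sketch in TypeScript. The types and the `assembleContext` function are hypothetical and do not come from any real SDK; the point is only that the prompt is the last, smallest piece of what gets sent.

```typescript
// Hypothetical shapes for illustration; not taken from any real SDK.
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
}

interface ContextParts {
  systemInstructions: string; // always-on rules
  projectRules: string[]; // rules for the area being edited
  relevantFiles: { path: string; content: string }[];
  retrievedData: string[];
  history: Message[]; // prior conversation turns
  toolResults: { tool: string; output: string }[];
}

// Assemble everything the model sees, in a fixed order, ending with the prompt.
function assembleContext(parts: ContextParts, prompt: string): Message[] {
  const system = [
    parts.systemInstructions,
    ...parts.projectRules,
    ...parts.relevantFiles.map((f) => `File: ${f.path}\n${f.content}`),
    ...parts.retrievedData,
  ].join("\n\n");

  return [
    { role: "system", content: system },
    ...parts.history,
    ...parts.toolResults.map(
      (t): Message => ({ role: "tool", content: `${t.tool}: ${t.output}` }),
    ),
    { role: "user", content: prompt },
  ];
}

// The same two-word prompt, now arriving with standing knowledge attached.
const messages = assembleContext(
  {
    systemInstructions: "Stack: Next.js, strict TypeScript.",
    projectRules: ["Every database query filters by tenant."],
    relevantFiles: [],
    retrievedData: [],
    history: [],
    toolResults: [],
  },
  "fix this",
);
```

Swapping the prompt changes one entry in that list. Swapping the parts changes everything the model reads before it gets there.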
The term has a precise origin. On June 25, 2025, Andrej Karpathy described it as “the delicate art and science of filling the context window with just the right information for the next step.” Two days earlier, on June 23, Shopify CEO Tobi Lütke had framed it as “the art of providing all the context for the task to be plausibly solvable by the LLM” (@tobi on X, June 23, 2025). Simon Willison collected both and argued the phrase captures production LLM work better than “prompt engineering” ever did (Willison, June 27, 2025).
Context engineering is the delicate art and science of filling the context window with just the right information for the next step. — Andrej Karpathy, June 25, 2025
Karpathy is a useful source here for a second reason: he also coined “vibe coding” on February 2, 2025 (Willison, March 19, 2025), a term that went on to be named Collins Dictionary’s Word of the Year for 2025 (Collins Dictionary, 2025). The same person who gave us the freewheeling way to build with AI also named the discipline that makes it hold up. Vibe coding describes the gas pedal; context engineering is the steering.
The key word in Karpathy’s definition is filling. Context engineering is not about writing cleverer instructions. It is about deciding what goes into a finite window: which rules, which files, which retrieved facts, which prior turns — and just as importantly, what to leave out.
Context Engineering vs Prompt Engineering
Prompt engineering and context engineering get conflated constantly, and the distinction is the whole point. Prompt engineering optimizes the question you ask. Context engineering optimizes everything the model already has loaded when it reads that question. One is a sentence; the other is a system.
Put concretely: prompt engineering is choosing to write “refactor this function and explain your reasoning step by step” instead of “fix this.” Context engineering is making sure that, before the model reads either prompt, it already knows your stack is Next.js with strict TypeScript, that business logic belongs in a service layer, and that every database query filters by tenant. The prompt is the same length either way. The output is not.
| | Prompt engineering | Context engineering |
|---|---|---|
| Scope | A single message | The whole context window |
| What it tunes | The wording of the request | What the model already knows |
| Durability | Per-prompt; you redo it each time | Persistent; set up once, applies to every task |
| Failure mode it fixes | A vague or ambiguous instruction | Code that ignores your conventions and reinvents patterns |
Neither replaces the other. A sharp prompt against an empty context still produces code that violates your architecture. A rich context with a lazy prompt still needs a clear ask. But for AI-assisted development specifically, context is the higher-leverage layer, because the conventions, architecture, and constraints it carries apply to every prompt you will ever write against that repo.
Why It Matters: Context Rot and Pattern Drift
Anthropic’s September 2025 guide, “Effective context engineering for AI agents,” names the central failure mode: context rot. As a context window fills, a model’s recall degrades; it starts losing track of facts that were stated earlier or buried under irrelevant tokens (Anthropic, September 2025). More context is not better context. A window stuffed with the entire repo performs worse than a lean window holding only what the task needs. This is the counterintuitive heart of the discipline, and it’s why “just paste everything in” is the wrong instinct.
The core counterintuition
Adding more to the context window can make the model dumber, not smarter. Context rot means recall degrades as the window fills with tokens the task doesn’t need. The goal is the right information, not the most.
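One way to act on "the right information, not the most" is to give the window a budget and spend it on the most relevant items first. The sketch below is an assumption-heavy illustration: the `relevance` score is presumed to come from some upstream step, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer.

```typescript
interface ContextItem {
  label: string;
  text: string;
  relevance: number; // 0..1, assumed to come from an upstream scoring step
}

// Crude estimate: roughly 4 characters per token. An assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most relevant items that fit the budget; skip anything that does not fit.
function packContext(items: ContextItem[], budgetTokens: number): ContextItem[] {
  const byRelevance = [...items].sort((a, b) => b.relevance - a.relevance);
  const kept: ContextItem[] = [];
  let used = 0;
  for (const item of byRelevance) {
    const cost = estimateTokens(item.text);
    if (used + cost > budgetTokens) continue;
    kept.push(item);
    used += cost;
  }
  return kept;
}

const packed = packContext(
  [
    { label: "tenant rule", text: "x".repeat(400), relevance: 0.9 },
    { label: "whole repo dump", text: "x".repeat(40000), relevance: 0.2 },
    { label: "error conventions", text: "x".repeat(200), relevance: 0.7 },
  ],
  500,
);
const packedLabels = packed.map((item) => item.label);
// packedLabels is ["tenant rule", "error conventions"]; the repo dump is left out.
```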
The cost of getting this wrong shows up in code quality. Veracode’s 2025 GenAI Code Security Report found that 45% of AI-generated code failed security tests, introducing known vulnerability classes at a steady clip across models (Veracode, 2025). A large share of that is a context problem: without knowing your validation helpers, your auth boundaries, or your error conventions, the model reinvents them, and its reinvention is often weaker than what you already shipped. Multiply that across a codebase and you get pattern drift — the same job done five inconsistent ways because the model never saw the first four.
Birgitta Böckeler’s “context engineering for coding agents” on martinfowler.com is the practitioner reference for treating this seriously: she frames context as something you deliberately curate and feed, not something that accumulates by accident (Böckeler, martinfowler.com). The teams that ship reliable AI-generated code are the ones who decide, on purpose, what the agent knows.
The Four Sources of Context
A working context layer pulls from four distinct sources. Each answers a different question, and a good setup uses all four together.
1. The always-on system prompt / AGENTS.md
This is the layer that loads on every task: your project’s standing rules. In coding tools it usually lives in an AGENTS.md file (or CLAUDE.md, .cursorrules) that the agent reads before it does anything. Stack, architecture, conventions, the things you never want it to do. It is the smallest, most-read piece of context you own, which is exactly why it should be lean and human-written rather than a 2,000-line data dump.
2. Scoped rules and just-in-time retrieval
Not every rule belongs in the always-on layer. Detail that only matters when the agent touches a specific part of the codebase — the database layer, the billing flow, the email templates — should load only when that area is in play. This is just-in-time retrieval: pull the right rule at the right moment instead of carrying all of them all the time. It keeps the window lean and directly fights context rot.
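A minimal sketch of that loading decision, assuming rules are keyed by directory prefix. The `ScopedRule` shape and the paths are hypothetical; real tools each have their own rule format and matching syntax.

```typescript
interface ScopedRule {
  name: string;
  pathPrefixes: string[]; // directories this rule applies to
  body: string;
}

// Return only the rules whose prefixes match at least one file the task touches.
function rulesForTask(rules: ScopedRule[], touchedPaths: string[]): ScopedRule[] {
  return rules.filter((rule) =>
    rule.pathPrefixes.some((prefix) =>
      touchedPaths.some((path) => path.startsWith(prefix)),
    ),
  );
}

// Hypothetical rules and paths, for illustration only.
const scopedRules: ScopedRule[] = [
  { name: "database", pathPrefixes: ["src/db/"], body: "Filter every query by tenant." },
  { name: "billing", pathPrefixes: ["src/billing/"], body: "Money is integer cents." },
  { name: "email", pathPrefixes: ["src/emails/"], body: "Use the shared layout." },
];

const forButtonTask = rulesForTask(scopedRules, ["src/components/Button.tsx"]);
const forBillingTask = rulesForTask(scopedRules, ["src/billing/invoice.ts"]);
// forButtonTask is empty; forBillingTask holds only the billing rule.
```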
3. Living documentation
Documentation that drifts from the code is worse than no documentation, because the agent trusts it. The third source is docs kept in sync with the codebase — regenerated or checked on commit — so that when the agent reads them, they describe what the system actually does, not what it did three sprints ago. Stale context is confidently wrong context.
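A commit-time check for drift could look roughly like the sketch below. It assumes each doc records a hash of the source it was generated from; the `DocRecord` shape and file paths are hypothetical, and FNV-1a is used only because it is small enough to inline.

```typescript
// FNV-1a over UTF-16 code units: a small non-cryptographic hash,
// enough to notice that a file changed.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

interface DocRecord {
  docPath: string;
  sourcePath: string;
  sourceHash: string; // hash of the source when the doc was last regenerated
}

// List docs whose source has changed, or disappeared, since the doc was written.
function findStaleDocs(docs: DocRecord[], sources: Map<string, string>): string[] {
  return docs
    .filter((doc) => {
      const current = sources.get(doc.sourcePath);
      return current === undefined || fnv1a(current) !== doc.sourceHash;
    })
    .map((doc) => doc.docPath);
}

// Hypothetical data: one doc still matches its source, one does not.
const tasksSource = "export function createTask() {}";
const sources = new Map<string, string>([
  ["src/services/tasks.ts", tasksSource],
  ["src/services/billing.ts", "export function charge() {}"],
]);
const staleDocs = findStaleDocs(
  [
    { docPath: "docs/tasks.md", sourcePath: "src/services/tasks.ts", sourceHash: fnv1a(tasksSource) },
    { docPath: "docs/billing.md", sourcePath: "src/services/billing.ts", sourceHash: fnv1a("an older version") },
  ],
  sources,
);
// staleDocs is ["docs/billing.md"]
```

Failing the commit when that list is non-empty is one way to keep the agent from reading docs that describe last quarter's system.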
4. Tool results and memory
The final source is what the agent pulls at runtime: the output of a database query, the contents of a file it read, a search result, or a note it wrote to memory in an earlier turn. This is dynamic context. Instead of you pasting the current schema into a prompt, the agent calls a tool and reads it fresh — which means it never goes stale, because it’s fetched on demand.
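Here is a minimal sketch of dynamic context, assuming a tiny hand-rolled tool registry. The `get_schema` tool, the `liveSchema` object, and the memory helpers are all hypothetical stand-ins; a real agent framework would supply its own tool-calling interface.

```typescript
// A tool is a named function the agent can call for fresh state.
type Tool = (args: Record<string, string>) => string;

// Stand-in for live state; in a real system this would be the database or filesystem.
const liveSchema = { tables: ["Task", "Organization"] };

const tools: Record<string, Tool> = {
  get_schema: () => JSON.stringify(liveSchema.tables),
};

// Run a tool call and return its result as a context entry for the next turn.
function callTool(name: string, args: Record<string, string> = {}): string {
  const tool = tools[name];
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return `[tool:${name}] ${tool(args)}`;
}

// Memory: notes the agent writes in one turn and reads back in a later one.
const memory: string[] = [];
function remember(note: string): void {
  memory.push(note);
}
function recall(): string {
  return memory.join("\n");
}

const before = callTool("get_schema");
liveSchema.tables.push("Invoice"); // the schema changes between turns
const after = callTool("get_schema"); // the next call sees the change
remember("Invoice table added; billing service not yet updated.");
```

Nothing about the schema was pasted into a prompt, so there is no pasted copy to go out of date.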
A Context Layer for a SaaS Codebase
Abstract principles are easy to nod at and hard to apply, so here is a concrete one. Below is a complete, copy-pasteable AGENTS.md for a fictional multi-tenant SaaS called TaskFlow. It is the always-on layer: short enough to load on every task, specific enough to change the output.
# AGENTS.md — TaskFlow
Read this before writing code. It is the source of truth for how this
codebase works. When in doubt, follow what's here over your defaults.
## Stack
- Next.js 16 (App Router), TypeScript (strict)
- PostgreSQL + Prisma — every table is scoped by `organizationId`
- Clerk (auth + organizations), Stripe (billing), Inngest (background jobs)
- Tailwind + shadcn/ui
## Architecture
- Business logic lives in `src/services/` — never in route handlers or components.
- API routes in `src/app/api/` do three things: validate input with Zod,
call a service, shape the response. No database calls in routes.
- `src/lib/repositories/` is the ONLY place allowed to import `prisma`.
- Every Prisma query filters by `organizationId`. A cross-tenant read is a
security bug, not a style nit.
## Conventions
- Roles are lowercase: 'owner' | 'admin' | 'member'.
- Money is integer cents. Never floats.
- Dates cross boundaries as UTC ISO strings; use `Date` objects internally.
- Throw typed errors from `src/lib/errors.ts`; the API layer maps them to
status codes.
## Testing
- Tests first. A new service function gets a `*.test.ts` beside it before
the implementation exists.
- Run `npm test` before you report a task done.
## Don't
- Don't add a dependency without checking `package.json` for one that already does it.
- Don't edit `prisma/schema.prisma` without creating a migration in the same change.
- Don't bypass the service layer "just this once".
That file loads on every prompt. But the deep detail about how the database layer works does not belong there: it’s noise on the 90% of tasks that never touch Prisma. So it goes in a scoped rule that loads only when the agent works in prisma/ or src/lib/repositories/:
# database.md — loads when the agent touches prisma/ or src/lib/repositories/
- Every model has `organizationId String` and `@@index([organizationId])`.
- Filter by `organizationId` from the request context — never from user input.
- Soft-delete with `deletedAt DateTime?`. Never hard-delete tenant data.
- Migrations: `npx prisma migrate dev --name _`, commit the SQL.
The split is the point. AGENTS.md is always present, so the agent always knows the shape of the system. The database rule is loaded just-in-time, only when the agent edits a model or a repository, so a task that styles a button never pays for tenant-isolation rules it doesn’t need. That is context engineering in miniature: the right information, at the right moment, and nothing more.
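To show what those rules buy you, here is a sketch of the kind of repository code an agent might write once both files are in its context. It is illustrative only: an in-memory array stands in for Prisma so the example runs on its own, and `TaskRow`, `TaskRepository`, and `NotFoundError` are hypothetical names, not part of TaskFlow.

```typescript
interface TaskRow {
  id: string;
  organizationId: string;
  title: string;
  deletedAt: Date | null;
}

// Hypothetical typed error, in the spirit of src/lib/errors.ts.
class NotFoundError extends Error {
  constructor(what: string) {
    super(`${what} not found`);
    this.name = "NotFoundError";
  }
}

// In-memory stand-in for a Prisma-backed repository.
class TaskRepository {
  private readonly rows: TaskRow[];

  constructor(rows: TaskRow[]) {
    this.rows = rows;
  }

  // organizationId comes from the request context, never from user input.
  list(organizationId: string): TaskRow[] {
    return this.rows.filter(
      (row) => row.organizationId === organizationId && row.deletedAt === null,
    );
  }

  // Soft-delete: set deletedAt instead of removing the row.
  softDelete(organizationId: string, id: string): void {
    const row = this.rows.find(
      (r) => r.id === id && r.organizationId === organizationId && r.deletedAt === null,
    );
    if (!row) throw new NotFoundError("Task");
    row.deletedAt = new Date();
  }
}

const taskRows: TaskRow[] = [
  { id: "t1", organizationId: "org_a", title: "Ship invoice PDF", deletedAt: null },
  { id: "t2", organizationId: "org_b", title: "Rotate API keys", deletedAt: null },
];
const repo = new TaskRepository(taskRows);
const orgATasks = repo.list("org_a"); // only t1; org_b's task is never visible
```

A cross-tenant `softDelete("org_a", "t2")` throws `NotFoundError` rather than touching another organization's row, which is the behavior the tenant-isolation rule asks for.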
How to Build Your Own (Step by Step)
You can stand up a real context layer in an afternoon. The steps build on each other, and you should resist the urge to make any single piece exhaustive.
- Write a lean AGENTS.md. Stack, architecture rules, conventions, and an explicit list of “don’t”s. Keep it human-written and short; this is the file the agent reads most, so every irrelevant line is a tax on every task. For the full structure, see how to write an AGENTS.md for a SaaS.
- Split task-specific detail into scoped rules. Anything that only matters in one area of the codebase — database, billing, auth — becomes a separate rule that auto-loads by path. This is how you keep the always-on layer lean while still having deep guidance where it counts.
- Keep docs next to the code and regenerate on commit. Documentation the agent can read is only useful if it’s true. Wire doc generation or a sync check into your commit flow so the living docs never drift into lies.
- Give the agent tools to pull fresh state. Instead of pasting the current schema, a config value, or a query result into a prompt, let the agent call a tool and read it. Dynamic context never goes stale because it’s fetched on demand.
- Prune. More context is not better — context rot is real. Periodically cut what the agent doesn’t need from the always-on layer. The discipline is subtractive as much as additive.
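For the pruning step, it helps to see where the always-on file spends its tokens. The sketch below is one hedged way to do that: it splits a markdown file on `## ` headings and ranks sections by estimated size. The four-characters-per-token figure is a rough assumption, and the sample file content is made up.

```typescript
interface SectionSize {
  heading: string;
  estimatedTokens: number;
}

// Split a markdown file on "## " headings and estimate each section's token cost.
// Uses roughly 4 characters per token: an assumption, not a real tokenizer.
function auditSections(markdown: string): SectionSize[] {
  const sections: { heading: string; lines: string[] }[] = [];
  let current: { heading: string; lines: string[] } | undefined;

  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      current = { heading: line.slice(3).trim(), lines: [] };
      sections.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }

  // Largest sections first: these are the first candidates to prune or move.
  return sections
    .map((s) => ({
      heading: s.heading,
      estimatedTokens: Math.ceil(s.lines.join("\n").length / 4),
    }))
    .sort((a, b) => b.estimatedTokens - a.estimatedTokens);
}

// Made-up file content for illustration.
const sampleAgentsMd =
  "# AGENTS.md\n## Stack\n- Next.js\n## Legacy notes\n" + "- old detail\n".repeat(50);
const sizeReport = auditSections(sampleAgentsMd);
// sizeReport[0].heading is "Legacy notes"
```

A section that dominates the report and only matters in one corner of the codebase is a good candidate for a scoped rule instead.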
One practical note on the file at the center of all this: AGENTS.md is becoming a cross-tool standard. Claude Code, Cursor, Windsurf, and Gemini CLI all read it (or a close equivalent), which means the context you write once travels with the repo regardless of which agent a teammate opens it in. That portability is a real reason to invest in getting it right.
Context Engineering vs RAG
The most common confusion about context engineering is that it’s just RAG with a new name. It isn’t. RAG — retrieval-augmented generation — is a specific technique: retrieve relevant chunks from a corpus at runtime and inject them into the context window. That is one of the four sources above (just-in-time retrieval), not the whole discipline. Context engineering also covers the system prompt, the scoped rules, the living docs, and the tool outputs that RAG never touches.
The practical question is when you actually need retrieval. You reach for RAG when the corpus is too large to fit in the window or changes too often to bake in — a knowledge base of thousands of support articles, say. You don’t need it when file paths plus a few tools plus a modern 1M-token window already let the agent reach everything it needs. For a lot of SaaS codebases, the simpler architecture wins: the agent reads the files it needs and calls tools for live state, no vector database required.
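The two conditions in that paragraph can be written down as a small heuristic. This is a sketch, not a rule: the function name is hypothetical, and the 0.5 headroom factor is an assumed margin to leave space for rules, history, and tool results.

```typescript
// Heuristic from the paragraph above: reach for retrieval when the corpus
// will not comfortably fit in the window, or changes too often to bake in.
function needsRetrieval(
  corpusTokens: number,
  windowTokens: number,
  changesOften: boolean,
): boolean {
  const headroom = 0.5; // assumed margin for rules, history, and tool results
  const fitsInWindow = corpusTokens <= windowTokens * headroom;
  return !fitsInWindow || changesOften;
}

const supportArticles = needsRetrieval(5000000, 1000000, false); // true: too large
const smallCodebase = needsRetrieval(200000, 1000000, false); // false: fits with room to spare
```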
We make the full case for that simpler pattern — file-based memory, context rot, and why most agents skip the vector database — in our deep dive on whether you need RAG for your AI agent. The short version: RAG is a tool in the context-engineering kit, not a synonym for it.
Where It Fits: Context, Guardrails, and Specs
Context engineering does not work in isolation. It is one layer of what harness engineering calls the harness: the full apparatus around the model that makes its output reliable. The harness has three layers, each with its own job.
- Context is what the AI knows — the rules, docs, retrieved data, and memory covered in this guide.
- Guardrails are what’s enforced — the tests, type checks, and security scans that reject bad output regardless of intent.
- Specs are what to build — the contract a feature is coded against, before a prompt is written.
The three are complements, not alternatives. Context without guardrails means the agent knows your conventions but nothing stops it from breaking them. Guardrails without context means it gets corrected after the fact instead of guided before. And a spec without context produces code that does the right thing the wrong way. VibeReady’s structured vibe coding framework wires all three together so your first prompt runs inside a real harness instead of a vibe.
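To make the context/guardrail pairing concrete, here is a sketch of a guardrail that enforces one rule the context layer only states: that `src/lib/repositories/` is the only place allowed to import `prisma`. The check is a naive text match written for illustration, with hypothetical names throughout; a lint rule such as ESLint's `no-restricted-imports` would be a sturdier way to enforce the same thing.

```typescript
interface Violation {
  file: string;
  rule: string;
}

const ALLOWED_PREFIX = "src/lib/repositories/";
// Naive single-line match: an import statement that mentions "prisma".
const IMPORTS_PRISMA = /^\s*import\b.*\bprisma\b/m;

// Flag any file outside the repositories directory that imports prisma.
function checkPrismaImports(files: Record<string, string>): Violation[] {
  return Object.entries(files)
    .filter(
      ([path, source]) =>
        !path.startsWith(ALLOWED_PREFIX) && IMPORTS_PRISMA.test(source),
    )
    .map(([path]) => ({
      file: path,
      rule: "only src/lib/repositories/ may import prisma",
    }));
}

// Hypothetical file contents for illustration.
const violations = checkPrismaImports({
  "src/app/api/tasks/route.ts": 'import { prisma } from "@/lib/prisma";',
  "src/lib/repositories/tasks.ts": 'import { prisma } from "@/lib/prisma";',
  "src/services/tasks.ts": 'import { listTasks } from "@/lib/repositories/tasks";',
});
// violations contains one entry, for src/app/api/tasks/route.ts
```

The context tells the agent where database access belongs; a check like this catches the change when it ignores that anyway.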
Frequently Asked Questions
What is context engineering?
Context engineering is designing the information an AI model receives before it acts, so its output fits your project. It's the layer below prompting: prompting tunes the question, context engineering decides everything the model already knows when it reads that question.
Is context engineering the same as prompt engineering?
No. Prompt engineering tunes one message's wording. Context engineering manages everything the model already knows — system prompt, scoped rules, living docs, retrieved data, and memory — when it reads that message. Prompting optimizes the question; context engineering optimizes the surrounding state.
Is context engineering just RAG?
No. RAG (retrieval-augmented generation) is one retrieval technique inside context engineering. You also engineer the system prompt, scoped rules, living docs, and tool outputs. RAG fetches relevant chunks at runtime; context engineering is the whole discipline of deciding what fills the window.
What is AGENTS.md?
AGENTS.md is a plain-markdown file of project rules and conventions that AI coding tools read before touching your code. It's an emerging cross-tool standard: Claude Code, Cursor, Windsurf, and Gemini CLI all read it or a close equivalent. See our guide on how to write one for a SaaS at vibeready.sh/blog/agents-md-for-saas/.
What is context rot?
Context rot is the failure mode where a model's recall degrades as its context window fills with irrelevant tokens. Anthropic's September 2025 guide identifies it as the central failure mode for agents. Good context engineering keeps the window lean, loading only what the task needs, so the model stays accurate as the session grows.