Key Takeaways
- AGENTS.md is an open Markdown file that tells AI coding agents how your project works. OpenAI released it in August 2025; by December, 60,000+ projects used it and it moved to the Linux Foundation
- Next.js 16.2 generates an AGENTS.md in create-next-app by default as of March 2026, so the convention now ships with the framework
- The research is split: a human-written context file raised agent success about 4 points in an ETH Zurich study, while an LLM-generated one lowered it about 3 and added 20%+ cost. Write it by hand
- For a SaaS, the highest-value lines are what an agent can't infer: exact commands, your multi-tenant data-access rule, and a three-tier list of what to never touch
- AGENTS.md is the cheapest reliability layer, not the whole system. Pair it with guardrails and specs — the rest of a harness
Your AI coding agent doesn't read your README on its own. It won't open your wiki, your Notion, or the architecture doc you wrote last quarter unless you point it there. But most coding agents now look for one file automatically at the start of a session, and read it before they touch your code: AGENTS.md.
That file is plain Markdown, it sits at the root of your repository, and in under a year it went from one vendor's convention to a cross-industry standard. OpenAI published the AGENTS.md format in August 2025; by December it was used by more than 60,000 open-source projects and had been donated to the Linux Foundation (Linux Foundation, December 2025). For SaaS teams especially, it's the cheapest reliability upgrade you can make to an AI agent stack, and the easiest to get wrong.
What AGENTS.md Actually Is
AGENTS.md is an open format for telling AI coding agents how to work in your project. The canonical spec calls it “a README for agents”: a single, predictable place to put the build commands, conventions, and constraints an agent needs, so it doesn't have to guess (agents.md). A README is written for the human joining your team. AGENTS.md is written for the agent joining your repo, and unlike the README, the agent actually loads it.
The format caught on because it's boring in the right way. It's Markdown, it has no required schema, and it lives where tools already look. OpenAI released it in August 2025, and within months the major coding agents read it: Cursor, GitHub Copilot, Codex, Devin, Gemini CLI, Jules, and VS Code among them (Linux Foundation, December 2025). That December the format was donated, alongside Anthropic's Model Context Protocol, to the Linux Foundation's new Agentic AI Foundation, with AWS, Google, Microsoft, and OpenAI backing it. A format governed by a neutral foundation is a safer thing to standardize a team on than one company's house style.
The one file your agent loads by default
The reason AGENTS.md matters more than any doc you've written is discovery. An agent won't read context it doesn't know exists. Vercel tested this directly: in its own Next.js evals, giving the agent bundled docs through AGENTS.md produced a 100% pass rate, versus 53% with no docs and 79% when the same docs were available but the agent had to choose to fetch them (Vercel, January 2026). Always-available context beat on-demand retrieval, because agents often don't recognize when they should go looking. That's Vercel's eval on its own framework, so read it as a strong product signal rather than a universal benchmark.
One wrinkle is worth knowing: not every tool reads the same filename. Claude Code looks for CLAUDE.md, and Gemini CLI has historically looked for GEMINI.md. The common fix is to keep one real file, AGENTS.md, and have the others point to it. When create-next-app scaffolds a project, it does exactly that: it writes an AGENTS.md plus a CLAUDE.md that imports it with @AGENTS.md (Next.js, March 2026).
Why SaaS Codebases Need It More Than Most
Every AI coding agent hits the same wall, and it isn't intelligence. Anthropic's 2026 Agentic Coding Trends Report calls it the delegation gap: developers now use AI for around 60% of their work but can fully hand off only 0–20% of tasks (Anthropic, 2026). What stands between “the agent helped” and “the agent did it” is context: the agent knowing your codebase, your constraints, and your failure modes well enough to finish without you.
SaaS raises the stakes on all three. A SaaS codebase is multi-tenant, so almost every database query needs a tenant filter, and a single missing one leaks one customer's data to another. It has authentication and authorization rules that aren't visible in the function an agent happens to be editing. It has billing and webhook code where a careless retry double-charges someone. And it's large, so an agent that infers a convention wrong repeats that mistake across dozens of files before anyone notices.
None of that lives in the code in a form an agent can reliably read. An agent that doesn't know your tenancy rule will write a query that returns every tenant's rows and still pass its own tests, because the test database has one tenant. The security data says this isn't hypothetical: across 3,500 security-relevant samples in an April 2026 arXiv study, 55.8% of AI-generated code contained at least one vulnerability, and asking explicitly for secure code cut that by only about four points (arXiv, April 2026). AGENTS.md is where you write the rules that keep an agent on the safe side of those numbers.
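To make that failure concrete, here is a minimal TypeScript sketch. The `Invoice` type, both functions, and the fixture data are hypothetical and in-memory, not taken from any real codebase. It only illustrates why a one-tenant test fixture cannot tell a scoped query from an unscoped one.

```typescript
// Hypothetical row shape and helpers, for illustration only.
type Invoice = { id: string; organizationId: string; total: number };

// Bug: no tenant filter, so this returns every tenant's rows.
function listInvoicesUnscoped(rows: Invoice[]): Invoice[] {
  return rows;
}

// Fix: every read is filtered by the caller's organization.
function listInvoicesScoped(rows: Invoice[], orgId: string): Invoice[] {
  return rows.filter((row) => row.organizationId === orgId);
}

// A typical test fixture: one tenant, so both versions look identical.
const oneTenant: Invoice[] = [
  { id: "inv_1", organizationId: "org_a", total: 100 },
];

// Adding a second tenant is what exposes the leak.
const twoTenants: Invoice[] = [
  ...oneTenant,
  { id: "inv_2", organizationId: "org_b", total: 250 },
];

// Rows visible to org_a's request that belong to someone else.
const leaked = listInvoicesUnscoped(twoTenants).filter(
  (row) => row.organizationId !== "org_a",
);
console.log(leaked.length); // 1
```

Seeding at least two tenants in test data is one cheap way to make this class of bug fail loudly instead of passing quietly.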
What the Research Actually Says
The honest version of the AGENTS.md pitch has two halves: the efficiency win is real and measured, and the quality win is small and depends entirely on who writes the file.
The efficiency case (proven)
A January 2026 arXiv study measured what happens when you add an AGENTS.md to a repo an agent is working in. The presence of the file cut the agent's median run time by about 29% and its output tokens by about 17%, both statistically significant (arXiv, January 2026). The agent spent less time exploring because the file told it where things were and how to run them. That study measured efficiency, not correctness, so the clean takeaway is that a good AGENTS.md makes agents faster and cheaper, which on its own pays for the half hour it takes to write.
The success-rate case (modest, and it depends who writes it)
Correctness is where the nuance lives. A February 2026 study from ETH Zurich tested developer-written versus LLM-generated context files across four coding agents. Human-written files raised task success by about 4 percentage points on average and beat the auto-generated version for every agent tested. The LLM-generated files did the opposite: they lowered success by roughly 3 points and added more than 20% to inference cost (arXiv, February 2026).
Two things follow. First, write the file yourself; the one task you should not delegate to the agent is writing the document that tells the agent what to do. Second, a great AGENTS.md buys single-digit gains in correctness, not a transformation. It's worth doing, and it isn't magic. Treat it as the floor of a reliable workflow, not the ceiling.
What Goes in a SaaS AGENTS.md
The best AGENTS.md files share a shape. GitHub's engineering team analyzed more than 2,500 agents.md files across public repositories and found the patterns that work: put the executable commands first, prefer short snippets of real code over prose descriptions, and start minimal, adding lines only when an agent trips on something (GitHub, November 2025). Six areas are worth covering.
The six core sections
- Commands: the exact lines to install, run, test, lint, and typecheck, with the flags you actually use.
- Project structure: a short map of where things live and what each area is for.
- Code style: the conventions you'd otherwise correct in review, shown as a snippet rather than a paragraph.
- Testing: how to run tests, what to mock, and what “done” requires.
- Git workflow: branch names, commit format, and what not to commit.
- Boundaries: what the agent may do freely, what it must ask about, and what it must never touch.
The SaaS-specific additions
Those six get you a file that works for any project. A SaaS needs a few more, and they're the rules that are invisible in any single file but true everywhere. State your tenancy model and the one helper every query must go through, so the agent never writes an unscoped query. State that authorization is checked server-side on every route, and that ownership is verified before a record is returned. Name your data-access pattern, the ORM or the row-level-security policy, and forbid raw SQL that bypasses it. Mark the billing and webhook code as ask-first territory, since idempotency bugs there cost real money. And draw a hard line around secrets and environment variables, because an agent that hardcodes an API key to make a test pass has just shipped a credential to your repo.
A Complete SaaS AGENTS.md
Here's a full AGENTS.md for a multi-tenant Next.js SaaS, short enough that an agent reads it on every run and specific enough to change what the agent does. Copy it, then swap the stack names, paths, and helper names for your own.
# AGENTS.md
## Commands
- Install: `pnpm install`
- Dev: `pnpm dev` (http://localhost:3000)
- Test: `pnpm test` (Vitest)
- Single test: `pnpm test path/to/file.test.ts`
- Typecheck: `pnpm typecheck` (tsc --noEmit, strict)
- Lint: `pnpm lint`
- Migrate DB: `pnpm db:generate && pnpm db:migrate` (Drizzle)
Before a task is done, all three must pass:
`pnpm typecheck && pnpm test && pnpm lint`
## Stack
- Next.js 16 (App Router, Server Components, Server Actions)
- TypeScript, strict mode. No `any`, no `@ts-ignore`.
- Postgres via Drizzle ORM
- Auth: Better Auth (`lib/auth`)
- Payments: Stripe (`lib/billing`)
- UI: Tailwind + shadcn/ui (`components/ui`)
## Architecture
- Multi-tenant: every row is scoped to an `organizationId`.
- Never touch the database directly. Use `withTenant(orgId)` from
  `lib/db`, which adds the tenant filter and enforces row-level security.
- Data access lives in `lib/queries/*` (one file per entity).
  UI and components never import `db`.
- Server Actions live in `app/**/actions.ts` and call
  `requireSession()` before anything else.
## Conventions
- Validate input with Zod at every boundary (action, route handler).
- Throw `AppError` from `lib/errors` for expected failures;
  never leak a raw database error to the client.
- Files kebab-case, components PascalCase, hooks `useThing`.
## Testing
- Vitest + Testing Library. Co-locate tests as `*.test.ts(x)`.
- Mock the database with the in-memory adapter in `test/db.ts`.
- Every Server Action needs a test for the logged-out and
  wrong-tenant cases.
## Git
- Branches: `feat/`, `fix/`, `chore/` prefixes; Conventional Commits.
- Never commit `.env`, secrets, or files under `drizzle/`.
## Boundaries
- Always: call `requireSession()` and verify `organizationId`
  ownership before returning a record.
- Ask first: anything in `lib/billing/`, the Stripe webhook at
  `app/api/webhooks/stripe/`, or a database migration.
- Never: write raw SQL, hardcode a secret, disable RLS, or use
  `any` to silence a type error. Use `lib/db` and the typed
  query builders instead.
Three things make that file work, and they're worth copying even if your stack is different. The Commands block comes first, because the single most common agent failure is running the wrong one. The Architecture section states the one rule a SaaS can't survive an agent breaking: every query is tenant-scoped through withTenant(). And every entry under Boundaries pairs a prohibition with the alternative, so a blocked agent has a road to take instead of improvising. Notice what the file leaves out. It doesn't explain what Next.js is, redraw the folder tree the agent can already read, or document Drizzle's API. Every line is something the agent would otherwise get wrong.
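For a sense of what the file is pointing at, here is one way the two helpers it names could fit together. This is a sketch under loud assumptions: the post doesn't show the real `lib/db` or `lib/auth`, so the `Session` and `Project` types, the in-memory table, and the bodies of `requireSession()` and `withTenant()` are all hypothetical stand-ins. A real version would sit on top of the ORM and database-level row-level security.

```typescript
// Hypothetical types and in-memory "table"; a real app would back this
// with its ORM and database-level row-level security.
type Session = { userId: string; organizationId: string };
type Project = { id: string; organizationId: string; name: string };

const projects: Project[] = [
  { id: "p1", organizationId: "org_a", name: "Alpha" },
  { id: "p2", organizationId: "org_b", name: "Beta" },
];

// Stand-in for requireSession(): reject callers with no session
// before any data access happens.
function requireSession(session: Session | null): Session {
  if (session === null) throw new Error("UNAUTHENTICATED");
  return session;
}

// Stand-in for withTenant(orgId): every method it returns is already
// filtered to one organization, so callers cannot forget the filter.
function withTenant(orgId: string) {
  const scoped = (): Project[] =>
    projects.filter((p) => p.organizationId === orgId);
  return {
    list: (): Project[] => scoped(),
    findById: (id: string): Project | undefined =>
      scoped().find((p) => p.id === id),
  };
}

// Shape of a Server Action body: session first, then a tenant-scoped read.
function getProject(session: Session | null, id: string): Project {
  const { organizationId } = requireSession(session);
  const project = withTenant(organizationId).findById(id);
  // Another tenant's record looks exactly like a missing one.
  if (project === undefined) throw new Error("NOT_FOUND");
  return project;
}
```

The design choice worth noticing is that the scoped accessor is the only path to the rows. The rule in AGENTS.md then has something concrete to point at, and the logged-out and wrong-tenant tests the file asks for map directly onto the two error paths.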
How to write your own, in four passes
You don't write a file like that in one sitting. Build it the way you'd harden any system, one pass at a time, each pass triggered by a real failure.
- Commands first, nothing else. Write the exact lines to install, run, test, typecheck, and lint, with the flags you actually use. Ship the file with only this, and you've already removed the most common source of wasted agent effort: guessing at your toolchain.
- Add the one rule you can't afford to have broken. For most SaaS that's tenancy: every query scoped to the organization, through a single named helper. If the agent learns one architectural fact from your file, make it this one.
- Turn your last three incidents into boundaries. Open your recent bugs or postmortems. Each becomes a “never” paired with an alternative: “we shipped a query with no tenant filter” becomes “Never query outside withTenant().” The file earns its keep by encoding the mistakes you've already paid for.
- Watch an agent work, then patch. Give it a real task and watch where it goes wrong: a convention it missed, a check it skipped, a utility it reinvented. Each miss is one new line. This is the loop Mitchell Hashimoto calls engineering the harness: every time the agent makes a mistake, you add the constraint so it never makes that one again. Stop when new tasks stop surfacing new rules.
Keep the whole thing under a page. The best files start minimal and grow only when an agent trips on something, not before. Treat it like code: the pull request that changes a convention updates the file in the same commit. Our AI agent starter kit ships a SaaS AGENTS.md built exactly this way, wired to the tenancy, auth, and billing rules in the example above.
Boundaries: The Highest-Leverage Section
If you write only one section well, write Boundaries. It does the most to keep an agent out of trouble, and it's the part most files skip. The structure that works is three tiers: what the agent may always do, what it must ask about first, and what it must never do.
The detail that separates a useful boundary from a decorative one is pairing every prohibition with an alternative. “Don't write raw SQL” leaves a capable agent stuck, so it improvises. “Don't write raw SQL; use the query builder in lib/db, which adds the tenant scope” gives it the road you want it on. A warning tells the agent where the cliff is; a warning plus an alternative tells it where the bridge is.
For a SaaS, the never tier is where your incident postmortems become rules. Never return a record without checking tenant ownership. Never disable row-level security to make a test pass. Never call an external billing API outside the idempotent wrapper. Each line usually traces back to a real bug, which is the point: the boundary is how you keep the agent from reintroducing it next week.
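The “idempotent wrapper” in that last rule can be sketched in a few lines. Everything here is hypothetical: the `idempotent` helper, the in-memory map, and the `chargeCustomer` stand-in are illustrations, not the API of any billing library. A production version would persist keys in the database, and would usually also pass the key through to the billing provider.

```typescript
// Hypothetical in-memory idempotency wrapper, keyed by a caller-supplied id.
const attemptsByKey = new Map<string, Promise<unknown>>();

function idempotent<T>(key: string, operation: () => Promise<T>): Promise<T> {
  const existing = attemptsByKey.get(key);
  if (existing !== undefined) {
    // A retry with the same key reuses the first attempt's promise.
    return existing as Promise<T>;
  }
  const attempt = operation();
  attemptsByKey.set(key, attempt);
  // Forget failed attempts so a later retry is allowed to run again.
  attempt.catch(() => attemptsByKey.delete(key));
  return attempt;
}

// Stand-in for an external billing call; counts how often it really runs.
let chargeCount = 0;
function chargeCustomer(amountCents: number): Promise<string> {
  chargeCount += 1;
  return Promise.resolve(`charged ${amountCents}`);
}

// A webhook delivered twice, using the (made-up) event id as the key.
const firstDelivery = idempotent("evt_123", () => chargeCustomer(4900));
const retryDelivery = idempotent("evt_123", () => chargeCustomer(4900));
console.log(chargeCount); // 1
```

Because the wrapper stores the promise rather than the result, a retry that arrives while the first attempt is still in flight is deduplicated too.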
AGENTS.md vs CLAUDE.md vs Cursor Rules
AGENTS.md is the standard, but it isn't the only file in the room. Claude Code reads CLAUDE.md, Cursor has its own rules under .cursor/rules, and Gemini CLI has historically read GEMINI.md. The “one file for every agent” promise is the direction the ecosystem is moving, not where it has fully arrived.
| File | Read by | What to do with it |
|---|---|---|
| AGENTS.md | Cursor, Copilot, Codex, Gemini CLI, VS Code, Devin, and more | Make this the one source of truth |
| CLAUDE.md | Claude Code | One line that imports it: @AGENTS.md |
| .cursor/rules | Cursor (in addition to AGENTS.md) | Keep thin, or skip if AGENTS.md covers it |
The practical rule is one source of truth plus thin pointers. Put everything in AGENTS.md, and where a tool insists on its own filename, make that file a one-line import rather than a second copy you'll forget to update. Two instruction files that disagree are worse than one, because the agent follows whichever it happens to read, and you won't know which.
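If you want that rule enforced rather than remembered, a small check in CI can do it. The sketch below is an assumption about how such a check might look, not a feature of any tool: `isThinPointer` is a hypothetical helper that accepts a pointer file only when its sole non-blank line is the import.

```typescript
// Hypothetical check: a pointer file should contain only an import of
// AGENTS.md (plus optional blank lines), never a second copy of the rules.
function isThinPointer(content: string, target = "@AGENTS.md"): boolean {
  const lines = content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length === 1 && lines[0] === target;
}

// A one-line CLAUDE.md passes; one that has started to grow rules does not.
console.log(isThinPointer("@AGENTS.md\n")); // true
console.log(isThinPointer("@AGENTS.md\n\n## Commands\n- pnpm test\n")); // false
```

In a real pipeline you would read CLAUDE.md from disk and fail the build when the function returns false.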
Common Mistakes
Most broken AGENTS.md files fail in one of four ways, and each has a clean fix.
The first is auto-generating it. Pointing the agent at your repo and asking it to write its own AGENTS.md feels efficient and measurably backfires: the ETH Zurich study found LLM-generated context files lowered success and raised cost, while human-written ones helped (arXiv, February 2026). The file is your judgment about what matters, which is the one thing the model doesn't have.
The second is bloat. A long file feels thorough, but every line competes for the agent's attention and adds to the token bill; the same ETH study found that LLM-generated context files added more than 20% to inference cost. Keep it short enough that an agent reads the whole thing on every run, and cut any line that isn't earning its place.
The third is restating what the code already says. The agent can see your folder structure and read your imports; describing them back wastes space that should go to the things it can't see. Spend your lines on commands, conventions, and boundaries, not on a tour of the repo.
The fourth is letting it rot. An AGENTS.md that describes last quarter's architecture is worse than none, because the agent will trust it anyway. Treat the file like code: update it in the same pull request that changes the convention it documents, and delete rules that no longer apply.
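Rot is also something you can partly detect. As a rough sketch, the hypothetical `findStalePaths` below pulls backticked, path-like tokens out of the file and reports any that no longer exist. The “contains a slash” heuristic and the ignore list are assumptions you would tune for your own repo.

```typescript
// Hypothetical drift check: collect backticked, path-like tokens from
// AGENTS.md and report the ones the repository no longer contains.
function findStalePaths(
  markdown: string,
  pathExists: (path: string) => boolean,
  ignore: string[] = [],
): string[] {
  const stale: string[] = [];
  // Inline code spans with no whitespace, such as `lib/db`.
  const codeSpan = /`([^`\s]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = codeSpan.exec(markdown)) !== null) {
    const token = match[1];
    // Heuristic (an assumption): tokens containing "/" are paths;
    // globs and URLs are skipped because they need different handling.
    const looksLikePath =
      token.includes("/") && !token.includes("*") && !token.includes("://");
    if (!looksLikePath || ignore.includes(token)) continue;
    if (!pathExists(token) && !stale.includes(token)) stale.push(token);
  }
  return stale;
}

// Usage with a fake file system; in CI, pathExists could wrap fs.existsSync.
const knownPaths = new Set(["lib/db", "lib/billing/"]);
const staleReport = findStalePaths(
  "Use `lib/db`. Ask first in `lib/billing/`. See `lib/legacy/old.ts`. Branches: `feat/`.",
  (path) => knownPaths.has(path),
  ["feat/"], // branch prefixes look like paths, so they are ignored explicitly
);
console.log(staleReport); // logs an array containing only lib/legacy/old.ts
```

A check like this only catches renamed or deleted paths. A convention that changed without moving any files still needs a human to update the file in the same pull request.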
AGENTS.md Is One Layer, Not the Whole System
AGENTS.md has a ceiling. A well-written file bought about four points of success in the ETH study, and that's the high end. The file makes an agent faster, cheaper, and better-behaved, but it can't test the code, can't catch the vulnerability it didn't think to warn about, and can't verify the agent actually did what you asked.
That's why AGENTS.md is the first layer of a larger system, not the system itself. The instructions file is a guide: it shapes what the agent does before it starts. You pair it with sensors that check the work after, the automated tests, strict type checking, and security scanning that run no matter what the file said. And you pair both with a spec for the feature, so the agent starts from a clear target instead of a vague prompt. We make the full case for designing that whole environment in harness engineering, and for the spec half in spec-driven development.
Put together, those layers are what we call structured vibe coding: context, guardrails, and process working as one harness instead of three good intentions. AGENTS.md is where it starts, because it's the cheapest layer to add and the one every other layer assumes is already there.
VibeReady ships a production AGENTS.md wired into the rest of the framework: the quality gates, the 10 subagents, and the SaaS architecture it describes, so your agents start with the context this post argues for. See how the framework fits together →
Frequently Asked Questions
What is AGENTS.md?
AGENTS.md is an open Markdown file at the root of a repository that tells AI coding agents how your project works: the commands to run, the conventions to follow, and the boundaries to respect. OpenAI released the format in August 2025, and it's now governed by the Linux Foundation.
Does Claude Code read AGENTS.md?
Claude Code reads CLAUDE.md, not AGENTS.md, by default. The standard fix is to keep one AGENTS.md as the source of truth and make CLAUDE.md a single line that imports it (@AGENTS.md), which is exactly what create-next-app now generates.
Should I auto-generate my AGENTS.md?
No. A 2026 ETH Zurich study found LLM-generated context files lowered agent success by about 3 points and added over 20% to cost, while human-written files raised success by about 4. Write it yourself and keep it short.
How long should an AGENTS.md be?
Short enough that an agent reads the whole file on every run. Include only what an agent can't infer from the code: commands, conventions, and boundaries. Long, redundant files cost tokens and split the agent's attention without improving results.
What's the difference between AGENTS.md and README?
A README is written for humans and isn't loaded by agents automatically. AGENTS.md is written for agents and is the file most coding tools look for on their own. Keep prose for people in the README; put commands and rules for agents in AGENTS.md.
Do I still need AGENTS.md if I use Cursor rules?
They overlap, but AGENTS.md is the cross-tool standard while Cursor rules only apply in Cursor. Keep one AGENTS.md as the source of truth and have tool-specific files point to it. Learn more: https://vibeready.sh/structured-vibe-coding/