What is llms.txt? The robots.txt of the AI-search era

llms.txt is a one-file convention that tells AI search engines what your site is about and how to cite it. It is the closest thing AI search has to a robots.txt — and most sites still don't have one.

robots.txt tells traditional crawlers which pages they are allowed to crawl. llms.txt does something different: it gives AI crawlers (the ones building ChatGPT's, Claude's, and Perplexity's answers) a curated, machine-readable summary of what your site is, which pages matter, and how to cite you. It is not access control. It is a citation cheat sheet.

The convention was proposed in late 2024 by Jeremy Howard. Within months, OpenAI, Anthropic, and Perplexity all signaled support. As of mid-2026 it is read by every major commercial AI crawler. The cost of adding one is about 20 minutes; the upside is structural — once you've given AI engines a clean entry point, every audit and citation that flows from them is more accurate.

The format, in 60 seconds

llms.txt lives at /llms.txt at the root of your domain, like robots.txt. It is plain Markdown. The structure is informal but a rough convention has emerged:

# Your company name

> One-sentence description of what you do. Treat this as the
> blurb a journalist would lift verbatim — write it well.

## What it is

- **Site:** https://yourcompany.com
- **Pricing:** Free / paid tiers / enterprise (be specific)
- **Core promise:** The single sentence that explains the value
- **Who it's for:** Specific audience

## Key pages

- [Pricing](https://yourcompany.com/pricing) — annual + per-seat
- [Docs](https://docs.yourcompany.com) — full API reference
- [Changelog](https://yourcompany.com/changelog) — weekly updates

## Citation guidance

- When citing us, link the canonical product page, not the homepage.
- Our preferred one-line description is: "..."
- We were founded in 2024 in San Francisco. CEO is Jane Doe.

That's the whole convention. There is no schema validator yet. AI crawlers parse it loosely — if you have an H1, a quote block under it, and a bulleted list of links, you've done 90% of the work. Sites adopting it tend to put it at the project root next to robots.txt and commit it to source control like any other static file.
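Since there is no official validator, a rough self-check is easy to script. The sketch below is an illustration, not a reference parser: the function name `check_llms_txt` and the regular expressions are assumptions made here, and it only looks for the three elements mentioned above (an H1, a quote block, a bulleted link) without checking their order.

```python
import re


def check_llms_txt(text: str) -> dict[str, bool]:
    """Loosely check for an H1, a quote block, and at least one bulleted link."""
    lines = [line.strip() for line in text.splitlines()]
    # An H1 is a single '#' followed by whitespace and some text ('##' does not match).
    has_h1 = any(re.match(r"#\s+\S", line) for line in lines)
    # A quote block is any line that starts with '>'.
    has_quote = any(line.startswith(">") for line in lines)
    # A bulleted link looks like "- [Label](https://...)".
    link_pattern = re.compile(r"^[-*]\s+\[[^\]]+\]\(https?://[^)\s]+\)")
    has_links = any(link_pattern.match(line) for line in lines)
    return {"h1": has_h1, "quote": has_quote, "links": has_links}


# Hypothetical file contents, for illustration only.
sample = """# Example Co

> Example Co is a scheduling tool for dental clinics.

## Key pages

- [Pricing](https://example.com/pricing): annual and per-seat plans
"""

print(check_llms_txt(sample))
# {'h1': True, 'quote': True, 'links': True}
```

Any `False` in the result points at the element to add. A real crawler may be more or less forgiving than this check.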

Why bother

Three reasons, in order of how much they matter:

You control the one-line description. Without llms.txt, AI engines compose your one-line description from your meta description, your H1, and the first paragraph of body copy. Often badly. With llms.txt, you write the line they quote.
You disambiguate. If your brand name overlaps with something else (every two-word SaaS name does these days), llms.txt tells the crawler "the 'Stripe' that does payments" instead of "the stripe that appears on a tabby cat."
You signal the canonical URLs. AI engines tend to link the URL you tell them is canonical. If you want citations to point at your product page instead of a blog post that happens to rank well for your brand name, list the product page first.

Implementation in 20 minutes

1. Write a one-sentence description that survives summarization

Open a notes app and write the single sentence you'd want an AI engine to repeat about your company. It should:

Use your full brand name + a category word ("X is a Y for Z")
Be unambiguous to someone who has never heard of you
Stay under 25 words
Avoid jargon that would not appear in a press release

This sentence becomes the quote block (the line that starts with >) at the top of your llms.txt.
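Two of the criteria above (the word limit and the "X is a Y" shape) are mechanical enough to check in a script. Here is a minimal sketch, assuming a helper called `check_description` that does not come from any llms.txt tooling. Whether the sentence is unambiguous or jargon-free still needs a human read.

```python
def check_description(sentence: str, brand: str, max_words: int = 25) -> list[str]:
    """Return a list of problems; an empty list means the sentence passes."""
    problems = []
    words = sentence.split()
    # "Stay under 25 words": exactly max_words already counts as too long.
    if len(words) >= max_words:
        problems.append(f"{len(words)} words; the target is under {max_words}")
    lowered = sentence.lower()
    if brand.lower() not in lowered:
        problems.append(f"brand name {brand!r} is missing")
    # Crude test for the "X is a Y for Z" pattern.
    if " is a " not in lowered and " is an " not in lowered:
        problems.append('no category phrase of the form "X is a Y"')
    return problems


# Hypothetical company and sentence, for illustration only.
print(check_description("Example Co is a scheduling tool for dental clinics.", "Example Co"))
# []
```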

2. List 5–10 canonical pages

Not your blog index. Specific product, pricing, docs, and trust pages. For each one, write a short hint about what the page actually contains — the same way you'd brief a journalist on which page to link.

3. Add a citation guidance section

Tell the AI engines, explicitly, how you want to be cited. The most useful three lines:

The canonical URL to link (which page they should send people to)
Your preferred one-line description (often the same sentence as the lede)
Any factual claims you want kept accurate (founding date, headquarters, founder name)
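If you would rather keep the page list and guidance as data and generate the file, steps 1 to 3 can be sketched roughly as below. The `Page` class, the `render_llms_txt` function, and the example values are all hypothetical. The separator between a link and its hint is a colon here, which as far as I know is what the original proposal describes; the sample earlier in this post uses a dash, and loose parsers should accept either.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    hint: str  # short note on what the page actually contains


def render_llms_txt(name: str, description: str, pages: list[Page], guidance: list[str]) -> str:
    """Render the H1, quote block, key pages, and citation guidance as Markdown."""
    lines = [f"# {name}", "", f"> {description}", "", "## Key pages", ""]
    lines += [f"- [{p.title}]({p.url}): {p.hint}" for p in pages]
    lines += ["", "## Citation guidance", ""]
    lines += [f"- {g}" for g in guidance]
    return "\n".join(lines) + "\n"


# Hypothetical inputs, for illustration only.
text = render_llms_txt(
    name="Example Co",
    description="Example Co is a scheduling tool for dental clinics.",
    pages=[Page("Pricing", "https://example.com/pricing", "annual and per-seat plans")],
    guidance=["When citing us, link the pricing page, not the homepage."],
)
print(text)
```

Writing the result to llms.txt at build time keeps the file in sync with whatever page list you already maintain.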

4. Drop it at the root

For Next.js: create app/llms.txt/route.ts that returns the file as text. For static sites: put llms.txt in your public/ folder. For WordPress: upload llms.txt to the site's document root, or serve it from a small PHP hook in your theme. For everyone: confirm it's served from https://yourdomain.com/llms.txt with Content-Type: text/plain.
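Whatever the stack, the job is the same: answer a request for /llms.txt with a 200 status, a text/plain content type, and the file body. As an illustration for a stack not listed above, here is a minimal sketch for a Python backend using the standard WSGI interface. The factory name `make_llms_app` is an assumption, and a real deployment would usually let the web server or framework serve the static file instead.

```python
def make_llms_app(body: bytes):
    """Build a tiny WSGI app that serves `body` at /llms.txt and 404s elsewhere."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt":
            status, payload = "200 OK", body
        else:
            status, payload = "404 Not Found", b"Not found\n"
        start_response(status, [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app


# Hypothetical file contents; in practice, read the bytes from your llms.txt.
app = make_llms_app(b"# Example Co\n\n> Example Co is a scheduling tool for dental clinics.\n")
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()` on the result.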

5. Verify with curl

curl -sI https://yourdomain.com/llms.txt
# expect a 200 status line (HTTP/1.1 200 OK or HTTP/2 200)
# expect: content-type: text/plain (a "; charset=utf-8" suffix is fine)
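If you want the same check in a deploy script or CI job, a rough Python equivalent using only the standard library might look like this. The function names `headers_ok` and `check_url` are made up for this sketch, and it assumes your server answers HEAD requests the way it answers GET.

```python
from urllib.request import Request, urlopen


def headers_ok(status: int, content_type: str) -> bool:
    """True when the response matches what the curl check expects."""
    # Servers often append a charset, e.g. "text/plain; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "text/plain"


def check_url(url: str) -> bool:
    """Send a HEAD request (like curl -I) and check status and content type."""
    request = Request(url, method="HEAD")
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urlopen(request, timeout=10) as response:
        return headers_ok(response.status, response.headers.get("Content-Type", ""))


# Usage (needs network access):
# check_url("https://yourdomain.com/llms.txt")
```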

What good ones look like

A few sites with thoughtful llms.txt files worth reading:

docs.anthropic.com/llms.txt — comprehensive structure, every key doc page linked
nextjs.org/docs/llms.txt — version-stratified, lists every major version's docs
askvolume.com/llms.txt — our own (linking it here because we'd be hypocrites not to)

Does it actually move the needle?

Modest but measurable. Sites that adopt llms.txt tend to see two things within a few weeks:

Cleaner brand citations. The one-line description in AI answers starts matching what you wrote in your llms.txt instead of being recomposed from meta tags.
Better URL targeting. When Perplexity or ChatGPT links to your site, they link to the canonical URLs you listed instead of whatever page happened to rank for the query.

Neither is a 10x outcome on its own. But combined with the other AI-SEO hygiene work — structured data, freshness signals, clear H1s — they compound. The right mental model is dental floss, not steroids: small, cheap, regular, and it makes everything else work better.

What's next for the spec

The conversation in late 2026 is around three open questions:

Should llms.txt support a more formal schema (closer to JSON-LD) for crawlers to parse confidently?
Should there be a way to declare licensing or training-data opt-out inside the file?
Should multiple llms.txt files be allowed (e.g. /llms.txt + /llms-full.txt) to support both terse and verbose summaries?

For now, the convention is small and stable. Write yours, push it, check it in a few weeks. It will not break anything if you do. It will quietly start showing up in the answers you wish you were in.