llms.txt and AI Crawlers: What Actually Matters

There is a lot of noise about llms.txt. Here is the honest version: what it does, what it does not do, and the crawler decisions that actually determine whether AI can cite you.


# robots.txt
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Claude-SearchBot
Allow: /

A GEO guide by Apurv Singh, HQ Digital

llms.txt is a proposed file that hands AI models a tidy map of your best content. It is a reasonable idea. It is also not something any major AI engine has yet confirmed using to decide who gets cited. So treat it as a cheap hedge, not a strategy, and put your energy where it actually moves the needle.

The thing that genuinely controls whether AI can read and cite your site is older and less exciting: which crawlers you allow, and whether your site is clean enough for them to use. Let me separate the signal from the noise.

The honest truth about llms.txt

It was proposed in 2024 as a markdown file, served from the root of your site at /llms.txt, that summarises and links your most important pages so a model does not have to wade through your navigation, banners and footers. Good intention. But after more than a year, no major answer engine has confirmed it uses the file to choose citations. Google has said on the record that it does not use it, and large studies across hundreds of thousands of domains have found no link between having an llms.txt and being cited. The crawlers barely request it.

Where it does earn its keep is developer tooling. AI coding assistants and documentation agents use it as a routing layer. So if you run a docs-heavy or developer product, it is genuinely useful. For everyone else, it is a thirty-minute, forward-compatible hedge. Yoast or Rank Math can generate one. Just do not build duplicate markdown copies of every page, which creates a duplicate-content mess, and do not expect a single citation to come from it.
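If you do ship one, the format is simple enough to see at a glance. Below is a minimal Python sketch, assuming the layout described in the public llms.txt proposal: an H1 with the site name, a blockquote summary, then H2 sections listing markdown links with optional notes. The `Page` type, the `build_llms_txt` helper, the site name and the example.com URLs are all hypothetical; a plugin such as Yoast or Rank Math would produce the equivalent for you.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One link in llms.txt (a hypothetical structure for this sketch)."""
    title: str
    url: str
    note: str = ""  # optional one-line description shown after the link


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render llms.txt as markdown: H1 name, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        for page in pages:
            entry = f"- [{page.title}]({page.url})"
            if page.note:
                entry += f": {page.note}"
            lines.append(entry)
        lines.append("")  # blank line closes each section
    return "\n".join(lines)


# Hypothetical site and pages, for illustration only.
example = build_llms_txt(
    "Example Co",
    "Guides on getting cited by AI answer engines.",
    {
        "Guides": [
            Page("GEO framework", "https://example.com/geo", "The CITE framework"),
            Page("AI crawlers", "https://example.com/ai-crawlers"),
        ],
    },
)
print(example)
```

Note that every entry links to an existing page; in line with the advice above, there is no need to generate markdown copies of those pages.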

The AI crawlers that matter

Each one is controlled through robots.txt, the one standard that OpenAI, Anthropic, Perplexity and Google all say their crawlers honour.

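GPTBot (OpenAI): trains OpenAI models. Recommendation: your call.
OAI-SearchBot (OpenAI): powers citations in ChatGPT search. Recommendation: allow.
ClaudeBot (Anthropic): trains Anthropic models. Recommendation: your call.
Claude-SearchBot (Anthropic): powers citations in Claude search. Recommendation: allow.
PerplexityBot (Perplexity): indexes pages for Perplexity answers. Recommendation: allow.
Google-Extended (Google): a robots.txt token rather than a separate crawler; it controls whether Gemini can use your content for training and grounding, and does not affect your inclusion in Google Search. Recommendation: allow.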

Search and answer crawlers are the ones that decide citations. The training crawlers are a separate, optional choice about model training.

The decision that actually matters

Forget the magic file. The real lever is whether you allow the crawlers that power AI answers. If you block OAI-SearchBot, Claude-SearchBot or PerplexityBot, you have quietly opted out of being cited on those platforms. For GEO, you allow them. The training crawlers, GPTBot and ClaudeBot, are a separate question about whether your content trains future models. Blocking those does not help your citations, and it may simply keep your content out of what future models learn.

This is the Core pillar of the CITE framework in its most literal form. A crawlable, fast, well-structured site that lets the right bots in beats any speculative file.
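The split described above, answer crawlers always in and training crawlers as a deliberate choice, maps directly onto robots.txt groups. Here is a minimal Python sketch that writes those groups for the five crawler tokens this guide names; `build_ai_robots_rules`, `robots_group` and `allow_training` are hypothetical names, not part of any plugin or tool. One caveat worth knowing: under the robots.txt standard (RFC 9309), a crawler that finds a group naming it follows that group and ignores your `User-agent: *` rules, so repeat any paths you still want kept private inside these groups.

```python
# Crawler tokens as named in this guide.
ANSWER_CRAWLERS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
TRAINING_CRAWLERS = ["GPTBot", "ClaudeBot"]


def robots_group(comment: str, agents: list[str], rule: str) -> list[str]:
    """One robots.txt group: a comment, a User-agent line per crawler, one rule."""
    return [f"# {comment}", *(f"User-agent: {agent}" for agent in agents), rule, ""]


def build_ai_robots_rules(allow_training: bool) -> str:
    """Answer crawlers are always allowed; training crawlers follow your policy."""
    lines = robots_group(
        "Answer and search crawlers: allow for GEO", ANSWER_CRAWLERS, "Allow: /"
    )
    lines += robots_group(
        "Training crawlers: a separate, deliberate choice",
        TRAINING_CRAWLERS,
        "Allow: /" if allow_training else "Disallow: /",
    )
    return "\n".join(lines)


print(build_ai_robots_rules(allow_training=False))
```

Paste the output alongside your existing rules rather than replacing the whole file.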

“llms.txt is a cheap hedge, not a strategy. The crawler you allow and the page you structure decide whether AI can cite you.”

Apurv Singh, Founder, HQ Digital

A five-minute action list

✓ Open your robots.txt and make sure the answer crawlers are allowed, not accidentally blocked by an old rule.
✓ Decide deliberately on the training crawlers. There is no single right answer, just an informed one.
✓ Keep critical content and structured data high on the page, and keep the page light, so crawlers actually process it.
✓ If you want, ship an llms.txt through Yoast or Rank Math as a forward-compatible hedge. Skip the per-page markdown copies.
✓ Spend the time you saved on occasion-led content with a point of view. That is what gets cited.
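The first item on that list is easy to automate. Below is a minimal sketch using Python's standard-library `urllib.robotparser`, with a hypothetical `blocked_answer_crawlers` helper, a placeholder example.com URL and an illustrative "old rule" that only ever let Googlebot in. Treat it as a quick sanity check rather than a verdict: the standard-library parser applies the first matching rule in file order and does not understand `*` or `$` path wildcards, whereas RFC 9309 uses the longest matching path, so complex files can evaluate differently.

```python
from urllib.robotparser import RobotFileParser

# Answer and search crawlers this guide says to allow.
ANSWER_CRAWLERS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def blocked_answer_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the answer crawlers that these robots.txt rules would stop from fetching url."""
    parser = RobotFileParser()
    # parse() also stamps a load time; can_fetch() returns False until that is set.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in ANSWER_CRAWLERS if not parser.can_fetch(agent, url)]


# Illustrative "old rule": only Googlebot was ever meant to get in.
OLD_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# The same file with an explicit group for the answer crawlers added.
FIXED_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_answer_crawlers(OLD_ROBOTS))    # ['OAI-SearchBot', 'Claude-SearchBot', 'PerplexityBot']
print(blocked_answer_crawlers(FIXED_ROBOTS))  # []
```

To check your own site, paste the contents of your robots.txt in place of `OLD_ROBOTS`.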

Common questions

Does llms.txt help me get cited by AI?

There is no evidence it does. No major answer engine has confirmed using it, Google has said on the record that it does not, and large studies found no link between having the file and being cited. Treat it as a cheap hedge, not a citation strategy.

Should I add an llms.txt file anyway?

You can. It takes about thirty minutes, Yoast or Rank Math can generate it, and it is forward-compatible if platforms adopt it later. Just do not build duplicate markdown copies of every page, which causes duplicate-content problems, and do not expect citations from it.

What actually controls whether AI can use my site?

Your robots.txt, the one standard the major AI crawlers say they honour. If you want GEO visibility, allow the answer and search crawlers such as OAI-SearchBot, Claude-SearchBot and PerplexityBot. Block them and you opt out of being cited.

Apurv Singh

Founder of HQ Digital and a Growth Architect with 12 plus years in SEO and performance marketing. He built SEO functions at Times Internet and Future Group, has trained over 10,000 marketers, is a TEDx speaker, and has worked with brands including Spotify, Amazon and a long list of D2C businesses.