GEO GUIDE
CUT THE HYPE
llms.txt and AI crawlers: what actually matters
There is a lot of noise about llms.txt. Here is the honest version: what it does, what it does not do, and the crawler decisions that actually determine whether AI can cite you.

# robots.txt
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Claude-SearchBot
Allow: /

A GEO guide by Apurv Singh, HQ Digital

llms.txt is a proposed file that hands AI models a tidy map of your best content. It is a reasonable idea. It is also not, today, something the major AI engines have confirmed they use to decide who gets cited. So treat it as a cheap hedge, not a strategy, and put your energy where it actually moves the needle.

The thing that genuinely controls whether AI can read and cite your site is older and less exciting: which crawlers you allow, and whether your site is clean enough for them to use. Let me separate the signal from the noise.

The honest truth about llms.txt

It was proposed in 2024 as a markdown file that summarises and links your most important pages, so a model does not have to wade through your navigation, banners and footers. Good intention. But after more than a year, no major answer engine has confirmed it uses the file to choose citations. Google has said on the record that it does not use it, and large studies across hundreds of thousands of domains have found no link between having an llms.txt and being cited. The crawlers barely request it.

Where it does earn its keep is developer tooling. AI coding assistants and documentation agents use it as a routing layer. So if you run a docs-heavy or developer product, it is genuinely useful. For everyone else, it is a thirty-minute, forward-compatible hedge. Yoast or Rank Math can generate one. Just do not build duplicate markdown copies of every page, which creates a duplicate-content mess, and do not expect a single citation to come from it.
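
For a sense of what the file looks like, here is a minimal sketch that renders one in the shape the proposal describes: a title, a short summary, then sections of links with a note on each. The site name, the URLs and the `build_llms_txt` helper are all hypothetical, and a plugin would normally produce this for you.

```python
def build_llms_txt(site_name, summary, sections):
    """Render a minimal llms.txt: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a markdown link followed by a short note.
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical site and pages, for illustration only.
llms_txt = build_llms_txt(
    "Example Co",
    "Example Co sells project-planning software for small teams.",
    {
        "Docs": [
            ("Quick start", "https://example.com/docs/quick-start", "Setup in five steps"),
            ("API reference", "https://example.com/docs/api", "Endpoints and auth"),
        ],
        "Guides": [
            ("Blog", "https://example.com/blog", "Product updates"),
        ],
    },
)
print(llms_txt)
```

The output is a single short file that links to pages you already have, which is the point: it is a map, not a second copy of the site.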

THE AI CRAWLERS THAT MATTER
Managed in robots.txt, the one standard the major AI crawlers publicly commit to honouring.
GPTBot (OpenAI): trains OpenAI models. Your call.
OAI-SearchBot (OpenAI): powers citations in ChatGPT search. Allow.
ClaudeBot (Anthropic): trains Anthropic models. Your call.
Claude-SearchBot (Anthropic): powers citations in Claude search. Allow.
PerplexityBot (Perplexity): indexes pages for Perplexity answers. Allow.
Google-Extended (Google): a robots.txt control token rather than a separate crawler; controls Gemini grounding and training. Allow.
Search and answer crawlers are the ones that decide citations. The training crawlers are a separate, optional choice about model training.
The decision that actually matters

Forget the magic file. The real lever is whether you allow the crawlers that power AI answers. If you block OAI-SearchBot, Claude-SearchBot or PerplexityBot, you have quietly opted out of being cited on those platforms. For GEO, you allow them. The training crawlers, GPTBot and ClaudeBot, are a separate question about whether your content trains future models. Blocking those does not help your citations, and may simply keep your content out of what future models learn.
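
One way to make that decision explicit is to write it down as a small policy and generate the robots.txt groups from it. This is a sketch, not a recommendation: the crawler names come from the list above, but the allow and block values chosen for the two training crawlers are placeholders, and `build_robots_txt` is a hypothetical helper.

```python
# True = allow the crawler everywhere, False = block it everywhere.
# The training-crawler values below are placeholders; that choice is yours.
POLICY = {
    "OAI-SearchBot": True,      # answer crawler
    "Claude-SearchBot": True,   # answer crawler
    "PerplexityBot": True,      # answer crawler
    "GPTBot": False,            # training crawler (assumed choice)
    "ClaudeBot": False,         # training crawler (assumed choice)
}


def build_robots_txt(policy):
    """Emit one User-agent group per crawler, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(POLICY)
print(robots_txt)
```

Keeping each crawler in its own group makes it obvious, line by line, which bots you let in and which you did not.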

This is the Core pillar of the CITE framework in its most literal form. A crawlable, fast, well-structured site that lets the right bots in beats any speculative file.

llms.txt is a cheap hedge, not a strategy. The crawler you allow and the page you structure decide whether AI can cite you.
Apurv Singh, Founder, HQ Digital
A five-minute action list
1. Open your robots.txt and make sure the answer crawlers are allowed, not accidentally blocked by an old rule.
2. Decide deliberately on the training crawlers. There is no single right answer, just an informed one.
3. Keep critical content and structured data high on the page, and keep the page light, so crawlers actually process it.
4. If you want, ship an llms.txt through Yoast or Rank Math as a forward-compatible hedge. Skip the per-page markdown copies.
5. Spend the time you saved on occasion-led content with a point of view. That is what gets cited.
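
The first item on the list can be checked in a few lines. The sketch below uses Python's standard `urllib.robotparser` to report which answer crawlers a given robots.txt would turn away. The legacy file and the `example.com` URL are made up for illustration, and the standard-library parser is only an approximation of how each real crawler interprets the rules.

```python
from urllib.robotparser import RobotFileParser

ANSWER_CRAWLERS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def blocked_answer_crawlers(robots_txt, url="https://example.com/"):
    """Return the answer crawlers that robots_txt stops from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in ANSWER_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical legacy file: an old blanket block, with one crawler re-allowed.
LEGACY = """\
User-agent: *
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

print(blocked_answer_crawlers(LEGACY))
```

With this legacy file, the two crawlers that have no group of their own fall through to the wildcard rule and are reported as blocked. That is the kind of accidental opt-out worth catching.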
Common questions
Does llms.txt help me get cited by AI?
There is no evidence it does. No major answer engine has confirmed using it, Google has said on the record that it does not, and large studies found no link between having the file and being cited. Treat it as a cheap hedge, not a citation strategy.
Should I add an llms.txt file anyway?
You can. It takes about thirty minutes, Yoast or Rank Math can generate it, and it is forward-compatible if platforms adopt it later. Just do not build duplicate markdown copies of every page, which causes duplicate-content problems, and do not expect citations from it.
What actually controls whether AI can use my site?
robots.txt, the one standard the major AI crawlers publicly commit to honouring. If you want GEO visibility, allow the answer and search crawlers like OAI-SearchBot, Claude-SearchBot and PerplexityBot. Block them and you opt out of being cited.
Apurv Singh
Founder of HQ Digital and a Growth Architect with 12 plus years in SEO and performance marketing. He built SEO functions at Times Internet and Future Group, has trained over 10,000 marketers, is a TEDx speaker, and has worked with brands including Spotify, Amazon and a long list of D2C businesses.