How to Configure robots.txt for AI Crawlers
The robots.txt file is the first thing every crawler requests, including AI crawlers. It determines whether GPTBot, ClaudeBot, and the rest can read your website at all. A misconfiguration can make your site completely invisible to AI systems.
Which AI Crawlers Exist?
The most important AI bots actively crawling websites:
- GPTBot — OpenAI (ChatGPT)
- ChatGPT-User — ChatGPT browsing
- ClaudeBot — Anthropic (Claude)
- anthropic-ai — Anthropic's training crawler
- Google-Extended — Google (Gemini)
- PerplexityBot — Perplexity AI
- Bingbot — Microsoft (Copilot)
- meta-externalagent — Meta AI
- DeepSeekBot — DeepSeek
- MistralBot — Mistral AI
- YouBot — You.com
The Ideal robots.txt for AI Visibility
For maximum visibility, your robots.txt should explicitly allow these bots. Note that a compliant crawler follows only the most specific group matching its user agent, so a Disallow: /admin/ in the catch-all group does not apply to bots that have their own group; repeat the rule in every group where it should hold:

```
User-agent: GPTBot
Allow: /
Disallow: /admin/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: Google-Extended
Allow: /
Disallow: /admin/

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://your-domain.com/sitemap.xml
```
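You can verify rules like these programmatically with Python's standard urllib.robotparser. A sketch against an inline copy of a robots.txt (the domain and the BadBot group are illustrative placeholders):

```python
from urllib import robotparser

# Inline robots.txt: GPTBot explicitly allowed, a hypothetical
# "BadBot" blocked, everyone else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://your-domain.com/blog/post"))  # True
print(parser.can_fetch("BadBot", "https://your-domain.com/blog/post"))  # False
```

For a live check, you could instead call parser.set_url("https://your-domain.com/robots.txt") followed by parser.read().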
Common Mistakes
The most common mistake is a blanket Disallow: / for all user agents that accidentally blocks AI bots too. A close second: deliberately blocking AI bots in the belief that it "protects your data"; in practice it just makes you invisible.
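The blanket block looks like this:

```
# Blocks every compliant crawler, AI bots included:
User-agent: *
Disallow: /
```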
Another mistake is having no robots.txt at all. Crawlers treat a missing file as permission to crawl everything, so this is technically fine, but an explicit configuration signals to AI systems that you are deliberately granting access.
Block or Allow?
There are valid reasons to block certain AI crawlers — for example, if you don't want your content used for training. But remember: If you block GPTBot, ChatGPT won't know your content and can't recommend it.
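If you do decide to opt out, you can target individual bots rather than blocking everything. A sketch (check each vendor's documentation for its current user-agent tokens):

```
# Opt out of OpenAI model training:
User-agent: GPTBot
Disallow: /

# ...but still allow ChatGPT's user-triggered browsing:
User-agent: ChatGPT-User
Allow: /
```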
How to Check Your Configuration
Visit https://your-domain.com/robots.txt in your browser. Or, even easier: scan your website with scan8 — the "AI Crawler Access" category instantly shows which of the 11 AI bots have access.