How to Configure robots.txt for AI Crawlers
The robots.txt file is the first thing every crawler requests, including AI crawlers. It determines whether GPTBot, ClaudeBot, and the rest can read your website at all. A misconfiguration can make your site completely invisible to AI systems.
Which AI Crawlers Exist?
The most important AI bots actively crawling websites:
- GPTBot — OpenAI (ChatGPT)
- ChatGPT-User — ChatGPT browsing
- ClaudeBot — Anthropic (Claude)
- anthropic-ai — Anthropic's training crawler
- Google-Extended — Google (Gemini)
- PerplexityBot — Perplexity AI
- Bingbot — Microsoft (Copilot)
- meta-externalagent — Meta AI
- DeepSeekBot — DeepSeek
- MistralBot — Mistral AI
- YouBot — You.com
The Ideal robots.txt for AI Visibility
For maximum visibility, your robots.txt should explicitly allow these bots. Note that a compliant crawler follows only the most specific group matching its user agent, so a Disallow: /admin/ in the catch-all group does not apply to bots that have their own group; repeat the rule in every group where it should hold:

```
User-agent: GPTBot
Allow: /
Disallow: /admin/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: Google-Extended
Allow: /
Disallow: /admin/

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://your-domain.com/sitemap.xml
```
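You can verify rules like these programmatically with Python's standard urllib.robotparser. A sketch against an inline copy of a robots.txt (the domain and the BadBot group are illustrative placeholders):

```python
from urllib import robotparser

# Inline robots.txt: GPTBot explicitly allowed, a hypothetical
# "BadBot" blocked, everyone else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://your-domain.com/blog/post"))  # True
print(parser.can_fetch("BadBot", "https://your-domain.com/blog/post"))  # False
```

For a live check, you could instead call parser.set_url("https://your-domain.com/robots.txt") followed by parser.read().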
Common Mistakes
The most common mistake is a blanket Disallow: / for all user agents that accidentally blocks AI bots too. A close second: deliberately blocking AI bots in the belief that it "protects your data"; in practice it just makes you invisible.
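The blanket block looks like this:

```
# Blocks every compliant crawler, AI bots included:
User-agent: *
Disallow: /
```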
Another mistake is having no robots.txt at all. Crawlers treat a missing file as permission to crawl everything, so this is technically fine, but an explicit configuration signals to AI systems that you are deliberately granting access.
Block or Allow?
There are valid reasons to block certain AI crawlers — for example, if you don't want your content used for training. But remember: If you block GPTBot, ChatGPT won't know your content and can't recommend it.
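If you do decide to opt out, you can target individual bots rather than blocking everything. A sketch (check each vendor's documentation for its current user-agent tokens):

```
# Opt out of OpenAI model training:
User-agent: GPTBot
Disallow: /

# ...but still allow ChatGPT's user-triggered browsing:
User-agent: ChatGPT-User
Allow: /
```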
How to Check Your Configuration
Visit https://your-domain.com/robots.txt in your browser. Or, even easier: scan your website with scan8 — the "AI Crawler Access" category instantly shows which of the 11 AI bots have access.