AI search guide
AI Crawlers and robots.txt: GPTBot, Google-Extended, ClaudeBot and More
Learn how robots.txt applies to AI crawlers, which user-agent tokens matter, and how to test AI crawler access safely.
What AI crawlers are
AI crawlers are automated user agents that request public web pages for AI-related products such as answer engines, model improvement, user-triggered browsing, or dataset collection.
They are not all the same. Some crawlers are associated with search indexing, some with AI training controls, and some with user-requested retrieval. That is why the exact user-agent token matters.
How AI crawlers use robots.txt
robots.txt is a public policy file for compliant crawlers. It can tell a crawler whether it is allowed to request a path, but it is not authentication and it does not guarantee every crawler will comply.
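As a rough illustration of the path-level check a compliant crawler performs, the sketch below uses Python's standard `urllib.robotparser`. The policy text, the `may_request` helper, and the example.com URLs are assumptions made for this example, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy used only for this sketch.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def may_request(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the policy lets the given user-agent token fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


# The answer is advisory: it describes what a compliant crawler should do,
# not what a server will enforce.
print(may_request(ROBOTS_TXT, "GPTBot", "https://example.com/private/report"))
print(may_request(ROBOTS_TXT, "GPTBot", "https://example.com/blog/"))
```

The first call is refused because the path falls under the GPTBot group's `Disallow: /private/` rule, and the second is permitted because no rule in that group matches it.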
For AI search visibility, robots.txt matters because crawler access is only the first layer. A blocked crawler may not be able to retrieve the page directly, while an allowed crawler still needs useful content, metadata, and schema before the page is easy to cite.
Difference between search crawlers and AI training crawlers
Search crawlers such as Googlebot are primarily associated with search discovery and indexing. Control tokens such as Google-Extended govern a different policy: whether content may be used for certain AI purposes, not whether it is crawled for search.
Do not assume one robots.txt rule covers every product from the same company. Review each token separately and describe the policy in plain language for future audits.
AI crawler user-agent tokens to review
Common AI-related tokens include OAI-SearchBot, GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, PerplexityBot, CCBot, Applebot-Extended, Amazonbot, and Bytespider. Each token can represent a different product purpose, so avoid treating all AI crawlers as the same.
Googlebot and Google-Extended are separate. Blocking Google-Extended does not mean blocking Google Search crawling. If you want to change Google Search indexing, review Googlebot and page-level indexability separately.
- OAI-SearchBot: OpenAI automatic search discovery.
- GPTBot: OpenAI model-training crawler.
- ChatGPT-User: user-triggered browsing and retrieval requests.
- Google-Extended: control token for certain Gemini and Vertex AI uses.
- PerplexityBot: answer engine crawling and retrieval.
- CCBot: Common Crawl collection used by many downstream systems.
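One way to review each token separately is to check which tokens a robots.txt file names in its own `User-agent` line and which ones only inherit the `*` group. The following is a minimal sketch under that assumption; `AI_TOKENS`, `named_user_agents`, and `review_tokens` are hypothetical names, and the status labels are illustrative.

```python
# Tokens taken from the list in this guide.
AI_TOKENS = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot",
    "PerplexityBot", "CCBot", "Applebot-Extended", "Amazonbot", "Bytespider",
]


def named_user_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent value in the file, lowercased."""
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return named


def review_tokens(robots_txt: str) -> dict[str, str]:
    """Label each AI token as explicitly named, inheriting *, or unspecified."""
    named = named_user_agents(robots_txt)
    report = {}
    for token in AI_TOKENS:
        if token.lower() in named:
            report[token] = "explicit group"
        elif "*" in named:
            report[token] = "falls back to *"
        else:
            report[token] = "unspecified"
    return report


sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
for token, status in review_tokens(sample).items():
    print(f"{token:18} {status}")
```

This only reports whether a token is named. It does not evaluate the rules inside each group, so it works best as a first pass before checking specific paths.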
Example allow rules
A simple allow policy makes AI crawler access explicit. The example below allows OAI-SearchBot and PerplexityBot, blocks GPTBot, and leaves every other crawler allowed. Test the rule at the final canonical host because staging rules and redirects can change the result.
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Example block rules
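A specific block policy restricts named AI crawlers without necessarily changing search indexing or every other crawler. The example below blocks Google-Extended and ClaudeBot while keeping Googlebot explicitly allowed. Blocking is a policy decision, not an authentication layer.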
User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /
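To confirm that a block policy like the one above affects only the named tokens, a quick check with Python's standard `urllib.robotparser` might look like this. The `access_report` helper and the example.com URL are assumptions for illustration. PerplexityBot is included only to show how an unlisted crawler is treated.

```python
from urllib.robotparser import RobotFileParser

# The block policy from this section, one directive per line.
BLOCK_POLICY = """\
User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Allow: /
"""


def access_report(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether the policy allows it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


report = access_report(
    BLOCK_POLICY,
    ["Google-Extended", "ClaudeBot", "Googlebot", "PerplexityBot"],
    "https://example.com/pricing",
)
for token, allowed in report.items():
    print(f"{token:16} {'allowed' if allowed else 'blocked'}")
```

Google-Extended and ClaudeBot come back blocked, and Googlebot comes back allowed. PerplexityBot is also allowed, because the file has no group for it and no `*` group, so the default applies.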
How to audit your policy
Start by testing the homepage, then test important sections such as documentation, pricing, support, and article pages. A root allow can coexist with deeper disallow rules that block useful pages.
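The interaction between a root allow and deeper disallow rules can be sketched with the longest-match precedence described in RFC 9309, where the most specific matching rule wins and a tie goes to allow. This is a simplified sketch: it ignores the `*` and `$` wildcards, and the `is_allowed` helper, the rules, and the paths are hypothetical.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to one user-agent group's rules.

    rules holds (directive, path_prefix) pairs such as ("Disallow", "/pricing/").
    Wildcards (* and $) are not handled in this sketch.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # A longer prefix is more specific; on a tie, prefer allow.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# Hypothetical group: root allow plus two deeper disallow rules.
rules = [
    ("Allow", "/"),
    ("Disallow", "/docs/internal/"),
    ("Disallow", "/pricing/"),
]

paths = ["/", "/docs/", "/docs/internal/setup", "/pricing/", "/support/", "/blog/post"]
for path in paths:
    print(f"{path:24} {'allowed' if is_allowed(rules, path) else 'blocked'}")
```

Here the homepage passes while `/pricing/` and `/docs/internal/setup` are blocked, which is why testing only the root path can hide problems. Parsers differ in how they order rules, so it is worth confirming results against the behavior each crawler documents.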
Use the AI crawler robots.txt checker to parse the deployed file and identify allowed, blocked, and unspecified crawler states.