AI search guide
GPTBot robots.txt Rules: How to Allow or Block OpenAI Crawling
See practical GPTBot robots.txt examples, common mistakes, and a safe checklist for testing OpenAI crawler access.
What GPTBot is
GPTBot is an OpenAI user-agent token that site owners can address in robots.txt. A GPTBot rule communicates whether compliant requests from that crawler are allowed or blocked for matching paths.
Review the GPTBot rule as a model-training policy. It is a separate decision from OAI-SearchBot search discovery, ChatGPT-User requests, other vendors' tokens such as Google-Extended, ClaudeBot, and PerplexityBot, and whatever the wildcard group allows.
Difference between GPTBot and ChatGPT-User
GPTBot and ChatGPT-User are separate user-agent tokens. GPTBot is documented for model training, while ChatGPT-User is associated with user-triggered requests and is not the automatic search crawler.
OAI-SearchBot is the separate OpenAI token for automatic search discovery. Test all three independently and note that OpenAI says robots.txt may not apply to user-initiated ChatGPT-User actions.
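As a rough illustration of testing the three tokens independently, the sketch below resolves each one against the same file using Python's standard urllib.robotparser. The robots.txt content, the example.com URL, and the resolve_tokens helper are assumptions made for this example. The result only shows how a parser reads the rules, not how any OpenAI crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training, allow search discovery,
# and leave ChatGPT-User to fall through to the wildcard group.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

OPENAI_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def resolve_tokens(robots_txt, url):
    """Return {token: allowed?} for each OpenAI token, resolved separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in OPENAI_TOKENS}


results = resolve_tokens(SAMPLE_ROBOTS_TXT, "https://example.com/docs/")
for token, allowed in results.items():
    print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

In this sample, ChatGPT-User has no group of its own, so it inherits the wildcard rule. That is easy to miss when only GPTBot is tested.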
What GPTBot rules look like
GPTBot can be addressed with a specific User-agent group in robots.txt. Specific rules are easier to audit than relying only on broad User-agent: * behavior.
An allow rule does not guarantee crawling, indexing, training, ranking, or citation. It only communicates that a compliant crawler is permitted to request matching paths.
User-agent: GPTBot
Allow: /

User-agent: *
Allow: /
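One reason a specific group is easier to audit is that a parser which finds a GPTBot group uses that group instead of the wildcard group. The sketch below shows this with Python's standard urllib.robotparser. The file content, the /drafts/ path, the ExampleBot name, and the is_allowed helper are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: GPTBot has its own group, every other crawler
# falls back to the wildcard group.
GROUPED_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /drafts/
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


draft_url = "https://example.com/drafts/post"
# GPTBot matches its own group, so the wildcard Disallow is not applied to it.
print("GPTBot:", is_allowed(GROUPED_ROBOTS_TXT, "GPTBot", draft_url))
# ExampleBot (a made-up token) has no group and uses the wildcard rules.
print("ExampleBot:", is_allowed(GROUPED_ROBOTS_TXT, "ExampleBot", draft_url))
```

If the intent was to keep /drafts/ away from every crawler, the GPTBot group in this sample would need its own Disallow line.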
How to block GPTBot
If your policy is to restrict GPTBot from the whole site, use a specific disallow rule and confirm that it is served from the canonical host.
If only some areas should be restricted, test the exact paths. Blocking /private/ is very different from blocking /docs/ or /pricing/.
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
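For a partial block, it helps to test the exact paths rather than assume what a prefix covers. The following is a minimal sketch using Python's standard urllib.robotparser. The /private/ rule, the path list, and the check_paths helper are assumptions for the example, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical partial block: only /private/ is restricted for GPTBot.
PATH_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

PATHS_TO_TEST = ["/", "/private/report.html", "/private", "/docs/", "/pricing/"]


def check_paths(robots_txt, user_agent, base_url, paths):
    """Return {path: allowed?} for one user agent across several paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, base_url + path) for path in paths}


path_results = check_paths(PATH_ROBOTS_TXT, "GPTBot", "https://example.com", PATHS_TO_TEST)
for path, allowed in path_results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Note that /private without the trailing slash is not matched by the /private/ prefix in this sample. Python's parser also applies rules in file order, so groups that mix overlapping Allow and Disallow lines may resolve differently in a crawler that uses longest-match precedence.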
GPTBot testing checklist
Check the deployed robots.txt status code, content type, redirects, and sitemap location. Then test GPTBot, ChatGPT-User, and User-agent: * separately because each can resolve to different rules.
- Confirm the file is served at /robots.txt on the canonical host.
- Check GPTBot-specific rules before wildcard rules.
- Test important public paths, not only the homepage.
- Document why GPTBot is allowed or blocked.
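The deployment checks above (status code, content type, redirects, and location) can be sketched as a small script. This is only one possible approach: the function names are hypothetical, and treating any redirect or any content type other than text/plain as something to review is an assumption for this example, not a documented requirement.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def audit_robots_response(requested_url, final_url, status, content_type):
    """Return a list of findings to review; an empty list means none found."""
    findings = []
    if urlsplit(requested_url).path != "/robots.txt":
        findings.append("requested URL is not /robots.txt at the host root")
    if status != 200:
        findings.append(f"unexpected status code {status}")
    if final_url != requested_url:
        findings.append(f"redirected to {final_url}")
    if content_type != "text/plain":
        findings.append(f"content type is {content_type}, expected text/plain")
    return findings


def fetch_and_audit(url):
    """Fetch a live robots.txt and audit the response (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return audit_robots_response(
                url, resp.geturl(), resp.status, resp.headers.get_content_type()
            )
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError but still carry status and headers.
        return audit_robots_response(
            url, err.url, err.code, err.headers.get_content_type()
        )


# Offline example with made-up response values: a redirect to another host
# that ends in a 404 HTML page.
example_findings = audit_robots_response(
    "https://example.com/robots.txt",
    "https://www.example.com/robots.txt",
    404,
    "text/html",
)
for finding in example_findings:
    print(finding)
```

To check a real site, call fetch_and_audit with the robots.txt URL on the canonical host and review whatever it returns alongside the per-token rule tests.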
Common mistakes
Common GPTBot mistakes include testing only User-agent: *, forgetting path-specific disallow rules, copying examples without matching business intent, and assuming robots.txt is the same as access control.
Another common mistake is changing GPTBot rules without checking ChatGPT-User or other AI crawler tokens that may serve a different purpose.