AI Index Check

GPTBot robots.txt Rules: How to Allow or Block OpenAI Crawling

See practical GPTBot robots.txt examples, common mistakes, and a safe checklist for testing OpenAI crawler access.

What GPTBot is

GPTBot is an OpenAI user-agent token that site owners can address in robots.txt. A GPTBot rule communicates whether compliant requests from that crawler are allowed or blocked for matching paths.

Review the GPTBot rule as a model-training policy. It is separate from the rules for OAI-SearchBot (search discovery), ChatGPT-User (user-triggered requests), other vendors' tokens such as Google-Extended, ClaudeBot, and PerplexityBot, and the wildcard User-agent: * group.

Difference between GPTBot and ChatGPT-User

GPTBot and ChatGPT-User are separate user-agent tokens. GPTBot is documented for model training, while ChatGPT-User is associated with user-triggered requests and is not the automatic search crawler.

OAI-SearchBot is the separate OpenAI token for automatic search discovery. Test all three independently and note that OpenAI says robots.txt may not apply to user-initiated ChatGPT-User actions.

What GPTBot rules look like

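GPTBot can be addressed with a specific User-agent group in robots.txt. A specific group is easier to audit than relying only on the broad User-agent: * group, because a compliant crawler that finds a group naming its own token follows that group instead of the wildcard group.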

An allow rule does not guarantee crawling, indexing, training, ranking, or citation. It only communicates that a compliant crawler is permitted to request matching paths.

User-agent: GPTBot
Allow: /

User-agent: *
Allow: /
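As a rough local check of the example above, the sketch below parses the same rules with Python's standard urllib.robotparser module. The helper name is_allowed and the sample path /docs/ are illustrative assumptions, and Python's parser only approximates how any particular crawler evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# The allow example from this guide, held as a string for local testing.
ALLOW_ALL = """\
User-agent: GPTBot
Allow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the parsed rules permit user_agent to request path.

    is_allowed is a hypothetical helper name, not part of any crawler API.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # /docs/ is an assumed sample path; substitute paths from your own site.
    for agent in ("GPTBot", "ChatGPT-User"):
        print(agent, is_allowed(ALLOW_ALL, agent, "/docs/"))
```

A True result only means the rules permit the request. It says nothing about whether the crawler will actually visit, index, or cite the page.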

How to block GPTBot

If your policy is to block GPTBot from the whole site, add a GPTBot-specific Disallow: / rule and confirm that the robots.txt file containing it is served from the canonical host.

If only some areas should be restricted, test the exact paths. Blocking /private/ is very different from blocking /docs/ or /pricing/.

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
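To illustrate how a site-wide block differs from a path-specific one, the following sketch compares the rules above with a hypothetical variant that restricts only /private/. The BLOCK_PRIVATE rule set, the sample paths, and the helper name blocked_paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The site-wide block from this guide.
BLOCK_SITE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical variant that restricts only one area of the site.
BLOCK_PRIVATE = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

# Assumed sample paths; replace them with important paths from your site.
SAMPLE_PATHS = ["/", "/docs/", "/pricing/", "/private/report.html"]


def blocked_paths(robots_txt: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the subset of paths that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    print("site-wide:", blocked_paths(BLOCK_SITE, "GPTBot", SAMPLE_PATHS))
    print("private only:", blocked_paths(BLOCK_PRIVATE, "GPTBot", SAMPLE_PATHS))
    print("other agent:", blocked_paths(BLOCK_SITE, "ChatGPT-User", SAMPLE_PATHS))
```

One caveat: Python's parser applies the first matching rule in a group, whereas RFC 9309 says the most specific (longest) match wins. Results can therefore differ from a real crawler when a group mixes Allow and Disallow lines.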

GPTBot testing checklist

Check the deployed robots.txt status code, content type, redirects, and sitemap location. Then test GPTBot, OAI-SearchBot, ChatGPT-User, and User-agent: * separately, because each can resolve to a different group of rules.

  • Confirm the file is served at /robots.txt on the canonical host.
  • Check GPTBot-specific rules before wildcard rules.
  • Test important public paths, not only the homepage.
  • Document why the policy allows or blocks GPTBot.
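The checklist above can be sketched as a small script. Everything named here is an assumption made for illustration: the helper names, the example.com URLs, the SAMPLE rules, the ExampleBot stand-in for a crawler without its own group, and the expectation that the file is served as text/plain with a 200 status and no redirect.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical rules and agents; ExampleBot stands in for any crawler
# that has no group of its own and so falls back to User-agent: *.
SAMPLE = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""
AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ExampleBot"]


def fetch_metadata(url: str) -> tuple[int, str, str]:
    """Return (status, Content-Type header, final URL after redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.geturl()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status and headers.
        return err.code, err.headers.get("Content-Type", ""), err.geturl()


def file_problems(status: int, content_type: str, final_url: str, expected_url: str) -> list[str]:
    """List delivery problems, assuming 200 + text/plain + no redirect is the goal."""
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"content type {content_type!r}")
    if final_url != expected_url:
        problems.append(f"redirected to {final_url}")
    return problems


def audit(robots_txt: str, agents: list[str], paths: list[str]) -> dict:
    """Map each (agent, path) pair to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return Sitemap entries declared in the file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    for (agent, path), ok in audit(SAMPLE, AGENTS, ["/", "/private/a.html"]).items():
        print(f"{agent:15} {path:18} {'allow' if ok else 'block'}")
    print("sitemaps:", sitemap_urls(SAMPLE))
```

To check a live site, pass the result of fetch_metadata for the canonical /robots.txt URL into file_problems, then run audit on the downloaded text. The results reflect Python's parser, so treat them as a first pass and not as proof of how a given crawler behaves.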

Common mistakes

Common GPTBot mistakes include testing only User-agent: *, forgetting path-specific disallow rules, copying examples that do not match business intent, and treating robots.txt as access control. robots.txt is a voluntary signal that compliant crawlers honor; it does not authenticate or block requests.

Another common mistake is changing GPTBot rules without checking OAI-SearchBot, ChatGPT-User, or other AI crawler tokens that may serve a different purpose.
