Crawler directory
AI and Search Crawler References
Compare documented crawler roles before changing robots.txt. Search discovery, model training, user-initiated requests, and control tokens are separate policy surfaces.
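Each surface is controlled through its own user-agent token. One way to keep the policies separate is to write one robots.txt group per surface. This is a minimal sketch that assumes Python 3.9+. The `SURFACES` grouping and the `render_robots_txt` helper are hypothetical and group only tokens whose role this directory states. Which surfaces to block is an example, not a recommendation.

```python
# Hypothetical grouping of user-agent tokens by the policy surface each one
# controls, limited to tokens whose role this directory states.
SURFACES: dict[str, list[str]] = {
    "search": ["Googlebot", "PerplexityBot"],
    "training": ["GPTBot"],
    "dataset": ["CCBot"],
    "user_initiated": ["ChatGPT-User"],
    "control_token": ["Google-Extended", "Applebot-Extended"],
}


def render_robots_txt(surfaces: dict[str, list[str]], blocked: set[str]) -> str:
    """Emit one robots.txt group per surface.

    Blocked surfaces get `Disallow: /` (whole site); the rest get an empty
    `Disallow:` line, which allows everything for that group.
    """
    groups = []
    for surface, tokens in surfaces.items():
        lines = [f"# surface: {surface}"]  # comment line, ignored by parsers
        lines += [f"User-agent: {token}" for token in tokens]
        lines.append("Disallow: /" if surface in blocked else "Disallow:")
        groups.append("\n".join(lines))
    # Blank lines separate groups; end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Example policy: stay in search and user-initiated fetches, opt out of
    # training, dataset collection, and the generative-AI control tokens.
    policy = {"training", "dataset", "control_token"}
    print(render_robots_txt(SURFACES, blocked=policy), end="")
```

Several tokens can share one group because consecutive `User-agent` lines apply to the rules that follow them. A real file would usually also need a `User-agent: *` group for crawlers not listed here.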
OpenAI: OpenAI search discovery for ChatGPT search features.
Last verified 2026-06-11
GPTBot (OpenAI): OpenAI crawler for content that may be used to train generative AI models.
Last verified 2026-06-11
ChatGPT-User (OpenAI): User-initiated ChatGPT and Custom GPT requests.
Last verified 2026-06-11
Googlebot (Google): Google Search crawling and indexing.
Last verified 2026-06-11
Google-Extended (Google): Control token for certain Gemini and Vertex AI uses outside Google Search.
Last verified 2026-06-11
ClaudeBot (Anthropic): Anthropic's automated web crawler.
Last verified 2026-06-11
PerplexityBot (Perplexity): Perplexity search and retrieval crawler.
Last verified 2026-06-11
CCBot (Common Crawl): Common Crawl dataset collection.
Last verified 2026-06-11
Applebot-Extended (Apple): Apple control token for use of web content in certain generative AI models.
Last verified 2026-06-11
Amazonbot (Amazon): Amazon web crawler used across Amazon services.
Last verified 2026-06-11
Bytespider (ByteDance): ByteDance crawler; no dedicated public statement of its purpose could be verified.
Last verified 2026-06-11
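Google-Extended and Applebot-Extended are listed as control tokens rather than crawlers. A robots.txt group for the token is therefore separate from the group for the crawler that fetches pages. Python's standard-library `urllib.robotparser` gives a quick way to check how a file resolves. In this sketch the robots.txt contents, the example URL, and the `check` helper are hypothetical. The result only shows how this parser evaluates the groups. It does not show which requests a vendor actually sends or honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep general crawling open (including Googlebot),
# opt out via the two control tokens, and block GPTBot.
ROBOTS_TXT = """\
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def check(tokens: list[str], url: str = "https://example.com/article") -> dict[str, bool]:
    """Return whether each user-agent token may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts an iterable of lines
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    tokens = ["Googlebot", "Google-Extended", "Applebot-Extended", "GPTBot", "ChatGPT-User"]
    for token, allowed in check(tokens).items():
        print(f"{token:<18} {'allowed' if allowed else 'disallowed'}")
```

Here Googlebot and ChatGPT-User fall through to the `*` group and are allowed. The two control tokens and GPTBot are disallowed. One caveat about the tool: `urllib.robotparser` treats a group as matching when its `User-agent` value appears as a case-insensitive substring of the queried name, and the first matching group wins. A `User-agent: Applebot` group placed earlier in the file would therefore also capture `Applebot-Extended` in this check, so results for tokens that share a prefix depend on group order.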
How these references are maintained
Each role is checked against primary documentation, and each entry carries a manually maintained verification date. Allowing a crawler does not guarantee crawling, indexing, inclusion, ranking, or citation. Read the methodology and source policy.