Testing and source policy
AI Index Check Methodology
AI Index Check combines reproducible technical checks with clearly labeled editorial heuristics. Results identify avoidable crawl and extraction friction; they do not predict or guarantee crawling, indexing, ranking, inclusion, model use, or citation.
What the tools test
The tools fetch a user-provided public HTTP or HTTPS URL, the robots.txt file at that URL's origin, and the llms.txt file at the origin root. They then inspect the response state, path-specific robots directives, metadata, the canonical URL, visible headings and text, JSON-LD syntax and types, source links, date signals, and answer-passage structure.
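A minimal sketch of how the two companion files can be located from the submitted URL, assuming "origin" means scheme, host, and port. The helper name `companion_urls` is hypothetical. It rejects URLs that are not HTTP or HTTPS, but it does not check whether the host is publicly reachable.

```python
from urllib.parse import urlsplit, urlunsplit


def companion_urls(page_url: str) -> dict:
    """Hypothetical helper: derive origin robots.txt and root llms.txt URLs."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URL: {page_url!r}")
    # Drop any user:password@ prefix; keep host and explicit port.
    netloc = parts.netloc.rpartition("@")[2]
    # The origin is scheme + host (+ port); path, query, and fragment are discarded.
    origin = urlunsplit((parts.scheme, netloc, "", "", ""))
    return {
        "page": page_url,
        "robots": origin + "/robots.txt",
        "llms": origin + "/llms.txt",
    }
```

For `https://example.com/blog/post?x=1`, the helper returns `https://example.com/robots.txt` and `https://example.com/llms.txt`, because only the origin determines where these files are fetched.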
Objective checks versus editorial heuristics
Objective checks cover observable states such as the HTTP response, robots.txt matching, noindex directives, canonical URL presence, H1 count, JSON-LD syntax, and detected Schema.org types. Editorial heuristics cover source quality, passage independence, freshness context, and content depth. Heuristics are recommendations, not platform requirements.
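Several of the objective checks can be computed directly from the page HTML. The sketch below uses Python's standard `html.parser` and assumes the HTML has already been fetched; the class name `ObjectiveChecks` is hypothetical, and this is not the tool's implementation. It ignores the `X-Robots-Tag` HTTP header and crawler-specific meta tags such as `<meta name="googlebot">`, and it records only `@type` names, not whether the markup is valid Schema.org.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class ObjectiveChecks(HTMLParser):
    """Hypothetical collector for a few observable page states."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_count = 0
        self.canonical: str | None = None
        self.noindex = False
        self.jsonld_errors: list[str] = []
        self.schema_types: set[str] = set()
        self._in_jsonld = False
        self._jsonld_buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "h1":
            self.h1_count += 1
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in a.get("content", "").split(",")]
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_buf = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._jsonld_buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._jsonld_buf))
            except json.JSONDecodeError as exc:
                self.jsonld_errors.append(str(exc))  # syntax failure is observable
                return
            self._collect_types(doc)

    def _collect_types(self, node):
        # Walk nested objects, lists, and @graph arrays for @type values.
        if isinstance(node, dict):
            t = node.get("@type")
            if isinstance(t, str):
                self.schema_types.add(t)
            elif isinstance(t, list):
                self.schema_types.update(x for x in t if isinstance(x, str))
            for value in node.values():
                self._collect_types(value)
        elif isinstance(node, list):
            for item in node:
                self._collect_types(item)
```

Call `feed()` with the page HTML and then `close()`, and read `h1_count`, `canonical`, `noindex`, `schema_types`, and `jsonld_errors`. Each attribute is a direct observation, which is what separates these checks from the editorial heuristics.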
How crawler roles are verified
Each crawler record links to the owner's primary documentation where available and displays a manually maintained Last verified date. Search crawling, search-control tokens, model-training crawlers, user-request agents, and Common Crawl collection are treated as separate roles rather than grouped under one purpose.
- OAI-SearchBot: OpenAI source, verified 2026-06-11
- GPTBot: OpenAI source, verified 2026-06-11
- ChatGPT-User: OpenAI source, verified 2026-06-11
- Googlebot: Google source, verified 2026-06-11
- Google-Extended: Google source, verified 2026-06-11
- ClaudeBot: Anthropic source, verified 2026-06-11
- PerplexityBot: Perplexity source, verified 2026-06-11
- CCBot: Common Crawl source, verified 2026-06-11
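The role separation can be modeled as a small registry in which each record carries its user-agent token, owner, role, and manually set verification date. This is a hypothetical sketch: the `Role` and `CrawlerRecord` names are assumptions, only five of the crawlers listed above are included, and the role assignments should be re-checked against each owner's documentation. It evaluates robots.txt with Python's `urllib.robotparser`, which applies the first matching rule in file order and does not support `*` or `$` path wildcards, so it only approximates the longest-match behavior described in RFC 9309.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from urllib.robotparser import RobotFileParser


class Role(Enum):
    # Roles mirror the categories named in the text; none is merged into another.
    SEARCH = "search crawling"
    SEARCH_CONTROL = "search-control token"
    TRAINING = "model-training crawler"
    USER_REQUEST = "user-request agent"
    COMMON_CRAWL = "Common Crawl collection"


@dataclass(frozen=True)
class CrawlerRecord:
    token: str          # robots.txt user-agent token
    owner: str
    role: Role
    last_verified: str  # set manually after a source review, never at deploy time


# Illustrative subset; verify each role against the owner's documentation.
REGISTRY = [
    CrawlerRecord("OAI-SearchBot", "OpenAI", Role.SEARCH, "2026-06-11"),
    CrawlerRecord("GPTBot", "OpenAI", Role.TRAINING, "2026-06-11"),
    CrawlerRecord("ChatGPT-User", "OpenAI", Role.USER_REQUEST, "2026-06-11"),
    CrawlerRecord("Googlebot", "Google", Role.SEARCH, "2026-06-11"),
    CrawlerRecord("CCBot", "Common Crawl", Role.COMMON_CRAWL, "2026-06-11"),
]


def access_by_role(robots_txt: str, page_url: str) -> dict[Role, dict[str, bool]]:
    """Report, per role, whether robots.txt asks each token not to fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    result: dict[Role, dict[str, bool]] = {}
    for record in REGISTRY:
        # True means no matching Disallow; this is a directive, not enforcement.
        allowed = parser.can_fetch(record.token, page_url)
        result.setdefault(record.role, {})[record.token] = allowed
    return result
```

With a robots.txt that disallows `/` only for `GPTBot`, the training entry reports `False` while the search and user-request entries stay `True`. Keying results by role keeps a training-crawler block from being read as a search-crawler block.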
How citation readiness is scored
The overall score, reported as a whole number, is the equal-weight average of seven categories: crawl access, extractability, source quality, freshness, entity clarity, structured data, and internal discovery. Each failed check states why it matters and whether it is objective or heuristic. A failed fetch lowers confidence in the result instead of being scored as though the page simply had little content.
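The scoring rule can be written out directly. This sketch assumes each category is already scored on a 0 to 100 scale, rounds half up (Python's built-in `round()` rounds half to even, so `round(70.5)` is `70`), and uses hypothetical `"normal"` and `"low"` confidence labels; none of these details are specified above. When the fetch fails, it returns no score rather than averaging empty-page category values.

```python
from __future__ import annotations

import math

CATEGORIES = (
    "crawl access",
    "extractability",
    "source quality",
    "freshness",
    "entity clarity",
    "structured data",
    "internal discovery",
)


def overall_score(category_scores: dict[str, float], fetch_ok: bool) -> tuple[int | None, str]:
    """Equal-weight average of seven categories, reported as a whole number.

    Assumes 0-100 category scores; the confidence labels are hypothetical.
    """
    if not fetch_ok:
        # No content was retrieved: report low confidence instead of
        # turning missing content into a low score.
        return None, "low"
    missing = [c for c in CATEGORIES if c not in category_scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    mean = sum(category_scores[c] for c in CATEGORIES) / len(CATEGORIES)
    # Round half up explicitly; scores are non-negative, so floor(x + 0.5) works.
    return math.floor(mean + 0.5), "normal"
```

For example, six categories at 80 and freshness at 45 average to exactly 75, so the overall score is 75.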
What the tools cannot guarantee
No result guarantees that a crawler will visit, that a search engine will index or rank a page, that an AI system will include or cite it, or that content will be used for training. Robots.txt rules are directives honored only by compliant crawlers; they are not access control. llms.txt is an emerging, optional convention and is not a requirement for Google Search or AI Overviews.
Sources and update policy
Platform claims are based on current primary documentation. Google states that the normal Search requirements for crawling, indexing, and snippet eligibility are the foundation for AI Overviews and AI Mode, and that no special AI file or schema is required. Verification dates change only after a manual source review, never automatically during deployment.
How to report an error
Report the affected AI Index Check URL, the exact statement or result, the expected correction, and a primary source URL to the site operator through the contact channel published for this domain. Corrections should update both the shared crawler registry and any dependent result copy so the product remains internally consistent.