How to Rank in ChatGPT - Build an AI-Readable Site ChatGPT Crawlers Actually Understand (Part 2 of 5)

Is your site blocking ChatGPT without realizing it? Zeover audits your AI-readability against 100+ metrics, tests whether GPTBot and OAI-SearchBot can actually fetch your pages, and generates the structured data that earns citations. Run your free audit.

This is part two of our series on how to rank in ChatGPT. Part 1 covered what ChatGPT actually cites (roughly 48% third-party sources, 2.62 citations per answer, a different citation fingerprint than Gemini, Claude, Grok, or Perplexity). This part is about making sure ChatGPT can read your site in the first place.

An alarming share of the sites we audit are technically blocking one or more of ChatGPT’s crawlers without the marketing team knowing. Others have no schema markup, unparseable headings, and content structured for 2015-era SEO rather than AI retrieval. You can’t rank in ChatGPT if its crawlers can’t reach your pages. You won’t get cited if what they reach is illegible.

TL;DR

ChatGPT uses three distinct user agents: GPTBot (training crawl), OAI-SearchBot (search-time indexing), and ChatGPT-User (live fetches during user conversations). Blocking any one hurts visibility.
GPTBot’s share of AI crawler traffic rose from 5% to 30% between May 2024 and May 2025. It’s the single most important AI crawler to allow.
Structured data (schema.org) makes a page 2.5x more likely to appear in AI answers. FAQPage schema has an especially strong effect on AI Overview appearances.
Short-and-to-the-point writing earns more ChatGPT citations than long SEO-era prose. A large share of AI citations come from the opening passages of a page.
llms.txt is the low-cost signal most brands still haven’t published. Five minutes to set up. It tells AI crawlers what your site is and which pages matter.

ChatGPT’s Crawler Stack

ChatGPT doesn’t use one crawler. It uses three, and they do different jobs.

GPTBot is OpenAI’s training and content discovery crawler. If GPTBot can’t reach a page, that content won’t be considered for future ChatGPT model updates. Cloudflare’s analysis of global AI crawler traffic found GPTBot’s share of AI crawler requests rising from 5% to 30% between May 2024 and May 2025, with its request volume up 305% over the same period.

OAI-SearchBot is OpenAI’s search-time indexing crawler, used to keep ChatGPT’s web index current. This is separate from training data. Blocking OAI-SearchBot specifically means your site won’t surface in ChatGPT’s live search results even if it’s included in the training corpus.

ChatGPT-User is the user agent ChatGPT uses when it fetches a URL during an active conversation (the “browse the web” behavior when users ask about something time-sensitive or paste in a URL). Request volume for ChatGPT-User grew by over 2,800% between 2024 and 2025.

All three need to be allowed in your robots.txt. Nearly 60% of reputable websites now block at least one AI user agent, forbidding an average of 15.5 AI crawlers each. If you’re in that 60%, step one is unblocking.
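Before changing anything, it can help to see which of the three user agents are actually reaching your server. The sketch below counts access-log lines per agent using a plain substring match on the three tokens named above. The log format, the sample lines, and the version numbers inside them are hypothetical, and substring matching is an assumption of this example rather than a documented detection method.

```python
# The three user-agent tokens named in this article.
CHATGPT_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def count_chatgpt_hits(log_lines):
    """Count access-log lines per ChatGPT user-agent token.

    Matching is a case-insensitive substring test, which is an
    assumption of this sketch and can be fooled by spoofed agents.
    """
    counts = {agent: 0 for agent in CHATGPT_AGENTS}
    for line in log_lines:
        lowered = line.lower()
        for agent in CHATGPT_AGENTS:
            if agent.lower() in lowered:
                counts[agent] += 1
                break  # attribute each line to at most one crawler
    return counts


# Hypothetical log lines in a combined-log-style format.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/May/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/May/2025:10:00:05 +0000] "GET /blog HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.7 - - [01/May/2025:10:00:09 +0000] "GET /docs HTTP/1.1" 200 700 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '203.0.113.5 - - [01/May/2025:10:01:00 +0000] "GET /customers HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.9 - - [01/May/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for agent, hits in count_chatgpt_hits(SAMPLE_LOG).items():
        print(f"{agent}: {hits}")
```

An agent with zero hits over a long window is worth checking against your robots.txt and any firewall or CDN bot rules.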

Check Your robots.txt First

The first file to audit is your robots.txt. Common patterns we see in it that quietly break ChatGPT visibility:

User-agent: GPTBot / Disallow: / - outright block, usually added in 2023 during the “should we let AI crawl us” debate.
User-agent: * / Disallow: / - catch-all block that hits every crawler.
Crawl-delay directives set so high that GPTBot gives up before finishing a meaningful crawl.
Blocking specific directories (blog, case studies, pricing) that happen to be exactly the pages you’d want cited.

The fix is straightforward: unless you have a specific licensing reason to keep AI engines out, allow GPTBot, OAI-SearchBot, ChatGPT-User, Google-Extended (the robots.txt token that governs Gemini’s use of your content), ClaudeBot (for Claude), and PerplexityBot. If you’re targeting Grok, also allow xAI’s crawler, and check xAI’s current documentation for the user agent name it uses.
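One way to test a robots.txt against these agents without waiting for a crawl is Python’s standard-library `urllib.robotparser`. The sketch below is a minimal check, not a full audit: the sample robots.txt and the yoursite.com URL are hypothetical, and the standard-library parser may interpret edge cases differently from how each crawler does.

```python
from urllib.robotparser import RobotFileParser

# User agents and tokens listed in this article.
AI_AGENTS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "Google-Extended",
    "ClaudeBot",
    "PerplexityBot",
)


def audit_robots(robots_txt, url, agents=AI_AGENTS):
    """Report, per agent, whether `url` is fetchable and any Crawl-delay."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: {
            "allowed": parser.can_fetch(agent, url),
            "crawl_delay": parser.crawl_delay(agent),
        }
        for agent in agents
    }


# Hypothetical robots.txt showing three of the patterns described above.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Crawl-delay: 120
Disallow: /pricing

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    report = audit_robots(SAMPLE_ROBOTS, "https://yoursite.com/pricing")
    for agent, result in report.items():
        print(agent, result)
```

Run it against each page you want cited (blog, case studies, pricing), since directory-level rules can block one page while leaving the rest of the site open.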

For most brands, letting AI engines crawl is not the decision to stress about. Organic GEO depends on AI engines being able to read what you’ve published. Blocking by default trades visibility for a hypothetical concern about training data.

llms.txt - The Low-Cost Signal Most Brands Still Haven’t Published

llms.txt is a markdown file at the root of your domain that gives AI crawlers a structured guide to what your site is and which pages matter. It was proposed in late 2024 and has been adopted by hundreds of thousands of sites.

No AI provider has publicly committed to consistently following llms.txt instructions. But the file costs nothing to publish, takes five minutes to generate, and no tested AI engine has penalized sites for having one. The downside risk is close to zero, provided the file stays accurate.

A minimal llms.txt that works:

# Your Brand

> One-paragraph description of what you do, who you serve, and one or two specific differentiators. Concrete, not marketing language.

## Product

- [Feature page](https://yoursite.com/feature): what's on it, in one line
- [Pricing](https://yoursite.com/pricing): what your pricing model actually is
- [Docs](https://yoursite.com/docs): comprehensive product documentation

## Resources

- [Research report](https://yoursite.com/research): any proprietary data you've published
- [Case studies](https://yoursite.com/customers): specific customer outcomes

Keep the file accurate. A stale llms.txt with dead links is worse than no llms.txt because it hands AI crawlers authoritative-looking data that contradicts your live site.
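Keeping the file accurate is easier with a small check in your build or CI. The sketch below pulls the links out of an llms.txt written in the list format shown above and reports the ones a caller-supplied `is_live` function rejects. Both function names and the `is_live` callback are hypothetical; in practice the callback would make an HTTP request, while here it is a simple set lookup so the example runs offline.

```python
import re

# Matches list items of the form "- [Title](https://url): optional note".
LINK_PATTERN = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def extract_links(llms_txt):
    """Return (title, url) pairs for every link list item in the file."""
    links = []
    for line in llms_txt.splitlines():
        match = LINK_PATTERN.match(line.strip())
        if match:
            links.append((match["title"], match["url"]))
    return links


def find_stale_links(llms_txt, is_live):
    """Return the URLs for which the caller's `is_live(url)` is False."""
    return [url for _, url in extract_links(llms_txt) if not is_live(url)]


SAMPLE_LLMS_TXT = """\
# Your Brand

> One-paragraph description of what you do.

## Product

- [Pricing](https://yoursite.com/pricing): what your pricing model actually is
- [Docs](https://yoursite.com/docs): comprehensive product documentation

## Resources

- [Research report](https://yoursite.com/research): proprietary data
"""

# Stand-in for a real liveness check: URLs assumed to return 200 today.
LIVE_URLS = {"https://yoursite.com/pricing", "https://yoursite.com/docs"}

if __name__ == "__main__":
    print(find_stale_links(SAMPLE_LLMS_TXT, LIVE_URLS.__contains__))
```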

Schema.org - The Non-Optional Layer

Schema markup is the second biggest factor after unblocking crawlers. Content with proper schema has a 2.5x higher chance of appearing in AI answers, according to 2025 analysis. Pages with FAQPage schema show an especially strong lift in Google AI Overviews, and similar mechanisms work on ChatGPT.

The schema types that matter most for ranking in ChatGPT:

Organization on your homepage or as an @id reference across your site. This is the anchor that connects content across your domain to a single entity ChatGPT can build a mental model of. Include name, url, logo, sameAs (social profiles), description, and address.

FAQPage on every service page, product page, and substantive blog post. FAQPage schema pairs question and answer explicitly, which is exactly the structure ChatGPT extracts when answering user questions. Write the questions the way customers actually ask them.
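As an illustration of that question-and-answer pairing, the sketch below builds FAQPage markup from a list of pairs and wraps it in a JSON-LD script tag. The `mainEntity`, `Question`, `acceptedAnswer`, and `Answer` names are schema.org vocabulary; the helper functions and the sample question are hypothetical.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize to a JSON-LD script tag for the page's HTML."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so answer text can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical content; write questions the way customers phrase them.
SAMPLE_FAQ = [
    ("How much does the audit cost?", "The first audit is free."),
]

if __name__ == "__main__":
    print(as_script_tag(faq_jsonld(SAMPLE_FAQ)))
```

The answers in the markup should match the answers visible on the page, so generate both from the same source where you can.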

HowTo on tutorials and step-by-step guides. This schema type declares each step, which helps ChatGPT extract sequences intact when users ask procedural questions.

Product on anything you sell. Price, availability, ratings, reviews, SKU. ChatGPT cites products more confidently when the facts are explicit in structured data rather than buried in prose.

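Author markup on every piece of editorial content. Schema.org has no standalone Author type, so set the author property on your Article or BlogPosting and point it at a Person. AI engines evaluate E-E-A-T signals, and attribution signals trust. Pages with author credentials, bios, and linked social profiles get cited at higher rates than anonymous pages.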

Use JSON-LD. Google’s official 2025 guidance recommends it specifically, and it’s easier to maintain because it lives separately from your visible HTML.
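To show how the Organization anchor and author attribution can fit together in JSON-LD, here is a minimal sketch that emits one `@graph` containing an Organization with an `@id` and an Article whose `author` is a Person and whose `publisher` points back to that `@id`. The property names are schema.org vocabulary; every value (brand name, URLs, author) is a placeholder, and the `/#organization` identifier is a common convention assumed here, not a requirement.

```python
import json

SITE = "https://yoursite.com"            # placeholder domain
ORG_ID = f"{SITE}/#organization"         # assumed @id convention


def organization_node():
    """Organization entity that other pages can reference by @id."""
    return {
        "@type": "Organization",
        "@id": ORG_ID,
        "name": "Your Brand",
        "url": SITE,
        "logo": f"{SITE}/logo.png",
        "description": "One concrete sentence about what you do.",
        "sameAs": ["https://www.linkedin.com/company/yourbrand"],
        "address": {"@type": "PostalAddress", "addressCountry": "US"},
    }


def article_node(headline, author_name, author_url, author_profiles):
    """Article whose author is a Person and whose publisher is the org."""
    return {
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "sameAs": list(author_profiles),
        },
        "publisher": {"@id": ORG_ID},
    }


def page_graph(*nodes):
    """Serialize several nodes as a single JSON-LD document."""
    return json.dumps({"@context": "https://schema.org", "@graph": list(nodes)}, indent=2)


if __name__ == "__main__":
    article = article_node(
        "How to Rank in ChatGPT",
        "Jane Example",
        f"{SITE}/authors/jane-example",
        ["https://www.linkedin.com/in/jane-example"],
    )
    print(page_graph(organization_node(), article))
```

Referencing the organization by `@id` keeps its details in one place instead of repeating them on every article.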

Short-and-to-the-Point Writing Beats SEO-Era Prose

ChatGPT extracts content in passages, not full essays. That makes verbose content a liability. Short, declarative writing with direct answers near the top of each section is the pattern ChatGPT rewards.

The KDD 2024 GEO paper from Princeton, Georgia Tech, and IIT Delhi tested nine optimization techniques across 10,000 queries. Adding quotations from named sources improved AI visibility by 41%. Adding statistics with attribution improved it by 33%. Citing external sources improved it by 28% (and up to 115% for lower-ranked content). Conversely, wordy prose, filler, and buried claims correlated with lower visibility.

Multiple industry analyses of AI citation patterns converge on the same finding: a significant share of citations come from the first 30% or so of a page’s content. Front-load your answer. Open every section with the key fact. Elaborate afterward.

Three practical rules for ChatGPT-friendly writing:

Declarative sentences that stand alone. “Value-based pricing outperforms cost-plus pricing by 24% in gross margin.” That sentence is citable verbatim. “We’ve found that pricing is complicated, and depending on your model…” isn’t.
Self-contained sections. A reader (or a crawler) should be able to lift any H2-to-H2 unit out of the page and still get a complete answer to one specific question.
Clean heading hierarchy. H1 for the topic. H2s for main sections. H3s within. No skipping levels, no using headings for decorative sizing.
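The heading rule is the easiest of the three to check mechanically. The sketch below uses the standard-library `html.parser` to collect heading levels in document order and flag skipped levels. Treating “H1 for the topic” as “exactly one H1 per page” is an assumption of this example, and the sample HTML is hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of each h1-h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return human-readable problems with the heading hierarchy."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:  # assumption: one h1 per page
        problems.append(f"expected exactly one h1, found {h1_count}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:  # e.g. h2 directly followed by h4
            problems.append(f"h{previous} followed by h{current} skips a level")
    return problems


SAMPLE_HTML = "<h1>Pricing</h1><h2>Plans</h2><h4>Fine print</h4><h2>FAQ</h2>"

if __name__ == "__main__":
    for problem in heading_problems(SAMPLE_HTML):
        print(problem)
```

Moving back up the hierarchy (an H4 followed by an H2) is not flagged, since closing a subsection that way is normal.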

For the deeper playbook on this, see Part 3 of our companion series on content machine-readability.

What Carries Over to Gemini, Claude, Grok, and Perplexity

Technical hygiene is where the five engines converge. The fixes that help you rank in ChatGPT also help you rank in Gemini (which benefits from the same schema markup, though with more weight on your own domain), Claude (which reads the same structured data), Grok (same crawler considerations), and Perplexity (which values diverse cited content from technically clean sites).

ChatGPT is the strictest judge of technical hygiene. A site that can earn ChatGPT citations is technically clean enough to earn citations on every other engine. The reverse isn’t true - a site Gemini happily cites may still fall short of ChatGPT’s stricter requirements for third-party corroboration and page structure.

How Zeover Fits

Zeover audits your site against more than 100 AI-readability metrics, including all three ChatGPT user agents, llms.txt, schema coverage, and content-structure signals. When the audit flags something, you get specific remediation steps - which pages are missing FAQPage schema, which crawler is blocked in your robots.txt, which content is too verbose for AI extraction.

The platform also runs ongoing tests across ChatGPT, Claude, Gemini, and Grok so you don’t have to. The research and the testing are ours. Your job is the part we can’t do from the outside - build the product your customers actually want.

Previously in This Series

Part 1 - What ChatGPT Cites and How Gemini, Claude, Grok, and Perplexity Differ