Why your site doesn't appear in ChatGPT (and how to fix it)
The 7 reasons your site doesn't appear in ChatGPT and how to fix each one: SSR, schema, authority, external citations, structured content.
If you opened ChatGPT, asked about your category, and your brand wasn't mentioned in any response, it's not bad luck. There are specific technical and content reasons. This post covers them in probability order, with how to diagnose each one and how to fix it.
If you haven't audited yet, do it first (5 minutes): How to appear in ChatGPT: 2026 technical guide.
Reason 1: your site is pure CSR (SPA without SSR)
It's the most common cause and the most invisible. If your site was built with React, Vue, Svelte, or any SPA framework and renders client-side without SSR, LLM crawlers see an empty page.
How to diagnose:
- Open your site in a browser. Right-click → "View page source" (or Cmd/Ctrl+U).
- If you see all the page content rendered in the HTML, you're fine.
- If you see an empty `<div id="root"></div>` and a `<script>` tag loading JS, the LLM doesn't read your content.
How to fix:
- Migrate to Next.js, Remix, SvelteKit, or equivalent with SSR/SSG.
- If you can't migrate the full frontend, consider at least pre-rendering critical pages (landing, blog, service pages) with services like Prerender.io.
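The view-source check can be scripted. A minimal sketch in JavaScript (the function name and sample strings are mine, for illustration): feed it the raw HTML a crawler receives, e.g. the output of `curl -s https://yourdomain.com/ -A "GPTBot"`.

```javascript
// Heuristic audit of the HTML a crawler sees (no JavaScript execution).
// `expectedText` is any phrase that should be server-rendered, e.g. your H1.
function auditRawHtml(html, expectedText) {
  return {
    hasContent: html.includes(expectedText),
    // An empty #root/#app div plus a script tag is the classic
    // CSR-without-SSR signature.
    looksLikeCSR: /<div id="(root|app)">\s*<\/div>/.test(html),
  };
}

const csrPage = '<div id="root"></div><script src="/app.js"></script>';
console.log(auditRawHtml(csrPage, "GEO audit for B2B sites"));
// → { hasContent: false, looksLikeCSR: true }
```

If `hasContent` is false and `looksLikeCSR` is true for your landing pages, you're in Reason 1 territory.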
Reason 2: your schema.org is empty or wrong
Without schema, the LLM has to infer what type of entity you are from plain text. With schema, it knows directly. And LLMs prefer not to infer when there's ambiguity: if they don't know what you are, they don't cite you.
How to diagnose:
- Open Schema Markup Validator and paste your URL.
- If it returns no schema, you have nothing.
- If it returns `Organization` and nothing else, that's the bare minimum; you're missing the rest.
- The ideal: `Organization` or `ProfessionalService`, plus `WebSite`, and per page: `Article` with a `Person` author, `BreadcrumbList`, and `FAQPage` when applicable.
How to fix:
- Implement JSON-LD in the `<head>` with the correct types. If you use Next.js, the standard pattern is to generate the JSON from a central module and serialize it inside `<script type="application/ld+json">`.
- Verify with the Schema Markup Validator and with Google's Rich Results Test.
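A sketch of that central-module pattern (company name and URLs are placeholders): build the JSON-LD as a plain object and serialize it once.

```javascript
// Central module that builds the Organization JSON-LD.
// All concrete values (name, URLs) are placeholders for illustration.
function organizationJsonLd() {
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: "Acme Agency",
    url: "https://example.com",
    logo: "https://example.com/logo.png",
    sameAs: ["https://www.linkedin.com/company/acme-agency"],
  };
}

// In a Next.js component this would be serialized into the <head>:
//   <script type="application/ld+json"
//           dangerouslySetInnerHTML={{ __html: JSON.stringify(organizationJsonLd()) }} />
console.log(JSON.stringify(organizationJsonLd()));
```

Keeping one module as the source of truth means `Article`, `BreadcrumbList`, and `FAQPage` builders can reuse the same organization object instead of duplicating it per page.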
Reason 3: zero domain authority
LLMs (especially when responding without real-time browsing) prioritize citing brands that appear on many authoritative sites. If no one links to you, no one mentions you in publications, and you're not in recognized directories, the LLM has no signal that you exist.
How to diagnose:
- Search your brand + city in Google without quotes. Do mentions appear on third-party sites (not your own)? If only your own pages appear, you have zero authority.
- Search for your brand on Wikipedia. Do you have a page? Are you cited on any?
- Search for your brand in major Mexican publications (Forbes Mexico, Expansión, Marketing4eCommerce). Are there mentions?
How to fix (slow process, 6-12 months):
- Outreach to 5 publications in your industry asking for an interview or an opinion column.
- Speaking at relevant events (generates a mention + a link on the event's site).
- Case studies published with real clients (with permission, reciprocal link).
- Being in recognized vertical directories (not link farms, but real directories like Clutch, GoodFirms for agencies).
Reason 4: generic content without clear opinion
If your copy is "we are leaders in innovative digital solutions", the LLM has nothing to extract. Empty phrases don't get cited. What gets cited are verifiable data, sentences with stance, concrete examples.
How to diagnose:
- Open 3 of your pages (home, a service, a blog post).
- Count: how many statements include a data point with a source and a link? How many sentences make sense read alone, out of context?
- If most of it is brand prose without data, you're not extractable.
How to fix:
- Rewrite critical pages with focus on data, not brand.
- Each important statement must have a source with link, ideally external.
- Self-contained sentences: each sentence in the paragraph should be quotable alone.
Reason 5: you block AI bots in robots.txt without knowing
Some brands, acting on bad advice, added blocks against GPTBot and similar bots, thinking "Google is all that matters" or out of fear of AI training. The result: zero presence in ChatGPT and the rest.
How to diagnose:
- Open `yourdomain.com/robots.txt`.
- Look for lines like `User-agent: GPTBot` followed by `Disallow: /`. If they're there, you're blocked.
- Other agents to verify: ChatGPT-User, ClaudeBot, Claude-Web, PerplexityBot, Google-Extended, Amazonbot, CCBot, Bytespider.
How to fix:
- For each agent you want to allow, add `User-agent: BotName` followed by `Allow: /`.
- Consider leaving at least the 4 majors open (GPTBot, ClaudeBot, PerplexityBot, Google-Extended). Blocking them is an own goal.
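A robots.txt along those lines (the domain is a placeholder; keep any `Disallow` rules you already use for sensitive paths):

```
# Explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default for everyone else
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```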
Reason 6: your content is only in Spanish but the corpus is 90% English
Large LLMs (GPT, Claude, Gemini) were trained on a mostly English corpus. For questions in Spanish, models translate internally and may not find enough Spanish-language material about your niche.
This doesn't mean "publish only in English". It means:
- If your market is Mexico/LATAM, you should publish in Spanish, but also publish an English version when the topic is technical (devs read in English).
- For universal topics (GEO, schema.org, Next.js), having an EN version gives you citation advantage.
How to diagnose:
- Ask ChatGPT in Spanish about your topic → see what it cites.
- Ask the same in English → see what it cites. If the second is richer, there's corpus imbalance in your niche.
How to fix:
- Configure i18n on your site if you haven't yet.
- Publish blog posts in English for the most technical topics. ChatGPT sometimes searches in English even if the question is in Spanish.
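Cross-linking the two language versions with hreflang alternates helps crawlers discover both (URLs here are placeholders):

```html
<!-- In the <head> of both language versions -->
<link rel="alternate" hreflang="es-MX" href="https://example.com/es/guia-geo" />
<link rel="alternate" hreflang="en" href="https://example.com/en/geo-guide" />
<link rel="alternate" hreflang="x-default" href="https://example.com/en/geo-guide" />
```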
Reason 7: your sitemap and architecture don't help deep indexing
If your sitemap.xml is static from 2022 or doesn't exist, if your architecture has no clear hierarchy (Home → Category → Service), if your internal linking is disorganized, crawlers (Google's and AI's) struggle to discover your best pages.
How to diagnose:
- Open `yourdomain.com/sitemap.xml`. Does it exist? Is it updated (`lastmod` near today)?
- Count how many clicks it takes from your home to your best blog post. If it's more than 3, the content is buried.
- Check in Google Search Console how many pages Google indexed vs how many you have. If the ratio is low (50% or less), the architecture is the problem.
How to fix:
- Dynamic sitemap that regenerates on each deploy.
- Architecture with maximum 3 levels from home to content.
- Internal linking between articles in the same cluster (each blog post links to the 2-3 most relevant ones).
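The dynamic-sitemap idea, sketched as a small JavaScript function (routes and domain are placeholders; in Next.js's app router the equivalent logic lives in `app/sitemap.ts`):

```javascript
// Build a sitemap.xml string at deploy time so lastmod is always current.
// URLs are placeholders; in practice you'd collect them from your CMS/routes.
function buildSitemap(urls, lastmod = new Date().toISOString().slice(0, 10)) {
  const entries = urls
    .map((u) => `  <url><loc>${u}</loc><lastmod>${lastmod}</lastmod></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</urlset>`;
}

console.log(buildSitemap(["https://example.com/", "https://example.com/blog/geo-guide"]));
```

Regenerating this on every deploy guarantees `lastmod` reflects reality, which is exactly what the diagnosis step above checks for.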
Quick diagnosis: which is your case
Decision table:
| Symptom | Probable reason |
|---|---|
| Your site is an SPA in React/Vue, view-source is empty | #1 (CSR without SSR) |
| Schema Validator finds nothing | #2 (empty schema) |
| Searching your brand only shows your own pages | #3 (zero authority) |
| Your texts say "leaders in solutions..." | #4 (content without opinion) |
| robots.txt has Disallow for AI bots | #5 (accidental blocking) |
| Your content only exists in Spanish | #6 (imbalanced corpus) |
| Old sitemap or no sitemap | #7 (architecture) |
The most common combination we see in Mexican B2B sites: #1 + #2 + #4 at the same time. Three blockers reinforcing each other: the LLM doesn't read your HTML, doesn't know who you are, and even if it did, you wouldn't have quotable sentences.
If your diagnosis turns up 3 or more of these reasons, the path is a deep rework, not a patch. Write to us and we'll do a free technical audit with priorities ranked by impact.