Research: why entity consensus affects AI citation.
Large language models do not browse the web, form opinions, or play favorites. They surface entities that appear consistently across independent, authoritative sources, and they ignore entities that don't. The research is clear on this. What follows is what the data says.
All statistics below link to their primary source. Peer-reviewed and large-scale industry research is cited first. Agency research is flagged as such.
Most companies are invisible to AI.
62% of enterprise brands are "technically invisible" to generative AI models.[7] When asked direct, unbranded questions about their core services, AI failed to cite them in 81% of test cases. This is not a content quality problem. It is a corroboration problem.
AI systems do not index the web the way search engines do. They synthesize answers from what they have been trained on and what they can retrieve in real time. If your company exists only on your own website, you are a single source. Single sources get hedged, qualified, or omitted entirely.
The corroboration threshold
AI systems require approximately 2–3 independent, high-confidence sources corroborating the same claim before they commit to recommending a brand consistently. Below that threshold, AI hedges. Above it, AI asserts.[6]
Only 11% of domains are cited by both ChatGPT and Perplexity.[5] The remaining 89% are either invisible to one or both systems, or are referenced inconsistently. Only 12.4% of Fortune 1000 companies have valid Organization Schema linked to a Knowledge Graph ID.[7]
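For reference, "valid Organization Schema linked to a Knowledge Graph ID" typically means markup along these lines. Every value below is an illustrative placeholder — the name, URL, Wikidata QID, and kgmid are invented — and linking the Knowledge Graph ID through a sameAs entry is one common convention, not a formal requirement:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://www.example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.google.com/search?kgmid=/g/11xxxxxxx"
  ]
}
```

Without the sameAs links, the markup still validates syntactically, but it no longer ties the page to an entity that other systems can corroborate.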
Off-site mentions correlate more strongly with AI citation than backlinks do.
The largest studies to date — covering 75,000 brands, 26,573 AI model calls, and 17.2 million citations — converge on the same conclusion: AI systems weight what independent sources say about you far more than what you say about yourself.
Brand mentions vs. backlinks
Ahrefs, 75,000 brands across ChatGPT, AI Mode, AI Overviews
Branded web mentions correlate with AI visibility at r = 0.664–0.711. Backlinks correlate at r ≈ 0.10. YouTube mentions are the single strongest predictor at r ≈ 0.737. Content volume shows almost no relationship: r ≈ 0.194.[2]
The same brands appear across ChatGPT, AI Mode, and AI Overviews with a 0.779 correlation. The models largely agree on who to cite.[2]
Source corroboration and recommendation strength
Surfer SEO, 922 prompts, 26,573 AI model calls, 289,105 URLs
Brands mentioned on a higher percentage of AI-cited source pages are ranked more strongly in AI recommendations, with a Spearman correlation of r = 0.41.[3] This is the direct empirical link between off-site corroboration and AI recommendation position.
Blog posts — both brand-owned and third-party — show the strongest correlation with recommendation strength of any content type.[3]
Where AI citations actually come from
Yext, 17.2 million AI citations, Q4 2025
Only ~9% of AI-generated responses cite the brand's own website. The other ~91% come from third-party sources.[4] Even when AI systems prefer your content on a per-URL basis, 91% of the total signal pathway runs through sources you do not own.
The citation method effect
Princeton, Accepted KDD 2024, peer-reviewed, 10,000 queries
Generative Engine Optimization methods can boost visibility by up to 40% in generative engine responses. The single most impactful tactic: citing sources, which produced a 115.1% visibility increase for sites ranked 5th in the SERP.[1]
Adding statistics increased visibility by 22%. Adding quotations increased visibility by 37% on Perplexity. Keyword stuffing had a negative impact.[1]
Volume without independence is worthless.
In a controlled experiment, Authoritas seeded 11 fictional experts into 600+ UK press articles. The result: zero AI recommendations across 9 AI models and 55 topic questions.[6]
Press coverage alone, even at scale, does not produce AI citations. The fictional experts had volume. They did not have independent corroboration from authoritative sources. The AI models detected this.
What corroboration requires
What Entities publishes.
Each entity record is a structured, independently hosted source containing canonical facts, per-fact citations, disambiguation statements, and sameAs links to corresponding records on authoritative platforms.
Records contain only citable facts, not marketing copy. Every field can be cross-referenced against Wikidata, Crunchbase, LinkedIn, and the entity's own site — published in a machine-readable format designed for retrieval.
What a record contains
- Canonical entity name and type
- Core facts with per-fact source citations
- Disambiguation statement
- sameAs links to Wikidata, Crunchbase, LinkedIn
- Schema.org JSON-LD output
- Open API endpoint (no auth required)
- dateModified for freshness signals
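The field list above can be sketched as a Schema.org-flavored record in Python. This is a sketch under stated assumptions, not the registry's actual output format: the values, URLs, and overall shape are illustrative. The property names used — Organization, disambiguatingDescription, sameAs, Claim, citation, dateModified — do exist in the Schema.org vocabulary, which is why they appear here.

```python
import json
from datetime import date

# Illustrative entity record: canonical facts with per-fact citations,
# a disambiguation statement, sameAs links, and a freshness signal.
# All values are placeholders, not real data.
record = {
    "@context": "https://schema.org",
    "@type": "Organization",                      # canonical entity type
    "name": "Example Corp",                       # canonical entity name
    "disambiguatingDescription": (
        "Example Corp, the B2B analytics firm, "
        "not the consumer app of the same name."
    ),
    "sameAs": [                                   # corresponding records elsewhere
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.crunchbase.com/organization/example-corp",
        "https://www.linkedin.com/company/example-corp",
    ],
    "subjectOf": [                                # core facts, one citation each
        {
            "@type": "Claim",
            "text": "Founded in 2014.",
            "citation": "https://www.crunchbase.com/organization/example-corp",
        },
    ],
    "dateModified": date.today().isoformat(),     # freshness signal
}

print(json.dumps(record, indent=2))
```

Note what the structure enforces: there is no field for a tagline or a value proposition, and every entry in subjectOf carries its own citation, so a fact without a public source simply has nowhere to go.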
What a record does not contain
- Marketing copy or value propositions
- Taglines or brand messaging
- Self-reported claims without public sources
- Subjective descriptions or rankings
- Paid placement or sponsored positioning
Distribution
Records are distributed as public web pages, Schema.org JSON-LD endpoints, an open REST API, and via llms.txt for AI agent discovery. Sites on 4+ platforms are 2.8× more likely to appear in ChatGPT responses.[5]
- Web: public entity page with structured data
- JSON-LD: Schema.org-compliant machine-readable output
- API: open REST endpoint, no authentication
- llms.txt: AI agent discovery feed
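llms.txt, the last channel above, is a plain-markdown file served at the site root that tells AI agents where the machine-readable resources live. A minimal sketch of such a file, with illustrative paths only — the actual feed's URLs and wording will differ:

```markdown
# Entities

> Structured, independently hosted entity records with per-fact citations.

## Records

- [Entity index](https://example.com/entities/): all public entity pages
- [JSON-LD endpoint](https://example.com/api/entities/): Schema.org output per entity
```

The format is deliberately simple: an H1 name, a blockquote summary, and link lists an agent can follow without parsing HTML.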
Three listing levels.
All tiers produce structured, machine-readable entity data. Higher tiers add domain verification, per-fact citations, relationship graphs, and extended JSON-LD with products, services, and leadership.
- Listed: free
- Verified Organization: $1,000/yr, 1 organization
- Entity Graph: $10,000, one-time fee
Sources
- [1] GEO: Generative Engine Optimization. Princeton / KDD 2024 · Peer-reviewed
- [2] AI Brand Visibility Correlations: 75,000 brands. Ahrefs, 2025 · Large-scale industry research
- [3] Brand Mention Effect on AI Recommendations: 26,573 AI model calls. Surfer SEO, 2025 · Large-scale industry research
- [4] AI Citation Behavior Across Models: 17.2M citations. Yext, Q4 2025 · Large-scale industry research
- [5] 2025 AI Citation & LLM Visibility Report. Digital Bloom, 2025 · Industry research
- [6] Rand Fishkin proved AI recommendations are inconsistent: here's why. Authoritas via Search Engine Land, 2025 · Controlled experiment
- [7] 2026 State of Generative Search: n=1,000 enterprise domains. Fuel Online, 2026 · Agency research
Start with the data.
Submit your organization for a free listing, or explore the registry to see how structured entity records work.