Research: why entity consensus affects AI citation.
Large language models do not browse the web, form opinions, or play favorites. They surface entities that appear consistently across independent, authoritative sources, and they ignore entities that don't. The research is clear on this. What follows is what the data says.
All statistics below link to their primary source. Peer-reviewed and large-scale industry research is cited first. Agency research is flagged as such.
Most companies are invisible to AI.
62% of enterprise brands are "technically invisible" to generative AI models.[7] When asked direct, unbranded questions about their core services, AI failed to cite them in 81% of test cases. This is not a content quality problem. It is a corroboration problem.
AI systems do not index the web the way search engines do. They synthesize answers from what they have been trained on and what they can retrieve in real time. If your company exists only on your own website, you are a single source. Single sources get hedged, qualified, or omitted entirely.
The corroboration threshold
AI systems require approximately 2–3 independent, high-confidence sources corroborating the same claim before they commit to recommending a brand consistently. Below that threshold, AI hedges. Above it, AI asserts.[6]
Only 11% of domains are cited by both ChatGPT and Perplexity.[5] The remaining 89% are either invisible to one or both systems, or are referenced inconsistently. Only 12.4% of Fortune 1000 companies have valid Organization Schema linked to a Knowledge Graph ID.[7]
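For reference, "valid Organization Schema linked to a Knowledge Graph ID" typically means markup along these lines. Every value below is an illustrative placeholder — the name, URL, Wikidata QID, and kgmid are invented — and linking the Knowledge Graph ID through a sameAs entry is one common convention, not a formal requirement:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://www.example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.google.com/search?kgmid=/g/11xxxxxxx"
  ]
}
```

Without the sameAs links, the markup still validates syntactically, but it no longer ties the page to an entity that other systems can corroborate.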
Off-site mentions correlate more strongly with AI citation than backlinks do.
The largest studies to date — covering 75,000 brands, 26,573 AI model calls, and 17.2 million citations — converge on the same conclusion: AI systems weight what independent sources say about you far more than what you say about yourself.
Brand mentions vs. backlinks
Ahrefs, 75,000 brands across ChatGPT, AI Mode, AI Overviews
Branded web mentions correlate with AI visibility at r = 0.664–0.711. Backlinks correlate at r ≈ 0.10. YouTube mentions are the single strongest predictor at r ≈ 0.737. Content volume shows almost no relationship: r ≈ 0.194.[2]
The same brands appear across ChatGPT, AI Mode, and AI Overviews with a 0.779 correlation. The models largely agree on who to cite.[2]
Source corroboration and recommendation strength
Surfer SEO, 922 prompts, 26,573 AI model calls, 289,105 URLs
Brands mentioned on a higher percentage of AI-cited source pages are ranked more strongly in AI recommendations, with a Spearman correlation of r = 0.41.[3] This is the direct empirical link between off-site corroboration and AI recommendation position.
Blog posts — both brand-owned and third-party — show the strongest correlation with recommendation strength of any content type.[3]
Where AI citations actually come from
Yext, 17.2 million AI citations, Q4 2025
Only ~9% of AI-generated responses cite the brand's own website. The other ~91% come from third-party sources.[4] Even when AI systems prefer your content on a per-URL basis, 91% of the total signal pathway runs through sources you do not own.
The citation method effect
Princeton, Accepted KDD 2024, peer-reviewed, 10,000 queries
Generative Engine Optimization methods can boost visibility by up to 40% in generative engine responses. The single most impactful tactic: citing sources, which produced a 115.1% visibility increase for sites ranked 5th in the SERP.[1]
Adding statistics increased visibility by 22%. Adding quotations increased visibility by 37% on Perplexity. Keyword stuffing had a negative impact.[1]
Volume without independence is worthless.
In a controlled experiment, Authoritas seeded 11 fictional experts into 600+ UK press articles. The result: zero AI recommendations across 9 AI models and 55 topic questions.[6]
Press coverage alone, even at scale, does not produce AI citations. The fictional experts had volume. They did not have independent corroboration from authoritative sources. The AI models detected this.
What corroboration requires
What Entities publishes.
Each entity record is a structured, independently hosted source containing canonical facts, per-fact citations, disambiguation statements, and sameAs links to corresponding records on authoritative platforms.
Records contain only citable facts, not marketing copy. Every field can be cross-referenced against Wikidata, Crunchbase, LinkedIn, and the entity's own site — published in a machine-readable format designed for retrieval.
What a record contains
- Canonical entity name and type
- Core facts with per-fact source citations
- Disambiguation statement
- sameAs links to Wikidata, Crunchbase, LinkedIn
- Schema.org JSON-LD output
- Open API endpoint (no auth required)
- dateModified for freshness signals
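The field list above can be sketched as a Schema.org-flavored record in Python. This is a sketch under stated assumptions, not the registry's actual output format: the values, URLs, and overall shape are illustrative. The property names used — Organization, disambiguatingDescription, sameAs, Claim, citation, dateModified — do exist in the Schema.org vocabulary, which is why they appear here.

```python
import json
from datetime import date

# Illustrative entity record: canonical facts with per-fact citations,
# a disambiguation statement, sameAs links, and a freshness signal.
# All values are placeholders, not real data.
record = {
    "@context": "https://schema.org",
    "@type": "Organization",                      # canonical entity type
    "name": "Example Corp",                       # canonical entity name
    "disambiguatingDescription": (
        "Example Corp, the B2B analytics firm, "
        "not the consumer app of the same name."
    ),
    "sameAs": [                                   # corresponding records elsewhere
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.crunchbase.com/organization/example-corp",
        "https://www.linkedin.com/company/example-corp",
    ],
    "subjectOf": [                                # core facts, one citation each
        {
            "@type": "Claim",
            "text": "Founded in 2014.",
            "citation": "https://www.crunchbase.com/organization/example-corp",
        },
    ],
    "dateModified": date.today().isoformat(),     # freshness signal
}

print(json.dumps(record, indent=2))
```

Note what the structure enforces: there is no field for a tagline or a value proposition, and every entry in subjectOf carries its own citation, so a fact without a public source simply has nowhere to go.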
What a record does not contain
- Marketing copy or value propositions
- Taglines or brand messaging
- Self-reported claims without public sources
- Subjective descriptions or rankings
- Paid placement or sponsored positioning
Distribution
Records are distributed as public web pages, Schema.org JSON-LD endpoints, an open REST API, and via llms.txt for AI agent discovery. Sites on 4+ platforms are 2.8× more likely to appear in ChatGPT responses.[5]
- Web: public entity page with structured data
- JSON-LD: Schema.org-compliant machine-readable output
- API: open REST endpoint, no authentication
- llms.txt: AI agent discovery feed
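llms.txt, the last channel above, is a plain-markdown file served at the site root that tells AI agents where the machine-readable resources live. A minimal sketch of such a file, with illustrative paths only — the actual feed's URLs and wording will differ:

```markdown
# Entities

> Structured, independently hosted entity records with per-fact citations.

## Records

- [Entity index](https://example.com/entities/): all public entity pages
- [JSON-LD endpoint](https://example.com/api/entities/): Schema.org output per entity
```

The format is deliberately simple: an H1 name, a blockquote summary, and link lists an agent can follow without parsing HTML.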
Three listing levels.
All tiers produce structured, machine-readable entity data. Higher tiers add domain verification, per-fact citations, relationship graphs, and extended JSON-LD with products, services, and leadership.
- Listed: free
- Verified Organization: $1,000/yr, 1 organization
- Entity Graph: $10,000, one-time fee
Sources
- [1] GEO: Generative Engine Optimization. Princeton / KDD 2024 · Peer-reviewed
- [2] AI Brand Visibility Correlations: 75,000 brands. Ahrefs, 2025 · Large-scale industry research
- [3] Brand Mention Effect on AI Recommendations: 26,573 AI model calls. Surfer SEO, 2025 · Large-scale industry research
- [4] AI Citation Behavior Across Models: 17.2M citations. Yext, Q4 2025 · Large-scale industry research
- [5] 2025 AI Citation & LLM Visibility Report. Digital Bloom, 2025 · Industry research
- [6] Rand Fishkin proved AI recommendations are inconsistent: here's why. Authoritas via Search Engine Land, 2025 · Controlled experiment
- [7] 2026 State of Generative Search: n=1,000 enterprise domains. Fuel Online, 2026 · Agency research
Start with the data.
Submit your organization for a free listing, or explore the registry to see how structured entity records work.