TL;DR: Schema.org now publishes monthly real-world usage statistics for every markup type and property, aggregated by domain and grouped into popularity buckets. For the first time, you can prioritise your structured data based on evidence instead of guesswork, a direct lever for being understood and cited by both Google and generative AI.
On June 4, 2026, Schema.org released its first-ever public dataset on real-world structured data usage across the web. Built in collaboration with Google and the Schema.org community, the initiative is detailed in the official Schema.org blog announcement and was covered by Search Engine Land on June 10. Until now, nobody — not even agencies — knew precisely which markup types were actually deployed on the web.
What the dataset contains
For each Schema.org vocabulary term (types such as Organization or Product, and properties such as price or telephone), the dataset shows how many domains use it. Counts are aggregated at the domain level, so a site deploying markup on 500 pages counts as a single domain. They are also grouped into popularity buckets (for example "10K–100K" domains) rather than exact figures, to reduce noise and protect privacy.
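The Python sketch below illustrates the two aggregation rules. It shows how page-level observations could collapse into per-domain counts, and how those counts could then collapse into range buckets. It is not Schema.org's actual pipeline. The power-of-ten bucket edges, the helper names (bucket, domain_buckets) and the use of the URL hostname as "the domain" are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical bucket edges (powers of ten). Schema.org's real boundaries
# may differ; this only illustrates the idea of range buckets.
BUCKET_EDGES = [10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]


def short(n: int) -> str:
    """Format a round count compactly, e.g. 10_000 -> '10K'."""
    for size, suffix in ((1_000_000, "M"), (1_000, "K")):
        if n >= size and n % size == 0:
            return f"{n // size}{suffix}"
    return str(n)


def bucket(count: int) -> str:
    """Map an exact domain count to a range label such as '10K-100K'."""
    low = 0
    for high in BUCKET_EDGES:
        if count < high:
            return f"{short(low)}-{short(high)}"
        low = high
    return f"{short(low)}+"


def domain_buckets(observations):
    """observations: iterable of (page_url, term) pairs seen in a crawl.

    Returns {term: bucket label}, counting each hostname once per term.
    """
    domains_per_term = defaultdict(set)
    for url, term in observations:
        # A set stores each hostname once, so 500 pages on one site count as 1.
        domains_per_term[term].add(urlsplit(url).hostname)
    return {term: bucket(len(hosts)) for term, hosts in domains_per_term.items()}
```

With this logic, 500 Product pages on one site plus one Product page on another give Product a count of two domains, which lands in the "0-10" bucket. The exact page volume disappears twice: first in the domain set, then in the bucket.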
Raw files are published monthly in CSV and JSON on Schema.org's official GitHub repository. The figures are also displayed inline on each term's page on schema.org. Core infrastructure types such as Organization, WebPage and BreadcrumbList appear on millions of distinct domains.
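If you download one of the CSV files, a few lines of Python can turn bucket labels back into numeric ranges and sort terms by adoption. This sketch assumes several things about the file: a "term" column, a "bucket" column, and labels such as "10K–100K" or "1M+". The column names and label formats are guesses, so check the actual headers in the repository before relying on it.

```python
import csv
from typing import Iterable, List, Optional, Tuple

# Assumed suffixes used in bucket labels such as "10K" or "1M".
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """'10K' -> 10000, '1.5M' -> 1500000, '250' -> 250."""
    text = text.strip().upper()
    if text and text[-1] in MULTIPLIERS:
        # round() guards against tiny floating-point errors in decimal labels.
        return round(float(text[:-1]) * MULTIPLIERS[text[-1]])
    return int(text)


def parse_bucket(label: str) -> Tuple[int, Optional[int]]:
    """'10K–100K' -> (10000, 100000); open-ended '1M+' -> (1000000, None)."""
    label = label.strip()
    if label.endswith("+"):
        return parse_count(label[:-1]), None
    # Accept both an en dash and a plain hyphen between the two bounds.
    low, high = label.replace("\u2013", "-").split("-")
    return parse_count(low), parse_count(high)


def rank_terms(
    lines: Iterable[str], term_col: str = "term", bucket_col: str = "bucket"
) -> List[Tuple[str, int, Optional[int]]]:
    """Read CSV lines (e.g. an open file) and sort terms, most adopted first."""
    rows = [
        (row[term_col], *parse_bucket(row[bucket_col]))
        for row in csv.DictReader(lines)
    ]
    # Buckets don't overlap, so sorting on the lower bound orders them.
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Pass rank_terms a file opened with newline="" or any list of CSV lines. If the real headers differ, pass their names through term_col and bucket_col. Terms that share a bucket keep their file order, since the data deliberately does not rank them further.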
Why it matters
Structured data has long been deployed blind: teams added it "just in case", with no way of knowing what everyone else was doing. This transparency arrives at a pivotal moment. Google has deprecated FAQ rich results, yet markup remains the most reliable way to make a page legible to machines, and it now feeds the generative answers tracked in Search Console.
With 68% of Google searches ending without a click, being correctly understood by AI Overviews, ChatGPT and Perplexity is no longer a bonus — it is the condition for existing in the answer. Markup is what lets these systems unambiguously identify your business, prices, reviews and authors.
What it changes for small and mid-sized businesses
For the first time, you can make markup decisions based on facts. Three concrete actions:
- Audit your markup against real adoption. Make sure the non-negotiable types (Organization, WebPage, BreadcrumbList, Article) are in place; they are massively adopted because they work.
- Prioritise high commercial-signal types: Product and Review for e-commerce, LocalBusiness for a local service. These are the types Google and AI actually use to cite and recommend.
- Spot differentiation opportunities. A type that is relevant to your industry but under-deployed is a low-competition niche: you can become visible there before your competitors do.
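The first action can be partly automated. The sketch below uses only Python's standard library. It pulls every JSON-LD block out of a page's HTML, collects each @type (including nodes nested in @graph or inside properties) and reports which of the four core types are absent.

It has some limits, all of them assumptions:

- It reads JSON-LD only, not Microdata or RDFa.
- It expects short type names such as "Organization" rather than full schema.org URLs.
- It treats subtypes such as NewsArticle as distinct from Article.
- It raises an error on malformed JSON.

```python
import json
from html.parser import HTMLParser

# The "non-negotiable" types from the audit checklist.
CORE_TYPES = {"Organization", "WebPage", "BreadcrumbList", "Article"}


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def collect_types(node, found):
    """Walk parsed JSON-LD (dicts, lists, @graph) and add every @type to found."""
    if isinstance(node, dict):
        types = node.get("@type", [])
        found.update([types] if isinstance(types, str) else types)
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


def missing_core_types(html: str) -> set:
    """Return the core types that no JSON-LD block on the page declares."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    for block in parser.blocks:
        collect_types(json.loads(block), found)  # raises on malformed JSON
    return CORE_TYPES - found
```

Run it on a few representative templates (home, category, article, product) rather than a single page. The dataset itself counts a type once it appears anywhere on a domain.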
The machine-legibility debate is bigger than markup: it overlaps with the one about llms.txt files and AI access to your content. Schema.org, by contrast, remains a standard universally understood by Google and LLMs.
What this data does not tell you
A few important limits, so you don't over-read the dataset:
- Domain-level, not page-level — usage frequency inside a site isn't measured, only whether the term appears on at least one page.
- Buckets, not exact numbers — bucketing protects privacy but prevents any precise count.
- A single perspective — the data comes from Google's public crawl; Schema.org explicitly invites other crawlers to contribute, acknowledging that "a truly comprehensive view of the web requires multiple perspectives".
- Presence ≠ quality — the dataset shows that markup exists, not that it is valid, complete or well implemented.
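Because presence is not quality, it helps to check that the markup you do have carries the properties that make it useful. This sketch walks a parsed JSON-LD tree and lists the expected properties each typed node lacks. The EXPECTED checklist is a hypothetical editorial choice, not Google's or Schema.org's list of required properties. Replace it with the documentation for the features you target.

```python
# Hypothetical checklist of properties you expect on each type. This is an
# editorial choice for illustration, not an official list of required fields.
EXPECTED = {
    "Product": {"name", "offers"},
    "Offer": {"price", "priceCurrency"},
    "LocalBusiness": {"name", "address", "telephone"},
}


def find_gaps(node, gaps=None):
    """Return (type, sorted missing properties) for each incomplete typed node."""
    if gaps is None:
        gaps = []
    if isinstance(node, dict):
        types = node.get("@type", [])
        for t in [types] if isinstance(types, str) else types:
            # Iterating a dict yields its keys, so this removes present properties.
            missing = EXPECTED.get(t, set()).difference(node)
            if missing:
                gaps.append((t, sorted(missing)))
        for value in node.values():
            find_gaps(value, gaps)
    elif isinstance(node, list):
        for item in node:
            find_gaps(item, gaps)
    return gaps
```

Feed find_gaps the result of json.loads on a JSON-LD block. For a Product whose Offer has a price but no priceCurrency, it returns [("Offer", ["priceCurrency"])].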
The Cicéro take
Structured data is finally out of the black box. This isn't a transparency gimmick — it's a decision tool. While most will keep stacking markup out of habit, those who cross-reference these statistics with their industry will know exactly where to focus to get cited by AI. Markup alone won't make you rank — but in 2026, without it, you're invisible to the machines that now write the answers.
Frequently asked questions
What is the Schema.org usage statistics dataset?
Does structured data still matter for SEO in 2026?
How should I prioritise my markup using these statistics?
Sources
- Schema.org Blog: official usage-statistics dataset announcement (June 4, 2026)
- Search Engine Land: coverage and SEO analysis (June 10, 2026)
- Schema.org: official documentation of types and properties
As a growth and SEO content strategist, I founded Cicéro to help businesses build lasting organic visibility, on Google and in AI-generated answers alike. Every piece of content we produce is designed to convert, not just to exist.