JSON-LD structured data code on a screen, illustrating Schema.org markup analyzed at web scale

The essentials

  • On June 4, 2026, Schema.org and Google released the first public dataset measuring how structured data markup is actually used across the web.
  • Counts are aggregated per domain and reported as popularity ranges (e.g. "1 to 10 million domains"); the data is updated monthly and published in CSV and JSON.
  • For the first time, you can prioritize your markup based on real data rather than guesswork: a valuable signal for SEO and for AI citability.

On June 4, 2026, Schema.org and Google released the first public dataset measuring how structured data markup is actually used across the web, according to an announcement on the official Schema.org blog. In practice, for every markup type (Product, Article, FAQPage, LocalBusiness and more), we now know roughly how many domains actually deploy it. That is unprecedented transparency in a field that, until now, ran mostly on intuition.

Direct answer: what does the new Schema.org dataset show? It reports, type by type, how many domains use each markup, grouped into popularity ranges rather than raw counts. The data comes from Google's public web crawl, is aggregated at the domain level (one site counts once, no matter how many pages it has) and updated monthly.
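To make "aggregated at the domain level" concrete, here is a minimal Python sketch of the idea, not the actual Schema.org or Google pipeline. It assumes page-level observations arrive as (URL, markup type) pairs and treats the URL's hostname as the "domain", so subdomains count separately; `count_domains_per_type` is a hypothetical name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def count_domains_per_type(observations):
    """Count distinct domains per markup type.

    observations: iterable of (page_url, schema_type) pairs.
    Assumption: the hostname stands in for the "domain"; subdomains are
    not collapsed, and the real dataset's definition may differ.
    """
    domains_by_type = defaultdict(set)
    for page_url, schema_type in observations:
        host = urlsplit(page_url).hostname or ""
        domains_by_type[schema_type].add(host)  # a set ignores repeat pages
    return {t: len(hosts) for t, hosts in domains_by_type.items()}


pages = [
    ("https://shop.example/p/1", "Product"),
    ("https://shop.example/p/2", "Product"),  # same domain, counted once
    ("https://store.example/item", "Product"),
    ("https://news.example/a", "Article"),
]
print(count_domains_per_type(pages))  # {'Product': 2, 'Article': 1}
```

Three Product pages become two domains: a site with thousands of product pages weighs the same as a site with one.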

What the dataset contains

Described as "a collaboration between Google and the Schema.org community," the dataset does not publish exact counts. It places each term into a bucket — for example "10K to 100K domains" or "1 to 10 million domains." Schema.org justifies this in its official documentation: buckets keep the data stable and protect site privacy by preventing competitors from tracking small changes on a specific domain.
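The bucketing can be sketched the same way. This version assumes power-of-ten ranges, inferred only from the two examples quoted above; the real bucket edges, labels and boundary handling may differ, and `domain_bucket` is a hypothetical helper.

```python
def domain_bucket(count):
    """Map an exact domain count to a power-of-ten range label.

    Assumption: boundaries are powers of ten, inferred from the examples
    "10K to 100K domains" and "1 to 10 million domains"; a count equal to
    a boundary is placed in the higher range.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    lower = 1
    while lower * 10 <= count:
        lower *= 10
    return f"{_short(lower)} to {_short(lower * 10)} domains"


def _short(n):
    # Compact label: 10000 -> "10K", 1000000 -> "1M"
    for size, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= size:
            return f"{n // size}{suffix}"
    return str(n)


print(domain_bucket(42_000))     # 10K to 100K domains
print(domain_bucket(3_500_000))  # 1M to 10M domains
```

Note how the range absorbs small movements: 42,000 and 43,000 domains produce the same label, which is the stability and privacy effect Schema.org describes.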

The files are published on Schema.org's official GitHub repository in CSV and JSON and refreshed monthly; the usage frequencies also appear directly on each term's page. According to Search Engine Land's analysis, the most widespread types (Product, Review, Article, FAQPage, LocalBusiness) sit in the 1-to-10-million-domain range, while the core properties name, description, image and url top the ranking.
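Once you download a file, filtering it takes a few lines. This Python sketch assumes a CSV with one row per term and hypothetical column names `term` and `bucket`; check the real header in the repository and adjust `term_col` / `bucket_col`. The sample rows are illustrative, not copied from the dataset.

```python
import csv
import io


def terms_in_bucket(csv_text, bucket, term_col="term", bucket_col="bucket"):
    """Return, sorted, the terms whose popularity range equals `bucket`.

    Assumption: "term" and "bucket" are hypothetical column names;
    pass the real ones from the published file's header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(row[term_col] for row in reader if row[bucket_col] == bucket)


# Illustrative rows only, shaped after the ranges quoted above; not real file contents.
sample = """term,bucket
Product,1M-10M
Article,1M-10M
ExampleType,10K-100K
"""
print(terms_in_bucket(sample, "1M-10M"))  # ['Article', 'Product']
```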

Why this is a valuable signal for SEO and GEO

Structured markup describes your content in a language Google and generative engines understand. The problem until now: nobody really knew what the web actually used, so people marked up "just in case," without a reference point. These statistics change that on three fronts:

  • Identify the expected foundations. A type present on millions of domains (Article, Product, FAQPage) has become a de facto standard. If your sector uses it heavily and you skip it, you start with a machine-readability handicap.
  • Spot the blind spots. Relevant but lightly adopted types can become an edge: data your competitors don't expose is data engines can only extract from you.
  • Strengthen AI citability. Content that is well described (author, date, source, entity) is more easily picked up by ChatGPT, Perplexity or AI Overviews. That's exactly the logic of Google's official guide to optimizing for AI.
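For the citability point, here is a minimal sketch of what "author, date, source" looks like as Schema.org JSON-LD, generated with Python's standard `json` module. `article_jsonld` is a hypothetical helper and all values are placeholders; using `citation` for sources is one reasonable choice among several Schema.org properties, not a Google requirement.

```python
import json


def article_jsonld(headline, author_name, date_published, publisher_name, citations=()):
    """Build a JSON-LD <script> tag for a schema.org Article.

    All values are placeholders supplied by the caller;
    date_published should be an ISO 8601 date such as "2026-06-04".
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    if citations:
        # CreativeWork.citation points at the sources the article relies on
        data["citation"] = list(citations)
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # "<\/" is a valid JSON escape and stops a value from closing the tag early
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld("Example headline", "Jane Doe", "2026-06-04",
                     "Example Publisher", ["https://example.com/source"]))
```

Whatever the values, they should match what the page visibly shows: same headline, same author, same date.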

Worth noting: the popularity of FAQPage is telling. Since 2023, Google has shown FAQ rich results only for a limited set of well-known, authoritative government and health websites, yet the type still sits in the millions-of-domains range. A type can stay massively deployed for its structuring value, which is useful to AI, even after its enhanced display disappears.
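What FAQPage still contributes without the rich result is structure: each question is explicitly tied to its answer. A minimal Python sketch of that shape, using the `Question` / `acceptedAnswer` / `Answer` pattern from Schema.org; `faq_jsonld` is a hypothetical helper, and the sample pair is adapted from this article's own FAQ.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_jsonld([
    (
        "Are these usage statistics a Google ranking factor?",
        "No. They measure adoption across the web, not SEO effectiveness.",
    ),
])
print(json.dumps(faq, indent=2))
```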


What to do now

No need to rewrite everything. Three concrete actions turn this dataset into an advantage:

  1. Map your current markup with Google's Rich Results Test (or the Schema Markup Validator, which reports every Schema.org type, not only those eligible for rich results), then compare it against the adoption levels the dataset shows for the types most relevant to your sector.
  2. Add the missing foundations in JSON-LD: Organization, Article or Product, BreadcrumbList, and the key properties (name, description, image, url, author, datePublished).
  3. Prioritize quality, not volume. Poorly populated markup is worthless, and misleading markup can also violate Google's structured data guidelines. Consistency between your markup and your visible content remains the rule, on Google as in the generative answers where your brand's credibility is now decided.
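Steps 1 and 2 can be partly scripted as a complement to the Rich Results Test, not a replacement. This Python sketch pulls every JSON-LD block out of a page with the standard-library `HTMLParser`, collects the declared `@type` values (including nested ones such as an author's `Person`), and lists which expected types are missing. The class and function names are hypothetical, the expected list is yours to build from the dataset, and Microdata or RDFa markup is ignored.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def collect_types(node, found):
    """Walk parsed JSON-LD and record every @type, including nested ones."""
    if isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            found.add(t)
        elif isinstance(t, list):
            found.update(x for x in t if isinstance(x, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


def missing_types(html, expected):
    """Return, sorted, the expected types that no JSON-LD block on the page declares."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    found = set()
    for block in parser.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # malformed block skipped here; validate it separately
    return sorted(set(expected) - found)


page = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Article",'
        ' "author": {"@type": "Person", "name": "A"}}'
        '</script></head><body></body></html>')
print(missing_types(page, ["Article", "BreadcrumbList", "Organization"]))
# ['BreadcrumbList', 'Organization']
```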

What these statistics do not say

The dataset measures adoption, not effectiveness. A widespread type is neither mandatory nor a ranking factor. Google has said so repeatedly: markup helps engines understand and display content, but it does not by itself make a page rank better. The numbers reflect only the web Google crawls (sites blocked by robots.txt are excluded) and do not distinguish formats (JSON-LD, Microdata, RDFa). Finally, the buckets are deliberately wide: they show a trend, not a precise count. Read them as a map of the landscape, not a verdict.
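If you want to know which of the three syntaxes your own pages use, a rough heuristic is enough. This Python sketch flags JSON-LD from `<script type="application/ld+json">`, Microdata from `itemscope` / `itemtype`, and RDFa from `vocab` / `typeof`. It deliberately ignores the RDFa `property` attribute, which Open Graph meta tags also use, so it can miss some RDFa; `detect_formats` is a hypothetical helper, not a full parser.

```python
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Note which structured-data syntaxes appear in a page (heuristic)."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.formats.add("JSON-LD")
        if "itemscope" in names or "itemtype" in names:
            self.formats.add("Microdata")
        if names & {"vocab", "typeof"}:
            self.formats.add("RDFa")


def detect_formats(html):
    """Return the sorted list of syntaxes detected in `html`."""
    detector = FormatDetector()
    detector.feed(html)
    detector.close()
    return sorted(detector.formats)


print(detect_formats('<div itemscope itemtype="https://schema.org/Product"></div>'))
# ['Microdata']
```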

The Cicéro take

This release pulls structured data out of the realm of belief. We knew how to mark up; we didn't know what mattered at web scale. Now we can prioritize on hard data. But the real battle of 2026 is not adding more markup. It is describing your content so clearly, with such solid sourcing and such consistency, that no reader, human or AI, has a reason to prefer a competitor. Markup is the grammar. Content is still the point.

Frequently asked questions

What does the new Schema.org dataset show?
It shows, for each markup type (Product, Article, FAQPage, LocalBusiness…), how many domains use it, grouped into popularity ranges such as "1 to 10 million domains." The data comes from Google's public web crawl, is aggregated per domain and updated monthly. It was announced on June 4, 2026, on the official Schema.org blog.
Are these usage statistics a Google ranking factor?
No. A widely used type is neither mandatory nor an advantage for ranking. The statistics measure adoption across the web, not SEO effectiveness. They help you prioritize and understand the markup landscape, not guarantee a position gain.
How do I use this data for SEO and GEO?
Identify the most widely adopted types for your sector (Article, Product, FAQPage, LocalBusiness): those are the expected foundations. Implement them cleanly in JSON-LD, then differentiate with well-populated properties (author, source, date) that strengthen E-E-A-T and citability by AI engines such as ChatGPT, Perplexity and AI Overviews.

Sources

Alexis Dollé
CEO & Founder

As a growth and SEO content strategist, I founded Cicéro to help businesses build lasting organic visibility, on Google and in AI-generated answers alike. Every piece of content we produce is designed to convert, not just to exist.
