iPhone showing the Siri assistant, illustrating how Apple Intelligence uses web content

The gist

  • On June 8, 2026, Apple rewrote its "About Applebot" documentation: crawled content is now used to train its foundation models and power answers in Siri and Apple Intelligence.
  • Two distinct control levers: Applebot-Extended in robots.txt (blocks training) and the nosnippet meta tag (blocks use as context in answers).
  • Blocking both mechanisms does not remove your site from Apple's search index: training, citation and indexing are controlled separately.
  • The point isn't to block everything, but to decide: being cited in Apple's answers is a visibility opportunity as much as a risk to weigh.

On June 8, 2026, Apple published a rewrite of its official "About Applebot" documentation, extending the crawler's role well beyond search indexing. For the first time, the document states that data fetched by Applebot "may be used to help train Apple foundation models powering generative AI features", explicitly including Apple Intelligence and Siri. This is a change in kind: the crawler that used to serve Spotlight and Siri search now also feeds Apple's consumer AI.

Direct answer: does your website now feed Siri and Apple Intelligence? Yes, by default. As of June 8, 2026, Apple can use content crawled by Applebot to train its models and generate answers, unless you explicitly opt out by disallowing Applebot-Extended in robots.txt and applying the nosnippet meta tag.

What Apple changed exactly

Until now, Applebot had one clear job: index the web for Siri suggestions and Spotlight search. The new documentation adds an explicit AI section. Two uses now coexist:

  • Training. Crawled data "may help train Apple foundation models" that power generative features across its products.
  • Real-time generation. Apple states the data may "provide additional context and up-to-date content when models generate output", for example when answering broad world-knowledge questions in Siri and Search, with links to the sources used.

This second mechanism matters most for visibility, because it decides whether your site is cited as a source in a Siri answer. The same logic already drives optimization for Google's generative answers: be the clean, reliable source the model wants to reuse.

The two control levers (don't confuse them)

Apple documents two independent controls. Conflating them is a common mistake:

1. Applebot-Extended, for training

To stop your content from training Apple's foundation models, add a rule disallowing the Applebot-Extended agent in your robots.txt. Crucial point: Applebot-Extended does not crawl on its own. It only determines how data already fetched by Applebot can be used. Blocking it therefore does not affect your indexing in Apple search.
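The rule itself is two lines of robots.txt. The Python sketch below embeds that rule and uses the standard library's `urllib.robotparser` to show how a generic robots.txt parser reads it: the Applebot-Extended group matches only that agent, so a plain Applebot lookup is unaffected. The URL `https://example.com/blog/post` and the helper `can_use` are hypothetical, and Python's parser is only a stand-in; it is not how Apple itself evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# The training opt-out described above: disallow the Applebot-Extended agent site-wide.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def can_use(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` is allowed on `url` according to `robots_txt`.

    Hypothetical helper: relies on Python's generic robots.txt parser,
    not on Apple's own (non-public) evaluation logic.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # placeholder URL
    print("Applebot (indexing):", can_use("Applebot", url))
    print("Applebot-Extended (training):", can_use("Applebot-Extended", url))
```

Because the file contains no other group, the Applebot lookup finds no matching entry and is allowed, which mirrors the point above: the training opt-out leaves indexing untouched. Note that Python's parser uses the first matching group and matches agent names by substring, so if you test a file that also has an `Applebot` group this way, list the `Applebot-Extended` group first.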

2. The nosnippet meta tag, for generation

To stop Apple from using your page as context in a generated answer, apply the nosnippet meta tag. Apple states it "will not use data tagged nosnippet" as additional context. Note: this control applies at the whole-page level, not section by section.
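In HTML, this page-level control is a single meta tag in the `<head>`, for example `<meta name="robots" content="nosnippet">`. The Python sketch below checks whether a page's markup carries that directive, using only the standard library's `html.parser`. It assumes the generic `robots` meta name; whether Apple also honors an Applebot-specific name is worth confirming in its documentation. `RobotsMetaParser` and `has_nosnippet` are hypothetical helpers for an audit, not part of any Apple tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # Assumption: only the generic "robots" meta name is checked.
        if attributes.get("name", "").strip().lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for directive in attributes.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_nosnippet(html: str) -> bool:
    """Return True if the page carries a robots meta tag with nosnippet."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "nosnippet" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nosnippet"></head></html>'
    print(has_nosnippet(page))
```

Since the tag sits in the page head, the check is per page, which matches Apple's note that the control cannot target individual sections.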

Key takeaway: blocking Applebot-Extended and applying nosnippet does not remove your site from Apple's search index. The three functions (indexing, training, citation) are controlled separately. You can stay findable while refusing training, or accept citation while blocking training.

Why this matters for SMBs

Siri is available on more than a billion active iPhones, and Apple Intelligence is rolling out across the newer models. When Siri answers a world-knowledge or industry question by citing its sources, being among those sources becomes a real visibility channel, just like AI Overviews or Perplexity. The technical context points the same way: AI crawlers already accounted for 4.2% of all HTML requests on Cloudflare's network in 2025, a sign that AI bots have become a meaningful share of web traffic in their own right.

The defensive reflex, "I'll block everything," is rarely the right one. Cutting yourself off from Apple Intelligence means giving up appearing in the answers of an assistant that reaches hundreds of millions of users. The real question isn't "how do I protect myself" but "is my content clear and credible enough that an AI cites it rather than a competitor's?" That's exactly the territory of brand credibility in AI answers.


What to do now

  • Audit your robots.txt. Check whether Applebot-Extended is already mentioned. Without an explicit rule, you allow training by default. Decide knowingly, don't inherit the default setting.
  • Separate training vs citation. Many brands are better off refusing training (Applebot-Extended) while staying citable (no nosnippet) to capture visibility in Siri answers.
  • Strengthen citability. Direct answers, named sources, verifiable data, clear structure. That's what separates content used as a source from content that gets ignored, on Apple as everywhere else, now that 68% of Google searches already end without a click.
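For the first audit step, a quick text-level check is enough to tell whether a robots.txt file names Applebot-Extended at all. The Python sketch below is a hypothetical helper, `mentions_applebot_extended`; it only looks for an explicit `User-agent: Applebot-Extended` line and deliberately ignores wildcard (`*`) groups, so treat it as a first pass rather than a full robots.txt evaluator.

```python
def mentions_applebot_extended(robots_txt: str) -> bool:
    """Return True if any User-agent line names Applebot-Extended explicitly.

    Hypothetical audit helper: a text-level scan that ignores wildcard
    groups and does not evaluate Allow/Disallow rules.
    """
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            if value.strip().lower() == "applebot-extended":
                return True
    return False


if __name__ == "__main__":
    current = "User-agent: *\nDisallow: /admin/\n"
    if mentions_applebot_extended(current):
        print("Explicit Applebot-Extended rule found: review it.")
    else:
        print("No explicit Applebot-Extended rule: training is allowed by default.")
```

A result of False is the signal to make a deliberate choice rather than inherit the default.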

What this announcement does not say

Apple does not disclose how much traffic or how many citations Apple Intelligence currently sends back to publishers, so there's no way to quantify the real visibility gain at this stage. The documentation also doesn't give a per-market rollout timeline or the exact impact on sites that already block Applebot. Finally, it doesn't address the copyright question around training data, which remains open. This article describes the official control mechanisms, not a guarantee of results.

The Cicéro take

Apple is normalizing what Google, OpenAI and Perplexity have already established: your content is fuel for AI, and you get to accept it or not. The trap would be treating this as a purely defensive topic. The right stance is offensive: decide clearly what to block, then do everything to be the source the AI chooses to cite. Visibility in 2026 is won in the answers, not just in the blue links.

Frequently asked questions

What is Applebot and what did Apple change on June 8, 2026?
Applebot is Apple's web crawler, historically dedicated to indexing for Siri and Spotlight. On June 8, 2026, Apple rewrote its "About Applebot" documentation to formalize that crawled data may now help train its foundation models and power generative answers in Siri, Apple Intelligence and its developer tools.
How do I stop Apple from using my content to train its AI models?
Add a rule disallowing the Applebot-Extended user agent in your robots.txt file. This blocks the use of your content for training Apple's foundation models without affecting your indexing in Apple search. To block the use of your content as context in generated answers, apply the nosnippet meta tag at the page level.
Does blocking Applebot-Extended remove my site from Apple search?
No. Applebot-Extended does not crawl on its own: it only determines how data already fetched by Applebot is used. Blocking Applebot-Extended and applying nosnippet does not remove your site from Apple's search index. Training, citation and indexing controls operate independently.


Alexis Dollé
CEO & Founder

Growth and SEO content strategist, I founded Cicéro to help businesses build lasting organic visibility, on Google and in AI-generated answers alike. Every piece of content we produce is designed to convert, not just to exist.
