Cloudflare’s AI Crawler Rules Can Block Googlebot

Cloudflare is changing how it identifies and blocks AI crawlers, and the change may result in Googlebot being blocked from sites that block AI training. The company announced the update as part of its second Content Independence Day.

The new controls let websites manage automated traffic by three behaviors rather than through a single “block AI bots” switch. They are live now for all customers, including the free tier. A separate set of automatic changes takes effect on September 15.

Three Ways to Filter AI Crawlers

Cloudflare now filters crawlers by what they do on a site rather than by whether they count as “AI.” The company divides AI use cases into three categories:

Search, crawlers that index the site to answer questions later. Cloudflare associates this behavior with referral traffic.
Agent, real-time bots that act on behalf of a person, like ChatGPT-User or browser agents such as Gemini or Claude operating in Chrome.
Training, crawlers that pull content to train or fine-tune a model.

Cloudflare says bot operators should run separate crawlers for each behavior so websites can see why a bot is visiting and decide whether to allow or block it.

What’s Changing September 15th

Two automatic changes take effect on September 15. For new customers, and for new sites added by existing customers, Training and Agent crawlers will be blocked by default on pages that display ads, while Search will remain allowed. Cloudflare’s press release also states that existing free customers who have not changed their settings by September 15 will be moved to this default.

The second change goes further. Cloudflare will begin treating multipurpose crawlers according to their combined behavior, applying the strictest rule that matches. For example, a crawler that does both Search and Training will be blocked if the site blocks Training. Cloudflare cites Googlebot, Applebot, and Bingbot as examples, since each crawls for both search and AI training. A site that has already enabled the old “Block AI bots” setting will be covered by this new rule.
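The strictest-match rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as the article describes it, not Cloudflare’s implementation. The `Behavior` enum, the `is_allowed` function, and the example policies are all hypothetical names introduced here.

```python
from enum import Enum


class Behavior(Enum):
    """The three crawler behaviors named in the article."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def is_allowed(crawler_behaviors, blocked_behaviors):
    """Hypothetical strictest-match check.

    A crawler is allowed only if none of the behaviors it performs
    is on the site's block list. One blocked behavior blocks the bot.
    """
    return not (set(crawler_behaviors) & set(blocked_behaviors))


# Hypothetical site policy: block Training, leave Search and Agent open.
blocked = {Behavior.TRAINING}

search_only_bot = {Behavior.SEARCH}
multipurpose_bot = {Behavior.SEARCH, Behavior.TRAINING}

print(is_allowed(search_only_bot, blocked))   # True
print(is_allowed(multipurpose_bot, blocked))  # False
```

Under this assumed logic, a crawler that only indexes for search passes, while one that combines Search and Training is blocked as soon as Training is on the block list. That is the situation the article describes for Googlebot, Applebot, and Bingbot.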

If you want to keep those crawlers, you can change these settings in your Cloudflare dashboard at any time before September 15. Cloudflare says it will continue to notify customers before that date.

New Features of How Bots Use Content

Cloudflare also introduced a content-use signal that extends its Content Signals in robots.txt. It has three levels, from most to least restrictive: immediate, which retains nothing; reference, which points and links back and is the new default; and full, which covers condensing and reproducing content. Cloudflare says these are preferences and do not block anything by themselves.

The company has also updated what “Verified” means for bots. A verified bot is no longer allowed everywhere by default. Instead, its access depends on its category. In addition, bots that replicate content are not eligible for verification. Cloudflare has launched a searchable directory, BotBase, for Enterprise Bot Management users. It shows the classification of each tracked bot and a detection ID that can be copied into security rules.

Post-Transformation Report

The update came with a Cloudflare report marking one year since the first Content Independence Day. According to the report, AI training now accounts for the majority of crawler requests on its network, up from about 20% in the spring of 2025. It also notes that daily AI agent requests increased more than 1,700% during the year. These statistics are based on Cloudflare network traffic and are not representative of the entire web.

Why This Matters

The September 15 rule ties AI training blocks to search visibility on the Cloudflare network. If a site blocks Training to protect its content from AI models, it may also block Googlebot unintentionally. Cloudflare’s block works at the network level, which makes it harder to get around than a line in robots.txt, since robots.txt is only an advisory directive that a crawler can ignore. Losing Googlebot access means the site will not be indexed effectively, which can ultimately affect its visibility in search results.
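To illustrate why robots.txt is advisory, here is a minimal Python sketch using the standard library’s `urllib.robotparser`. The robots.txt content, the `ExampleTrainingBot` user agent, the URL, and the `polite_fetch_allowed` helper are made up for the example. The point is that the check runs on the crawler’s side, so it only has an effect if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one training bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent, url):
    """What a well-behaved crawler would check before requesting a page.

    Nothing on the server enforces this result. A crawler that skips
    the check can still send the request.
    """
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"
print(polite_fetch_allowed("ExampleTrainingBot", url))  # False
print(polite_fetch_allowed("Googlebot", url))           # True
```

A network-level block is different in kind. The request is refused before it reaches the site, regardless of whether the crawler consulted robots.txt.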

I have followed publishers who moved to a default-block setup last year and blocked both retrieval and training bots. The exposure is the same each time. Blocking the training layer may also block the search layer that keeps the site discoverable.

Looking Forward

Websites using Cloudflare should review their AI blocking settings before September 15 and decide whether to keep search engines allowed. The combined-behavior rule mainly affects sites that previously turned on “Block AI bots” and have not adjusted their settings since. Free customers who do not change their settings will be moved to the new defaults that day.

Cloudflare wants operators of mixed-purpose crawlers to separate those bots by behavior over the next year. Whether major operators do so will determine whether this becomes a real choice, rather than a trade-off between preventing AI training and maintaining search visibility.

Featured Image: jackpress/Shutterstock