AI Crawlers Are Breaking .Gov Websites — And What You Can Do About It

In recent years, AI-powered web crawlers have shifted from a niche technical consideration to a pressing operational challenge for government websites. Platforms like OpenAI, Anthropic, Perplexity, and a growing list of smaller AI companies are aggressively indexing .gov domains to feed large language models (LLMs). While these tools can improve information access for the public, the way they interact with websites is creating unexpected and sometimes severe infrastructure headaches.

What Are AI Crawlers?

A crawler, or “bot,” is an automated program that scans and extracts information from websites. Search engines have relied on crawlers for decades, but AI crawlers tend to behave far more aggressively: many ignore robots.txt directives, spoof legitimate user agents, target sitemap.xml and search pages, and operate without authentication or prior notice. Common examples include Amazonbot, CCBot, ClaudeBot, facebookexternalhit, and python-requests.
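Part of why sitemap.xml is such a magnet is that it hands a crawler every URL on a site in a single request. A minimal sketch of how a crawler expands one (the sitemap content here is illustrative, not from a real .gov site):

```python
import xml.etree.ElementTree as ET

# Illustrative sitemap; a real crawler would fetch it over HTTP first.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.gov/</loc></url>
  <url><loc>https://example.gov/services</loc></url>
  <url><loc>https://example.gov/search?q=permits</loc></url>
</urlset>"""

def sitemap_urls(xml_text):
    """Extract every <loc> entry: one request yields the whole crawl frontier."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

print(sitemap_urls(SITEMAP))
```

Note that dynamic URLs like the search result above are exactly the ones that bypass caches, which is why a sitemap-driven crawl can be so expensive.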

The Impact on .Gov Websites

When AI crawlers hit government sites, the effects can be disruptive and costly. Traffic can spike to ten times the normal load in minutes, and requests for dynamic URLs can bypass caches, forcing Drupal to generate fresh pages for each request. This leads to slower performance, failed builds, server timeouts, and higher server and CDN expenses. In extreme cases, the symptoms can resemble a denial-of-service (DoS) attack, even though the source is not malicious.

Detecting Problematic Crawls

The first step in managing AI crawler traffic is recognizing when it occurs. Monitoring cache hit/miss ratios, server response times, and unusual traffic patterns with tools like New Relic, Blackfire.io, or Cloudflare analytics can reveal potential issues. Reviewing logs for repetitive user agents and doing IP lookups helps confirm the source and legitimacy of traffic. Even simple tools like GoAccess or Drupal’s watchdog logs can be valuable for spotting trends.
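The log review described above can be sketched in a few lines. This assumes combined-format access logs where the user agent is the last quoted field; field positions vary by server configuration, so treat the pattern as a starting point:

```python
import re
from collections import Counter

# Matches the client IP and the last quoted field (the user agent) of a
# typical combined-format access log line; adjust for your log format.
LOG_PATTERN = re.compile(r'^(\S+) .* "([^"]*)"$')

def top_user_agents(lines, n=5):
    """Count requests per (IP, user agent) pair to surface repetitive crawlers."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            ip, agent = match.groups()
            counts[(ip, agent)] += 1
    return counts.most_common(n)

# Illustrative log lines, not real traffic:
sample = [
    '203.0.113.7 - - [01/Aug/2025] "GET /search?q=1 HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
    '203.0.113.7 - - [01/Aug/2025] "GET /search?q=2 HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
    '198.51.100.4 - - [01/Aug/2025] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(top_user_agents(sample))
```

Once a suspect (IP, user agent) pair floats to the top, a reverse IP lookup confirms whether it really belongs to the vendor the user agent claims.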

Mitigation Strategies

Once you identify a problem, a few targeted steps can make a significant difference:

  • Block or challenge at the edge: Use services like Cloudflare WAF, Akamai, or Anubis to filter AI crawlers before they reach your site.
  • Optimize caching: Normalize query strings, reduce unnecessary dynamic pages, and cache heavy elements separately to serve more requests from the cache.
  • Rate limit or hide high-cost endpoints: Restrict access to pages that are resource-intensive or easily exploited by automated bots.
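The caching bullet above hinges on cache keys: two URLs that differ only in parameter order or in tracking parameters should map to the same cache entry. A minimal sketch of query-string normalization, assuming an illustrative allow-list of parameters (derive the real list from your site's views and facets):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that actually change the page; everything else is dropped.
# This allow-list is illustrative, not a recommendation.
ALLOWED_PARAMS = {"page", "sort", "q"}

def normalize_cache_key(url):
    """Reduce a URL to a canonical form suitable for use as a cache key."""
    parts = urlsplit(url)
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))

# Three request variants collapse to a single cache entry:
urls = [
    "https://example.gov/search?q=permits&page=2",
    "https://example.gov/search?page=2&q=permits",
    "https://example.gov/search?q=permits&page=2&utm_source=bot",
]
print({normalize_cache_key(u) for u in urls})
```

In practice this normalization lives at the edge (Cloudflare cache rules, Varnish VCL, or similar) rather than in application code, but the logic is the same.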

Drupal-Specific Defenses

For Drupal-powered government sites, there are ways to harden your configuration:

  • Limit uncacheable views and reduce exposed filters in the Search API.
  • Tune Page Cache and Dynamic Page Cache, and enable BigPipe for large pages.
  • Consider modules like Bot Blocker or Facet Bot Blocker (if you use the Facets module) for additional protection.
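For Drupal sites served by Apache, the stock .htaccess can also be extended to refuse the noisiest user agents outright. A sketch with an illustrative bot list (spoofed agents will slip past this, so treat it as one layer, not a complete defense):

```
# Return 403 Forbidden for selected crawler user agents (list is illustrative).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ClaudeBot|Amazonbot|python-requests) [NC]
RewriteRule ^ - [F,L]
```

Blocking at this layer is cheaper than letting Drupal bootstrap, but blocking at the CDN or WAF is cheaper still, since the request never reaches your origin at all.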

Balancing Accessibility and Protection

Not all AI crawler activity is harmful. For some public-sector content, such as guides, open data, and research publications, allowing AI indexing can improve discoverability in AI-powered search tools like ChatGPT and Claude. The key is to whitelist high-value public pages while blocking sensitive or high-cost areas, using headers or robots.txt for selective discoverability.
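That selective approach can be sketched in robots.txt. The paths and agent names below are illustrative; well-behaved crawlers honor these directives, and edge rules catch the rest:

```
# Invite AI crawlers into high-value public content, keep them out of
# expensive or sensitive paths (paths and agent names are illustrative).
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /publications/
Allow: /open-data/
Disallow: /search
Disallow: /user

User-agent: *
Disallow: /search
```

For page-level control, the X-Robots-Tag response header offers the same kind of selectivity without editing robots.txt.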

Key Takeaways

AI crawlers are not inherently malicious, but without careful monitoring and configuration, they can strain infrastructure and inflate costs. Combining caching best practices, bot management, and Drupal-specific optimizations will help keep government websites secure, performant, and accessible to the public.

Learn More at Drupal GovCon 2025

Join us at Drupal GovCon 2025 for my session, AI Crawlers Are Breaking .Gov Websites, on Thursday, August 14th from 9:00–9:45 a.m. in the Charles Carroll Room. You can also catch Mapping Success: Building Effective Product Roadmaps for Drupal Projects, presented by Zachary Grimshaw, on Friday, August 15th from 10:00–10:45 a.m. in the Margaret Brent Room. And don’t miss the Acquia Community Party on Thursday evening at The Hall CP, where we’ll be co-sponsoring an evening of networking, music, and great conversation. If you’ll be at the conference or want to connect on the topic of AI crawlers and Drupal site performance, we’d love to hear from you.
