The robots.txt is a text file in the root directory of your website that tells search engine bots which areas they may crawl and which they may not. It is not a security mechanism but a control instrument for crawling and indexing. arocom configures the robots.txt in every Drupal project, including rules for AI bots like GPTBot and Google-Extended.

Configuring robots.txt Correctly: The Practical Guide

Last updated: March 2026 · Reading time: 6 minutes

The robots.txt follows the Robots Exclusion Standard and controls which parts of your website search engines crawl. The file is purely advisory: reputable bots follow it, malicious bots ignore it.

In 2026, the robots.txt has gained a new dimension: alongside Google and Bing, AI bots like GPTBot (OpenAI), Google-Extended (Gemini), and CCBot (Common Crawl) now crawl the web. The question is no longer just "What should Google crawl?" but "Which AI systems may use your content?"

Structure of a robots.txt

The robots.txt consists of user-agent blocks with allow and disallow rules:

  • User-agent: Names the bot (e.g., Googlebot, Bingbot, GPTBot). The wildcard * applies to all bots.
  • Disallow: Paths the bot should not crawl. An empty value allows everything.
  • Allow: Overrides a previous Disallow for specific paths. Useful for exceptions within blocked directories.
  • Sitemap: References the XML sitemap. Google recommends specifying the sitemap URL here.
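A minimal robots.txt combining these directives could look like the sketch below (the paths and domain are placeholders, not recommendations):

```
# Applies to all bots
User-agent: *
Disallow: /internal/
# Exception within the blocked directory
Allow: /internal/public/

Sitemap: https://www.example.com/sitemap.xml
```

Note that Allow must come from the same user-agent block as the Disallow it overrides; bots pick the block that matches them most specifically.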

Controlling AI Bots in robots.txt

AI bots have been crawling the web systematically since 2024. The most important ones:

  • GPTBot (OpenAI): Crawls for ChatGPT training and real-time search. If allowed, your content can appear in ChatGPT answers.
  • Google-Extended: Crawls for Gemini training. Independent of Googlebot: you can block Google-Extended without affecting your Google ranking.
  • CCBot (Common Crawl): Crawls for the Common Crawl dataset, which many AI models use as training data.

arocom recommends: allow GPTBot and Google-Extended (for GEO visibility), block CCBot (no direct benefit, high crawl overhead). Additionally, provide an AI.txt with "Preference: allow-with-attribution."
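Expressed as robots.txt rules, this recommendation could look like the following sketch (an empty Disallow explicitly permits everything for that bot):

```
# Allow OpenAI's crawler (visibility in ChatGPT answers)
User-agent: GPTBot
Disallow:

# Allow Google's AI crawler (Gemini training)
User-agent: Google-Extended
Disallow:

# Block Common Crawl (no direct benefit, high crawl overhead)
User-agent: CCBot
Disallow: /
```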

Configuring robots.txt in Drupal

Drupal ships with a default robots.txt. For production websites, it must be customized:

  • Block admin paths (/admin/, /user/login)
  • Block internal search pages (/search/)
  • Selectively control pagination pages
  • Completely block staging environments
  • Add sitemap URL

In Drupal, you can maintain the robots.txt as a static file or generate it dynamically via the RobotsTxt module. arocom uses the static variant — it is faster and prevents errors from module updates.
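The customizations listed above can be sketched as a static robots.txt for a Drupal site. The admin and search paths follow Drupal defaults; the pagination pattern and sitemap URL are placeholder assumptions that must be adapted to the concrete site:

```
User-agent: *
# Block admin and login paths
Disallow: /admin/
Disallow: /user/login
# Block internal search result pages
Disallow: /search/
# Keep query-string pagination out of the crawl
# (the * and $ wildcards are supported by Google and Bing)
Disallow: /*?page=

Sitemap: https://www.example.com/sitemap.xml
```

Staging environments are best blocked at the server level (e.g., HTTP authentication) in addition to a restrictive robots.txt, since robots.txt alone is only advisory.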

Have Your robots.txt Reviewed

Check your robots.txt: yourdomain.com/robots.txt. Does it accidentally block important pages? Are rules for AI bots missing? The Future Check by arocom reviews this systematically — as part of the technical SEO analysis.

Does the robots.txt protect my content from access?

No. The robots.txt is not a security mechanism. Reputable bots follow it, but anyone can access the content through a browser. For real access protection, you need authentication.

Can a wrong robots.txt destroy my ranking?

Yes. A Disallow: / blocks all crawling; over time, Google drops the blocked pages from the index. This frequently happens during relaunches, when the staging robots.txt is accidentally deployed to the production environment.
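The difference between a staging file and a production file is a single character, which is exactly why this relaunch mistake is so common (sketch):

```
# Staging: block everything
User-agent: *
Disallow: /

# Production: allow everything (empty value)
User-agent: *
Disallow:
```

A post-deployment check of yourdomain.com/robots.txt should therefore be a fixed step in every relaunch checklist.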

Should I block or allow AI bots?

That depends on your GEO strategy. Those who want to be cited in AI answers must allow GPTBot and Google-Extended. Those who do not want this can block them. arocom recommends: allow with attribution preference.

Where do I find my robots.txt?

The robots.txt is always at yourdomain.com/robots.txt. In Google Search Console under Settings > robots.txt, you can check how Google interprets it.
