
SEO & GEO · Laura Murgia · March 2026 (updated) · Beginner · Technical

Configuring robots.txt Correctly: The Practical Guide

The robots.txt follows the Robots Exclusion Standard and controls which parts of your website search engines crawl. The file is purely advisory: reputable bots follow it, malicious bots ignore it.

In 2026, robots.txt has taken on a new dimension: alongside Google and Bing, AI-related user agents such as GPTBot (OpenAI), Google-Extended (Gemini), and CCBot (Common Crawl) now have to be considered. The question is no longer just "What should Google crawl?" but also "Which AI systems may use your content?"

Structure of a robots.txt

A robots.txt file consists of user-agent blocks with Allow and Disallow rules:

User-agent: Names the bot (e.g., Googlebot, Bingbot, GPTBot). The wildcard * applies to all bots.

Disallow: Paths the bot should not crawl. An empty value allows everything.

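Allow: Makes an exception to a Disallow rule for specific paths. Useful for exceptions within blocked directories. Under RFC 9309 and in Google's implementation, the most specific (longest) matching rule wins, regardless of the order in which the rules appear.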

Sitemap: References the XML sitemap. Google recommends specifying the sitemap URL here.
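
Put together, the four directives might look like the following minimal sketch. The directory names and the example.com sitemap URL are placeholders chosen for illustration, not paths from a real site.

```text
# Rules for every bot that has no more specific block of its own
User-agent: *
# Exception first: the press folder inside the blocked directory stays crawlable
Allow: /internal/press/
# Everything else under /internal/ should not be crawled
Disallow: /internal/

# Absolute URL of the XML sitemap (placeholder domain)
Sitemap: https://www.example.com/sitemap.xml
```

Listing the Allow line before the Disallow line keeps the result the same for parsers that pick the longest matching rule and for simpler ones that stop at the first match.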

Controlling AI Bots in robots.txt

Since 2024, AI bots have been crawling the web systematically. The most important ones:

GPTBot (OpenAI): Crawls content for training OpenAI's models. Allowing it means your content can inform ChatGPT answers. Note that OpenAI documents a separate agent, OAI-SearchBot, for ChatGPT's real-time search.

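Google-Extended: A control token rather than a separate crawler. It determines whether content that Google has crawled may be used for Gemini training. It is independent of Googlebot, so you can block Google-Extended without affecting your Google ranking.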

CCBot (Common Crawl): Crawls for the Common Crawl dataset used by many AI models.

arocom recommends: allow GPTBot and Google-Extended (for GEO visibility), block CCBot (no direct benefit, high crawl overhead). Additionally, provide an AI.txt with "Preference: allow-with-attribution."
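
As a sketch, the recommendation above could be written as the following robots.txt rules. The user-agent tokens are the ones named in this section; whether a given crawler honours them is up to its operator. The AI.txt file is not shown, because its format is not described here.

```text
# Allow OpenAI's crawler everywhere (an empty Disallow permits everything)
User-agent: GPTBot
Disallow:

# Allow the use of content for Gemini
User-agent: Google-Extended
Disallow:

# Block Common Crawl's crawler from the whole site
User-agent: CCBot
Disallow: /
```

A bot that finds a block with its own name uses that block instead of the User-agent: * block. Any general restrictions, for example on admin paths, need to be repeated in the bot-specific block if they should apply there too.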

Configuring robots.txt in Drupal

Drupal ships with a default robots.txt. For production websites, it must be customized:

Block admin paths (/admin/, /user/login)
Block internal search pages (/search/)
Selectively control pagination pages
Completely block staging environments
Add sitemap URL
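
The checklist above might translate into rules like the following sketch for the production site. It is an illustration rather than Drupal's shipped default: the ?page= pattern assumes Drupal's usual pager query parameter appearing first in the query string, the * wildcard assumes a crawler that supports it (Google and Bing do), and the sitemap URL is a placeholder.

```text
# Production robots.txt (sketch)
User-agent: *
# Admin area and login form
Disallow: /admin/
Disallow: /user/login
# Internal search result pages
Disallow: /search/
# Paginated listings; remove this line if paginated pages should be crawled
Disallow: /*?page=

# Placeholder domain
Sitemap: https://www.example.com/sitemap.xml
```

A staging environment would get a separate file that blocks everything. Because robots.txt is only advisory, password protection is the more reliable safeguard there.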

In Drupal, you can maintain the robots.txt as a static file or generate it dynamically via the RobotsTxt module. arocom uses the static variant — it is faster and prevents errors from module updates.

Have Your robots.txt Reviewed

Check your robots.txt at yourdomain.com/robots.txt. Does it accidentally block important pages? Are rules for AI bots missing? The Future Check by arocom reviews this systematically as part of the technical SEO analysis.

Does the robots.txt protect my content from access?

No. The robots.txt is not a security mechanism. Reputable bots follow it, but anyone can access the content through a browser. For real access protection, you need authentication.

Can a wrong robots.txt destroy my ranking?

Yes. A Disallow: / blocks all crawling. Google can then no longer read your pages, and over time they drop out of the index or appear without a description. This frequently happens during relaunches when the staging robots.txt is accidentally deployed to the production environment.
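
The difference between blocking everything and allowing everything is a single character, which is why this mistake is easy to miss in a deployment. A minimal side-by-side sketch of the two alternative files:

```text
# Variant A, staging only: blocks the entire site for all bots
User-agent: *
Disallow: /

# Variant B, production: the empty Disallow value blocks nothing
# (use one variant per file, never both together)
User-agent: *
Disallow:
```

Checking yourdomain.com/robots.txt right after a relaunch shows which of the two variants actually went live.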

Should I block or allow AI bots?

That depends on your GEO strategy. Those who want to be cited in AI answers must allow GPTBot and Google-Extended. Those who do not want this can block them. arocom recommends: allow with attribution preference.

Where do I find my robots.txt?

The robots.txt is always at yourdomain.com/robots.txt. In Google Search Console under Settings > robots.txt, you can check how Google interprets it.

