🤖 robots.txt Builder — Create Your Crawler Rules

Build a robots.txt file visually: add allow/disallow rules, block AI crawlers (GPTBot, CCBot), set a crawl delay, and download the finished file instantly.

Use the visual rules builder to add User-agent directives, allow/disallow paths, and optionally set a crawl delay. One-click presets for Allow All, Block All, Block AI Bots (GPTBot, CCBot, anthropic-ai), and SEO-Friendly configurations. Download the finished file.

How to Use

1. Choose a preset: start with Allow All, Block All, Block AI Bots (GPTBot, ClaudeBot, etc.), or SEO-Friendly.
2. Customize the rules: add or remove User-agent rules and Disallow/Allow paths using the rule builder below the presets.
3. Download your file: add your sitemap URL (optional), then click Download to save your robots.txt file (see the example below).
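
As an illustration, an SEO-Friendly setup with a sitemap might download as something like this (a sketch; the exact output depends on the preset and your edits):

User-agent: *
Disallow: /admin/
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml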

Frequently Asked Questions

What is a robots.txt file?
robots.txt is a file placed at the root of your website (example.com/robots.txt) that tells web crawlers which pages they can or cannot request. It follows the Robots Exclusion Protocol and is the first thing most crawlers fetch.
Does robots.txt prevent pages from being indexed?
No — robots.txt prevents crawling, not indexing. If other pages link to a disallowed URL, Google can still index it without crawling it. To prevent indexing, use the noindex meta tag or X-Robots-Tag header.
How do I block AI training bots?
Use the "Block AI Bots" preset. Common AI crawlers include: GPTBot (OpenAI), CCBot (Common Crawl), Google-Extended (Google AI), anthropic-ai (Anthropic), and ChatGPT-User. Add User-agent: BotName / Disallow: / for each.
What does "Disallow: /" mean? +
Disallow: / blocks all pages on the site for that user-agent. Disallow: /admin/ blocks only the /admin/ directory. Disallow: (empty) means allow everything. Allow: /public/ within a blocked section creates an exception.
Is robots.txt case-sensitive?
Paths in robots.txt rules are case-sensitive, so Disallow: /Admin/ and Disallow: /admin/ match different URLs. User-agent names are case-insensitive.


Complete Guide: Robots.txt Generator

The robots.txt file is a plain-text file placed at the root of your website that instructs web crawlers which parts of your site they may or may not access. It implements the Robots Exclusion Protocol (REP), an informal standard respected by major crawlers since 1994 and formally specified as an IETF standard (RFC 9309) in 2022.

File Format and Directives

Each block in robots.txt begins with a User-agent directive identifying the crawler, followed by one or more access rules:

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-file.html

User-agent: *
Disallow: /admin/
Disallow: /cart/
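# Crawl-delay is honored by Bing and Yandex but ignored by Googlebot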
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml

Wildcard Patterns: * and $

Two wildcard characters are supported: * matches any sequence of characters, and $ anchors a rule to the end of the URL:

User-agent: *
# Block all URLs with ?session= parameter
Disallow: /*?session=

# Block all .xml files except sitemap
Disallow: /*.xml$
Allow: /sitemap.xml
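
When an Allow rule and a Disallow rule both match the same URL, major crawlers apply the most specific (longest) rule, which is why the /sitemap.xml exception above takes precedence. The same mechanism lets you reopen a subdirectory inside a blocked section:

User-agent: *
# Longest matching rule wins: /shop/sale/ stays crawlable
Disallow: /shop/
Allow: /shop/sale/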

Blocking AI Scrapers and Data Harvesters in 2024–2026

A wave of AI training crawlers has emerged, each identifying itself with its own User-agent string. Add these blocks to opt out of AI training datasets:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
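
Several User-agent lines can also share a single rule group, so the same opt-out can be written more compactly:

User-agent: GPTBot
User-agent: CCBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /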

Common Mistakes That Break Your Site

One of the most damaging mistakes is blocking CSS and JavaScript files, which prevents Google from rendering your pages correctly. Googlebot renders pages like a browser; if it cannot load your stylesheets or scripts, it may misinterpret your content or see a blank page:

# BAD — never do this
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /assets/
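
A safer pattern blocks only genuinely private paths and, if you want to be explicit, allows the assets Googlebot needs to render pages (the paths here are illustrative):

# SAFE: block private areas, keep CSS and JS crawlable
User-agent: *
Disallow: /admin/
Allow: /css/
Allow: /js/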

Other common mistakes include: leaving a development-time Disallow: / in place after launch, and serving the file from a non-root path (it must live exactly at https://yourdomain.com/robots.txt).

Testing Your robots.txt

  1. Open Google Search Console → Settings → robots.txt to use the built-in tester.
  2. Enter specific URLs to see whether they are allowed or blocked.
  3. Use the Test button before saving any changes to production.

What robots.txt Cannot Do

This is critical: robots.txt is advisory, not a security mechanism. Malicious bots and scrapers do not respect it. Disallowing a URL in robots.txt does not prevent it from appearing in search results if other sites link to it. For true access control, use authentication, firewall rules, or server-level blocks. To prevent indexing, use a noindex meta tag (<meta name="robots" content="noindex">) or an X-Robots-Tag: noindex HTTP header instead.

Disallow vs noindex

Disallow in robots.txt tells crawlers not to visit a URL. The noindex directive tells crawlers not to include a visited URL in search results. They solve different problems. If you Disallow a page, Google can never read a noindex tag on it — so disallowed pages can still appear in results based on external links.
