🤖 robots.txt Builder — Create Your Crawler Rules
Build a robots.txt file visually. Add allow/disallow rules, block AI crawlers (GPTBot, CCBot), set crawl delay. Download instantly. Free online robots.txt generator.
Use the visual rules builder to add User-agent directives, allow/disallow paths, and optionally set a crawl delay. One-click presets for Allow All, Block All, Block AI Bots (GPTBot, CCBot, anthropic-ai), and SEO-Friendly configurations. Download the finished file.
How to Use
Choose a preset
Start with a preset: Allow All, Block All, Block AI Bots (GPTBot, ClaudeBot, etc.), or SEO-Friendly.
Customize the rules
Add or remove User-agent rules and Disallow/Allow paths using the rule builder below the presets.
Download your file
Add your sitemap URL (optional), then click Download to save your robots.txt file.
Complete Guide: Robots.txt Generator
The robots.txt file is a plain-text file placed at the root of your website that instructs web crawlers which parts of your site they may or may not access. It implements the Robots Exclusion Protocol (REP), an informal standard respected by major crawlers since 1994 and formally specified as an IETF standard (RFC 9309) in 2022.
File Format and Directives
Each block in robots.txt begins with a User-agent directive identifying the crawler, followed by one or more access rules:
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-file.html
User-agent: *
Disallow: /admin/
Disallow: /cart/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
- User-agent — The crawler name. Use * as a wildcard matching all crawlers not otherwise specified.
- Disallow — Prevents crawling of the specified path. An empty value means "allow everything."
- Allow — Overrides a broader Disallow for a more specific path. Useful when blocking a directory but permitting one file within it.
- Crawl-delay — Requests that the crawler wait N seconds between requests. Google ignores this directive and manages its crawl rate automatically; some other crawlers, such as Bing, honor it.
- Sitemap — Points crawlers to your XML sitemap. Can appear outside any User-agent block.
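To see how these directives interact, here is a minimal sketch using Python's standard-library parser (urllib.robotparser). Two caveats: this parser applies rules in file order (first match wins), whereas Google picks the most specific matching rule, and it does not implement the * and $ wildcard extensions covered in the next section.

from urllib.robotparser import RobotFileParser

# Rules similar to the example above; the Allow line comes first
# because this parser returns the first rule that matches.
rules = """\
User-agent: Googlebot
Allow: /private/public-file.html
Disallow: /private/

User-agent: *
Disallow: /admin/
Crawl-delay: 10
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/private/secret.html"))       # False
print(rp.can_fetch("Googlebot", "/private/public-file.html"))  # True
print(rp.can_fetch("SomeOtherBot", "/admin/page"))             # False
print(rp.crawl_delay("SomeOtherBot"))                          # 10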
Wildcard Patterns: * and $
Two wildcard characters are supported:
- * matches any sequence of characters. Example: Disallow: /*.pdf$ would block all PDF files, but only when combined with the end anchor.
- $ matches the end of the URL. Example: Disallow: /*.json$ blocks any URL ending in .json.
User-agent: *
# Block all URLs with ?session= parameter
Disallow: /*?session=
# Block all .xml files except sitemap
Disallow: /*.xml$
Allow: /sitemap.xml
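To make the matching semantics concrete, here is one way to model these patterns in Python (illustrative only; rule_to_regex is a hypothetical helper, not any crawler's actual parser):

import re

def rule_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the
    # end of the URL. Everything else is matched literally.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

print(bool(rule_to_regex("/*.pdf$").match("/docs/report.pdf")))       # True
print(bool(rule_to_regex("/*.pdf$").match("/docs/report.pdf?v=2")))   # False
print(bool(rule_to_regex("/*?session=").match("/page?session=abc")))  # True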
Blocking AI Scrapers and Data Harvesters in 2024–2026
A wave of AI training crawlers has emerged, each identifying itself with its own User-agent string. Add blocks like these to opt out of AI training datasets:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
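# ClaudeBot, Anthropic's current crawler user agent (named in the Block AI Bots preset above)
User-agent: ClaudeBot
Disallow: /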
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
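To verify that a deployed file actually opts you out, you can check it programmatically. A minimal sketch, assuming your robots.txt is live at example.com (substitute your own domain):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then ask whether GPTBot may
# crawl the site root. Expect False if the block above is in place.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()
print(rp.can_fetch("GPTBot", "https://example.com/"))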
Common Mistakes That Break Your Site
One of the most damaging mistakes is blocking CSS and JavaScript files, which prevents Google from rendering your pages correctly. Googlebot renders pages like a browser; if it cannot load your stylesheets or scripts, it may misinterpret your content or see a blank page:
# BAD — never do this
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /assets/
Other common mistakes include leaving a development-time Disallow: / in place after launch, which blocks the entire site, and placing the file at a non-root path (it must be exactly at https://yourdomain.com/robots.txt).
Testing Your robots.txt
- Open Google Search Console → Settings → robots.txt to see how Google fetched and parsed your file, including any syntax errors.
- Check that specific URLs are allowed or blocked as you intend.
- Verify every change before deploying it to production, either in Search Console or with the local script sketched below.
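Outside Search Console, you can run the same kind of spot check locally before deploying. A minimal sketch, assuming a robots.txt file in the current directory and plain prefix rules (Python's standard-library parser ignores the * and $ wildcard extensions):

from urllib.robotparser import RobotFileParser

# Pre-deploy sanity check: parse the local file and assert that a
# few representative URLs behave as intended.
rp = RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

expected = {"/": True, "/admin/": False}  # adjust to your own rules
for path, allowed in expected.items():
    assert rp.can_fetch("Googlebot", path) == allowed, path
print("robots.txt checks passed")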
What robots.txt Cannot Do
This is critical: robots.txt is advisory, not a security mechanism. Malicious bots and scrapers do not respect it. Disallowing a URL in robots.txt does not prevent it from appearing in search results if other sites link to it. For true access control, use authentication, firewall rules, or server-level blocks. To prevent indexing, use a noindex meta tag or X-Robots-Tag HTTP header instead.
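For reference, the two standard ways to express noindex, in page markup or as a response header (the header form also works for non-HTML files such as PDFs):

<!-- In the page's <head> -->
<meta name="robots" content="noindex">

# As an HTTP response header, set via your server configuration
X-Robots-Tag: noindex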
Disallow vs noindex
Disallow in robots.txt tells crawlers not to visit a URL. The noindex directive tells crawlers not to include a visited URL in search results. They solve different problems. If you Disallow a page, Google can never read a noindex tag on it — so disallowed pages can still appear in results based on external links.
Related Tools
- Generate and validate your Sitemap to accompany your robots.txt.
- Ensure meta robots tags are set correctly with the Meta Tag Generator.