robots.txt Generator
Generate a robots.txt file to control search engine crawling.
What Is robots.txt?
robots.txt is a plain-text file placed at the root of a website that tells search engine
crawlers (like Googlebot and Bingbot) which pages or directories they are allowed or not allowed to crawl.
It follows the Robots Exclusion Protocol, an informal standard dating to 1994 that was formalized as RFC 9309 in 2022.
The file does not prevent pages from being indexed; it only controls crawling. To
prevent indexing, use the noindex meta tag or the X-Robots-Tag HTTP header instead.
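For example, either of the following keeps a page out of search results: the meta tag goes in the page's HTML head, and the response header is its HTTP equivalent for any resource type:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

Note that a crawler can only see a noindex directive on pages it is allowed to crawl, so a page blocked in robots.txt may still appear in results if other sites link to it.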
How robots.txt Works
When a search engine crawler visits your site, it first checks https://yoursite.com/robots.txt. The file contains directives like:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
Sitemap: https://yoursite.com/sitemap.xml
- User-agent: Which crawler the rules apply to (* means all crawlers).
- Disallow: Paths the crawler should not access.
- Allow: Exceptions within disallowed directories.
- Sitemap: Location of your XML sitemap for discovery.
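To see how a crawler interprets these rules, here is a minimal sketch using Python's standard urllib.robotparser module; the rules and the yoursite.com URLs are illustrative. One caveat: RFC 9309 crawlers such as Googlebot apply the most specific (longest) matching rule, while Python's parser applies the first rule that matches in file order, which is why the Allow line comes first below:

from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt. Allow precedes Disallow because this
# parser uses the first matching rule rather than the longest match.
rules = [
    "User-agent: *",
    "Allow: /admin/public/",
    "Disallow: /admin/",
]
parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("MyBot", "https://yoursite.com/admin/login"))       # False
print(parser.can_fetch("MyBot", "https://yoursite.com/admin/public/faq"))  # True
print(parser.can_fetch("MyBot", "https://yoursite.com/blog/post"))         # True: no rule matches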
Common Use Cases
- Hide admin panels: Prevent search engines from crawling /admin/ or /wp-admin/.
- Block duplicate content: Disallow URL parameters or print-friendly page versions.
- Protect resources: Block crawlers from API endpoints or internal tools.
- Link to your sitemap: Help crawlers find your sitemap automatically. A combined example follows this list.
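As a sketch, here is a robots.txt covering all four cases; the paths and sitemap URL are placeholders, and wildcards like * inside a path are an extension honored by Google and Bing rather than part of the original protocol:

User-agent: *
Disallow: /wp-admin/
Disallow: /api/
Disallow: /*?print=1
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap.xml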
Best Practices
- Always place robots.txt at the root: https://yoursite.com/robots.txt
- Do not use robots.txt to hide sensitive data; anyone can read the file. Use authentication instead.
- Test your robots.txt with the robots.txt report in Google Search Console (a programmatic check is sketched after this list).
- Include a Sitemap: directive pointing to your XML sitemap.
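For a quick programmatic test, the same standard-library parser used above can fetch and evaluate a live file; yoursite.com and the probe URL are placeholders:

from urllib.robotparser import RobotFileParser

# Fetch the live robots.txt, then probe a URL as a given crawler would.
parser = RobotFileParser("https://yoursite.com/robots.txt")
parser.read()

print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/"))
print(parser.site_maps())  # Sitemap: URLs found in the file, or None (Python 3.8+)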