breitbart.com
robots.txt
Robots Exclusion Standard data for breitbart.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | breitbart.com |
Base Domain | breitbart.com |
Scan Status | Ok |
Last Scan | 2024-09-27T11:33:37+00:00 |
Next Scan | 2024-10-04T11:33:37+00:00 |
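The two timestamps above imply a rescan interval, which can be checked with a short sketch. The variable names are illustrative, and the report itself does not state that the schedule is fixed; this only computes the gap between the two listed values.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601, UTC offset +00:00).
last_scan = datetime.fromisoformat("2024-09-27T11:33:37+00:00")
next_scan = datetime.fromisoformat("2024-10-04T11:33:37+00:00")

# Difference between the scheduled next scan and the last completed scan.
interval = next_scan - last_scan
print(interval)  # 7 days, 0:00:00
```

For this scan, the next run is scheduled exactly one week after the last one.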
Last Scan
Field | Value |
---|---|
Scanned | 2024-09-27T11:33:37+00:00 |
URL | https://breitbart.com/robots.txt |
Redirect | https://www.breitbart.com/robots.txt |
Redirect Domain | www.breitbart.com |
Redirect Base | breitbart.com |
Domain IPs | 34.117.28.18 |
Redirect IPs | 34.117.28.18 |
Response IP | 34.117.28.18 |
Found | Yes |
Hash | cfc6c72358b4b744656bd0e9134fb963384cab325b13843ae2e0ab80c7ae0c13 |
SimHash | 07559e124633 |
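The Hash value above is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The report does not name the algorithm or say which bytes were hashed, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The function name and the sample body are hypothetical, and the sample will not reproduce the reported hash.

```python
import hashlib

def sha256_hex(body: bytes) -> str:
    """Return the hex SHA-256 digest of a response body."""
    return hashlib.sha256(body).hexdigest()

# Hypothetical stand-in for a fetched robots.txt body (not the real file).
sample_body = b"User-agent: *\nDisallow: /cgi-bin\n"
digest = sha256_hex(sample_body)

# Hash value copied from the report, used here only for a length comparison.
reported_hash = "cfc6c72358b4b744656bd0e9134fb963384cab325b13843ae2e0ab80c7ae0c13"
print(len(digest), len(reported_hash))  # 64 64
```

Comparing such a digest between scans would be one way to detect that the file changed, if the assumption about the algorithm holds.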
Groups
*
Rule | Path |
---|---|
Disallow | /cgi-bin |
Disallow | /wp-admin |
Disallow | /wp-includes |
Disallow | /wp-content |
Disallow | /xmlrpc.php |
Disallow | /trackback/ |
Disallow | /comment-page- |
Disallow | /_wp_link_placeholder |
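As an illustration of how the rules in this group apply, the sketch below rebuilds the group as robots.txt lines and checks a few paths with Python's standard `urllib.robotparser`. The reconstructed text is an assumption based on the table, not the fetched file itself. The helper `is_allowed`, the agent name `ExampleBot`, and the test paths such as `/politics/` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules reconstructed from the table above (assumed layout of the real file).
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cgi-bin",
    "Disallow: /wp-admin",
    "Disallow: /wp-includes",
    "Disallow: /wp-content",
    "Disallow: /xmlrpc.php",
    "Disallow: /trackback/",
    "Disallow: /comment-page-",
    "Disallow: /_wp_link_placeholder",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse in-memory lines, no network request

def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Check a site-relative path against the '*' group."""
    return parser.can_fetch(agent, "https://www.breitbart.com" + path)

print(is_allowed("/wp-admin/index.php"))  # False
print(is_allowed("/politics/"))           # True
```

Because the only group is `*`, any crawler name yields the same result in this sketch.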
Other Records
Field | Value |
---|---|
sitemap | https://www.breitbart.com/sitemap_index.xml |
sitemap | https://www.breitbart.com/sitemap_news.xml |
sitemap | https://www.breitbart.com/sitemap_default.xml |
sitemap | https://www.breitbart.com/sitemap_video.xml |
sitemap | https://www.breitbart.com/news_sitemap.xml |
sitemap | https://www.breitbart.com/default_sitemap.xml |
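The sitemap records listed above can be pulled out of a robots.txt body with a few lines of parsing. This is a minimal sketch: `extract_sitemaps` is a hypothetical helper, and the sample text contains only two of the listed records in an assumed `Sitemap:` line format.

```python
from typing import List

def extract_sitemaps(robots_text: str) -> List[str]:
    """Collect the values of all 'sitemap' records, in file order."""
    urls = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

# Assumed excerpt; the real file lists six sitemap records.
sample = """User-agent: *
Disallow: /cgi-bin
Sitemap: https://www.breitbart.com/sitemap_index.xml
Sitemap: https://www.breitbart.com/sitemap_news.xml
"""

for url in extract_sitemaps(sample):
    print(url)
```

Field names are matched case-insensitively, so `sitemap:` and `Sitemap:` are treated alike.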