20minuts.com
robots.txt
Robots Exclusion Standard data for 20minuts.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | 20minuts.com |
Base Domain | 20minuts.com |
Scan Status | Failed |
Failure Stage | Fetching resource. |
Failure Reason | Couldn't connect to server. |
Last Scan | 2024-06-09T02:05:03+00:00 |
Next Scan | 2024-09-07T02:05:03+00:00 |
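
The gap between the two scan timestamps can be checked with a short Python sketch. The variable names are illustrative. That the scanner retries on a fixed interval is an assumption drawn only from these two dates, not something the report states.

```python
from datetime import datetime

# Timestamps copied from the Last Scan and Next Scan rows of this report.
last_scan = datetime.fromisoformat("2024-06-09T02:05:03+00:00")
next_scan = datetime.fromisoformat("2024-09-07T02:05:03+00:00")

# Subtracting two timezone-aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)  # 90
```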
Last Successful Scan
Field | Value |
---|---|
Scanned | 2022-09-28T03:08:03+00:00 |
URL | https://20minuts.com/robots.txt |
Redirect | https://www.20minuts.com/robots.txt |
Redirect Domain | www.20minuts.com |
Redirect Base | 20minuts.com |
Response IP | 172.67.158.100, 104.21.58.86 |
Found | Yes |
Hash | 493774da77569c77b9c090ebc00d90fdd55242109888672d0a0c744377fd03d5 |
SimHash | 224b50d24e05 |
Groups
*
Rule | Path |
---|---|
Disallow | /article/*/commentaires* |
Disallow | /resultats-examen/recherche/ |
Disallow | /resultats-examen/candidat/ |
Disallow | /embed/elections/resultats/ |
Disallow | /v-ajax |
Disallow | /v-esi |
Disallow | /search |
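
The following Python sketch shows how a crawler might test URL paths against the Disallow rules listed for the `*` group. It assumes the common wildcard semantics, where `*` matches any run of characters, a trailing `$` anchors the end of the path, and every rule is otherwise a prefix match. The helper names `rule_to_regex` and `is_disallowed` are hypothetical and not part of any library. Because the group contains no Allow rules, the sketch skips longest-match precedence.

```python
import re

# Disallow paths copied from the "*" group in this report.
DISALLOW = [
    "/article/*/commentaires*",
    "/resultats-examen/recherche/",
    "/resultats-examen/candidat/",
    "/embed/elections/resultats/",
    "/v-ajax",
    "/v-esi",
    "/search",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (hypothetical helper)."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with ".*" for each "*" wildcard.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path: str, rules=DISALLOW) -> bool:
    """Return True if any Disallow rule matches the path from its start."""
    # re.match anchors at the beginning of the string, giving prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in rules)


print(is_disallowed("/article/123/commentaires"))  # True
print(is_disallowed("/article/123/titre"))         # False
print(is_disallowed("/search?q=meteo"))            # True
```

Python's standard `urllib.robotparser` does not expand `*` inside paths, which is why the first rule is handled here with a hand-written matcher.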
Other Records
Field | Value |
---|---|
sitemap | https://www.20minutes.fr/sitemap-arbo.xml |
Warnings
- 4 invalid lines.