huffpost.com
robots.txt

Robots Exclusion Standard data for huffpost.com

Resource Scan

Scan Details

Site Domain huffpost.com
Base Domain huffpost.com
Scan Status Ok
Last Scan 2024-04-29T21:25:00+00:00
Next Scan 2024-05-06T21:25:00+00:00

Last Scan

Scanned 2024-04-29T21:25:00+00:00
URL https://huffpost.com/robots.txt
Redirect https://www.huffpost.com/robots.txt
Redirect Domain www.huffpost.com
Redirect Base huffpost.com
Domain IPs 18.155.68.100, 18.155.68.129, 18.155.68.56, 18.155.68.8
Redirect IPs 151.101.130.114, 151.101.194.114, 151.101.2.114, 151.101.66.114
Response IP 199.232.46.114
Found Yes
Hash cb8ba743aabc399ea7609cc71b6d1cbb359b95d1c2fa9aa9bb17035a7d8bcb71
SimHash 4e3c9f622561
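
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched file, although the scan does not name the algorithm. Assuming SHA-256 over the raw response bytes, a change check between two scans might be sketched as follows. The function names `content_hash` and `has_changed` are hypothetical and not part of any scanner.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of the raw robots.txt bytes.

    Assumption: the scanner hashes the unmodified response body with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str) -> bool:
    """Report whether a newly fetched body differs from a stored digest."""
    return content_hash(body) != previous_hash.lower()
```

If the scanner normalizes line endings or uses a different algorithm, the digest of a live fetch will not match the value listed here.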

Groups

grapeshot

Rule Path
Disallow /member
Disallow /*?*err_code=404
Disallow /search
Disallow /search/?*

*

Rule Path
Disallow /*?*page=
Disallow /member
Disallow /*?*err_code=404
Disallow /search
Disallow /search/?*
Disallow /mapi/v4/*/user/*
Disallow /embed

Other Records

Field Value
crawl-delay 4

googlebot

Rule Path
Allow /
Disallow /*?*err_code=404
Disallow /search
Disallow /search/?*

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.huffpost.com/sitemaps/sitemap-v1.xml
sitemap https://www.huffpost.com/sitemaps/sitemap-google-news.xml
sitemap https://www.huffpost.com/sitemaps/sitemap-google-video.xml
sitemap https://www.huffpost.com/sitemaps/sections.xml
sitemap https://www.huffpost.com/sitemaps-huffingtonpost/sitemap.xml
sitemap https://www.huffpost.com/sitemaps-huffingtonpost/sections.xml
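
Several of the rules in the groups above use the `*` wildcard, which Python's standard `urllib.robotparser` does not interpret. The sketch below is a minimal hand-written matcher, assuming the longest-match semantics described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie). The `GROUPS` table is copied from this scan; `pattern_to_regex` and `is_allowed` are hypothetical helper names. User-agent lookup is simplified to an exact lowercase match, and percent-encoding is not normalized.

```python
import re

# Rules copied from the groups listed in this scan.
GROUPS = {
    "grapeshot": [
        ("disallow", "/member"),
        ("disallow", "/*?*err_code=404"),
        ("disallow", "/search"),
        ("disallow", "/search/?*"),
    ],
    "*": [
        ("disallow", "/*?*page="),
        ("disallow", "/member"),
        ("disallow", "/*?*err_code=404"),
        ("disallow", "/search"),
        ("disallow", "/search/?*"),
        ("disallow", "/mapi/v4/*/user/*"),
        ("disallow", "/embed"),
    ],
    "googlebot": [
        ("allow", "/"),
        ("disallow", "/*?*err_code=404"),
        ("disallow", "/search"),
        ("disallow", "/search/?*"),
    ],
    "google-extended": [("disallow", "/")],
    "gptbot": [("disallow", "/")],
}


def pattern_to_regex(pattern):
    """Translate a rule path: '*' matches any run of characters,
    and a trailing '$' anchors the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(agent, path):
    """Return True if `path` (path plus query string) may be fetched by `agent`."""
    # Simplification: exact group name lookup, falling back to the "*" group.
    rules = GROUPS.get(agent.lower(), GROUPS["*"])
    best_kind, best_pattern = "allow", ""  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            longer = len(pattern) > len(best_pattern)
            tie_allow = len(pattern) == len(best_pattern) and kind == "allow"
            if longer or tie_allow:
                best_kind, best_pattern = kind, pattern
    return best_kind == "allow"
```

For example, `is_allowed("googlebot", "/search?q=x")` is False because `Disallow /search` is longer than `Allow /`, while `is_allowed("SomeBot", "/news/?page=2")` is False because the `*` group's `/*?*page=` pattern matches any query string containing `page=`. The `crawl-delay 4` record in the `*` group is not modeled here.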

Comments

  • Cambria robots
  • archives
  • huffingtonpost.com archive sitemaps