internhq.com
robots.txt

Robots Exclusion Standard data for internhq.com

Resource Scan

Scan Details

Site Domain internhq.com
Base Domain internhq.com
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-05-25T17:00:57+00:00
Next Scan 2024-08-23T17:00:57+00:00

Last Successful Scan

Scanned 2022-10-01T21:57:01+00:00
URL https://www.internhq.com/robots.txt
Response IP 13.33.88.84, 13.33.88.67, 13.33.88.21, 13.33.88.3
Found Yes
Hash c4310d5e8f16b2f86d47ea11ab36fd4c4cec6c6337c310d2ee5a59876215a483
SimHash 3b44d8138192

Groups

*

Rule Path
Disallow /*.pdf$
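
The rule above uses the two wildcard characters that major crawlers commonly honor: `*` matches any run of characters and a trailing `$` anchors the match at the end of the URL path. A minimal sketch of that matching, assuming those semantics; the function names `rule_to_regex` and `is_disallowed` and the sample paths are hypothetical and not taken from the scan:

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes '*' means "any sequence of characters" and a trailing '$'
    means "the path must end here".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, pattern: str = "/*.pdf$") -> bool:
    """Return True if the path matches the Disallow pattern from its start."""
    return rule_to_regex(pattern).match(path) is not None


# Hypothetical paths, for illustration only.
print(is_disallowed("/guides/brochure.pdf"))        # True
print(is_disallowed("/guides/brochure.pdf?dl=1"))   # False: '$' requires the URL to end in .pdf
print(is_disallowed("/about"))                      # False
```

Under these assumptions the rule blocks only URLs whose path ends in `.pdf`; a PDF served with a query string would not match.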

Other Records

Field Value
crawl-delay 10

Other Records

Field Value
sitemap https://www.internhq.com/sitemap.xml

Comments

  • Don't index PDFs:
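
The records listed in this report can be read back programmatically. The sketch below feeds an assumed reconstruction of the file to Python's standard `urllib.robotparser` and reads the crawl delay and sitemap. The reconstructed text is inferred from the records above and is not the verified original (its exact layout and ordering may differ), and `ExampleBot` is a hypothetical user agent name:

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction from the group, records, and comment listed above.
# The real file may be laid out differently.
ROBOTS_TXT = """\
# Don't index PDFs:
User-agent: *
Disallow: /*.pdf$
Crawl-delay: 10

Sitemap: https://www.internhq.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name; it falls back to the '*' group.
delay = parser.crawl_delay("ExampleBot")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(delay)     # 10
print(sitemaps)  # ['https://www.internhq.com/sitemap.xml']
```

A polite crawler would wait at least `delay` seconds between requests to the site and use the sitemap URL to discover pages.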