triangletribune.com
robots.txt

Robots Exclusion Standard data for triangletribune.com

Resource Scan

Scan Details

Site Domain triangletribune.com
Base Domain triangletribune.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-11-02T23:48:25+00:00
Next Scan 2025-01-31T23:48:25+00:00

Last Successful Scan

Scanned 2023-07-12T15:56:31+00:00
URL https://triangletribune.com/robots.txt
Redirect http://www.triangletribune.com/robots.txt
Redirect Domain www.triangletribune.com
Redirect Base triangletribune.com
Domain IPs 104.21.22.224, 172.67.207.138, 2606:4700:3037::6815:16e0, 2606:4700:3037::ac43:cf8a
Redirect IPs 104.21.22.224, 172.67.207.138, 2606:4700:3037::6815:16e0, 2606:4700:3037::ac43:cf8a
Response IP 104.21.22.224
Found Yes
Hash eda8ba5ee9781486935d6570ae97827322b99fb298d2c8e603501d4151b4eda8
SimHash adc89ae0e5b2

Groups

*

Rule Path
Disallow /*print%3Dpdf*

Other Records

Field Value
crawl-delay 5
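
The Disallow rule and crawl-delay recorded above could be applied by a crawler roughly as in the following minimal sketch. It assumes the common wildcard convention in which `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names (`robots_pattern_to_regex`, `is_disallowed`) and the sample paths are hypothetical and are not part of the scan data. The sketch compares strings literally and does not percent-decode, so it treats `%3D` and `=` as different; real crawlers may normalize these differently.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    Assumed convention: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

# Values taken from the scan record: one Disallow rule for the '*' group
# and a crawl-delay of 5 (interpreted here as seconds, which is an assumption).
DISALLOW = robots_pattern_to_regex("/*print%3Dpdf*")
CRAWL_DELAY_SECONDS = 5

def is_disallowed(path: str) -> bool:
    """Return True if the path (with query string) matches the Disallow rule."""
    return DISALLOW.match(path) is not None

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/news/story?print%3Dpdf", "/news/story", "/news/story?print=pdf"):
        print(path, "->", "disallowed" if is_disallowed(path) else "allowed")
```

A polite crawler honoring the record would also wait `CRAWL_DELAY_SECONDS` between requests, for example with `time.sleep`. Not every crawler recognizes the `crawl-delay` field.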

Comments

  • ROBOTS.TXT
  • www.triangletribune.com
  • Google
  • User-agent: Googlebot
  • Disallow:
  • Yahoo
  • User-agent: Slurp
  • Disallow:
  • Alta-Vista
  • User-agent: Scooter
  • Disallow:
  • Excite
  • User-agent: ArchitextSpider
  • Disallow:
  • InfoSeek
  • User-agent: UltraSeek
  • Disallow:
  • Lycos
  • User-agent: Lycos_Spider_(T-Rex)
  • Disallow:
  • LookSmart
  • User-agent: MantraAgent
  • Disallow:
  • Alltheweb
  • User-agent: FAST-WebCrawler
  • Disallow: