it.webqc.org
robots.txt

Robots Exclusion Standard data for it.webqc.org

Resource Scan

Scan Details

Site Domain it.webqc.org
Base Domain webqc.org
Scan Status Ok
Last Scan 2025-08-16T08:19:54+00:00
Next Scan 2025-09-15T08:19:54+00:00

Last Scan

Scanned 2025-08-16T08:19:54+00:00
URL https://it.webqc.org/robots.txt
Domain IPs 104.21.26.168, 172.67.137.100, 2606:4700:3033::ac43:8964, 2606:4700:3037::6815:1aa8
Response IP 104.21.26.168
Found Yes
Hash 20a692a9e716deb62770a2021e8c720d46f5a3b84ef1c0b4959b0edb96d68274
SimHash e2762d46c91c

Groups

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 1

gptbot
claudebot

Rule Path
Disallow /balancedchemicalequations

gptbot
claudebot

Rule Path
Disallow /molecularweightcalculated

*

Rule Path
Disallow /cite.php
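The report lists two separate groups for `*` and two for `gptbot`/`claudebot`. Under RFC 9309, a crawler combines the rules of all groups that name it, and falls back to `*` only when no group names it. The sketch below illustrates that merge using the groups as transcribed from this report. The names `GROUPS`, `merge_groups`, and `is_allowed` are hypothetical, and the matching is deliberately simplified (plain prefix matching, Disallow rules only, no wildcards or Allow precedence), so it is not a complete parser.

```python
from collections import defaultdict

# Groups transcribed from the scan report above, in report order.
# Each entry is (user agents, disallowed path prefixes).
GROUPS = [
    (["*"], []),
    (["gptbot", "claudebot"], ["/balancedchemicalequations"]),
    (["gptbot", "claudebot"], ["/molecularweightcalculated"]),
    (["*"], ["/cite.php"]),
]


def merge_groups(groups):
    """Combine the rules of all groups that name the same user agent."""
    merged = defaultdict(list)
    for agents, disallows in groups:
        for agent in agents:
            merged[agent.lower()].extend(disallows)
    return dict(merged)


def is_allowed(merged, product_token, path):
    """Simplified check: plain prefix matching, Disallow rules only."""
    token = product_token.lower()
    # A crawler uses its own merged group if one exists, otherwise "*".
    rules = merged[token] if token in merged else merged.get("*", [])
    return not any(path.startswith(prefix) for prefix in rules)


MERGED = merge_groups(GROUPS)

print(is_allowed(MERGED, "GPTBot", "/balancedchemicalequations"))     # False
print(is_allowed(MERGED, "ClaudeBot", "/molecularweightcalculated"))  # False
print(is_allowed(MERGED, "GPTBot", "/cite.php"))                      # True
print(is_allowed(MERGED, "bingbot", "/cite.php"))                     # False
```

In this sketch `/cite.php` stays allowed for `gptbot` and `claudebot`, because a crawler that has its own group does not also inherit the `*` rules. Parsers that do not merge repeated groups may give different answers for this file.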

Comments

  • Prevent msn from overwhelming the server, e.g. some msn bot IPs hit the site 99558 times per day in Feb 2015
  • Changed to any agent since mail.ru started to overload it as well
  • Since Jan 7 2025 a few hosts like rate-limited-proxy-209-85-238-1.google.com started issuing 200K+ daily requests.
  • Prevent site content from being used to build LLMs
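To put the request counts from these comments next to the `crawl-delay 1` record, the following sketch converts daily totals into average requests per second. It assumes the crawl-delay value means seconds between consecutive requests from one crawler; that is a common reading, but Crawl-delay is not part of RFC 9309 and crawlers may interpret or ignore it differently. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def per_second(requests_per_day):
    """Average request rate implied by a daily total."""
    return requests_per_day / SECONDS_PER_DAY


def max_daily_requests(crawl_delay_seconds):
    """Upper bound for one crawler that waits the full delay between requests."""
    return SECONDS_PER_DAY // crawl_delay_seconds


msn_rate = per_second(99558)        # about 1.15 requests per second
google_rate = per_second(200_000)   # about 2.31 requests per second
daily_cap = max_daily_requests(1)   # 86400 requests per day

print(f"{msn_rate:.2f} {google_rate:.2f} {daily_cap}")
```

Under that assumption, both reported loads exceed what a single crawler honoring a one-second delay could issue in a day, although the counts in the comments are per IP or host rather than per crawler.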