Robots Exclusion Standard data for it.webqc.org

Scan Details

Site Domain	it.webqc.org
Base Domain	webqc.org
Scan Status	Ok
Last Scan	2025-08-16T08:19:54+00:00
Next Scan	2025-09-15T08:19:54+00:00

Last Scan

Scanned	2025-08-16T08:19:54+00:00
URL	https://it.webqc.org/robots.txt
Domain IPs	104.21.26.168, 172.67.137.100, 2606:4700:3033::ac43:8964, 2606:4700:3037::6815:1aa8
Response IP	104.21.26.168
Found	Yes
Hash	20a692a9e716deb62770a2021e8c720d46f5a3b84ef1c0b4959b0edb96d68274
SimHash	e2762d46c91c

Groups

*

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	1

gptbot, claudebot

Rule	Path
Disallow	/balancedchemicalequations

gptbot, claudebot

Rule	Path
Disallow	/molecularweightcalculated

*

Rule	Path
Disallow	/cite.php
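The records above can be checked against sample crawler names with Python's standard `urllib.robotparser`. This is a minimal sketch, not the site's actual file: the `ROBOTS_TXT` text is a hypothetical reconstruction from the scan results (the grouping of `Crawl-delay` and `/cite.php` under `*`, and the merging of the two `gptbot`/`claudebot` records into one group, are assumptions), and `ExampleBot` and `load_rules` are made-up names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the scanned file; the real layout,
# ordering, and capitalization may differ from what is shown here.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
Disallow: /cite.php

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /balancedchemicalequations
Disallow: /molecularweightcalculated
"""

BASE = "https://it.webqc.org"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A generic crawler falls back to the "*" group.
generic_cite = rules.can_fetch("ExampleBot", BASE + "/cite.php")
generic_equations = rules.can_fetch("ExampleBot", BASE + "/balancedchemicalequations")
generic_delay = rules.crawl_delay("ExampleBot")

# GPTBot matches its own group, so only that group's rules apply to it.
gptbot_equations = rules.can_fetch("GPTBot", BASE + "/balancedchemicalequations")
gptbot_cite = rules.can_fetch("GPTBot", BASE + "/cite.php")

print(generic_cite, generic_equations, generic_delay)
print(gptbot_equations, gptbot_cite)
```

Under these assumptions, a crawler that matches a named group is governed by that group alone, so the `*` rule for `/cite.php` does not carry over to `GPTBot` in this reconstruction. Whether the live file behaves the same way depends on how its groups are actually laid out.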


Comments

Prevent msn from overwhelming the server, e.g. some msn bot IPs hit the site 99558 times per day in Feb 2015
Changed to any agent since mail.ru started to overload it as well
Since Jan 7 2025 a few hosts like rate-limited-proxy-209-85-238-1.google.com started issuing 200K+ daily requests.
Prevent building site content into LLMs
