documents-online.net
robots.txt

Robots Exclusion Standard data for documents-online.net

Resource Scan

Scan Details

Site Domain documents-online.net
Base Domain documents-online.net
Scan Status Ok
Last Scan 2025-03-31T07:56:14+00:00
Next Scan 2025-04-07T07:56:14+00:00
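The Next Scan timestamp is exactly seven days after the Last Scan timestamp. The sketch below confirms the gap with Python's standard `datetime` module. Only the two timestamps come from this page; reading the gap as a fixed weekly rescan schedule is an assumption.

```python
from datetime import datetime

# Timestamps copied from the Scan Details block above.
last_scan = datetime.fromisoformat("2025-03-31T07:56:14+00:00")
next_scan = datetime.fromisoformat("2025-04-07T07:56:14+00:00")

# Both values carry an explicit UTC offset, so the subtraction is timezone-aware.
interval = next_scan - last_scan
print(interval)       # 7 days, 0:00:00
print(interval.days)  # 7
```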

Last Scan

Scanned 2025-03-31T07:56:14+00:00
URL https://documents-online.net/robots.txt
Domain IPs 179.61.189.17, 2a02:4780:84:13e1:f6e5:c1e3:b3e5:14d6
Response IP 191.101.228.55
Found Yes
Hash cb76034354db99995f2fab2adc66620e932ad53d2b475c829ceb6ba2aae40ac3
SimHash 3e44d160cb51
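The page does not say how the Hash value is computed. Its 64 hexadecimal digits are consistent with a SHA-256 digest, so here is a minimal sketch, assuming it is the SHA-256 of the raw robots.txt body, for checking whether a freshly fetched copy still matches this scan. The helper names are hypothetical, and no network fetch is shown.

```python
import hashlib

# Value copied from the Last Scan block; treating it as SHA-256 is an assumption.
RECORDED_HASH = "cb76034354db99995f2fab2adc66620e932ad53d2b475c829ceb6ba2aae40ac3"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a robots.txt body."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_scan(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Hypothetical helper: True if the body hashes to the recorded value."""
    return sha256_hex(body) == recorded.lower()
```

To use it, fetch https://documents-online.net/robots.txt with any HTTP client and pass the raw response bytes to `matches_recorded_scan`. A mismatch means either the file has changed since the scan or the SHA-256 assumption is wrong.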

Groups

mediapartners-google

Rule Path
Disallow (empty path, so nothing is disallowed)

stress-agent

Rule Path
Disallow /

*

Rule Path
Disallow /manual/
Disallow /manual-1.3/
Disallow /manual-2.0/
Disallow /manual-2.2/
Disallow /addon-modules/
Disallow /doc/
Disallow /images/
Disallow /properties/
Disallow /fromtheworldand/
Disallow /all_our_e-mail_addresses
Disallow /admin/
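The three groups can be checked with Python's standard `urllib.robotparser`. The sketch below rebuilds a robots.txt body from the groups listed above (the exact line order and capitalisation of the live file are assumptions) and asks whether a few agents may fetch sample paths. `RobotFileParser` treats an empty `Disallow` value as "allow everything". None of the rules in a group overlap for these paths, so the results do not depend on first-match versus longest-match precedence. `SomeBot` is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

# Rebuilt from the Groups listing above; not a byte-for-byte copy of the live file.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: stress-agent
Disallow: /

User-agent: *
Disallow: /manual/
Disallow: /manual-1.3/
Disallow: /manual-2.0/
Disallow: /manual-2.2/
Disallow: /addon-modules/
Disallow: /doc/
Disallow: /images/
Disallow: /properties/
Disallow: /fromtheworldand/
Disallow: /all_our_e-mail_addresses
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://documents-online.net"
checks = [
    ("Mediapartners-Google", "/manual/index.html"),  # empty Disallow: allowed
    ("stress-agent", "/"),                            # Disallow: / blocks everything
    ("SomeBot", "/"),                                 # hypothetical agent, falls back to *
    ("SomeBot", "/manual-2.2/mod/core.html"),         # blocked by /manual-2.2/
    ("SomeBot", "/all_our_e-mail_addresses"),         # spam-bot trap path, blocked
]
for agent, path in checks:
    print(f"{agent:22} {path:28} {parser.can_fetch(agent, BASE + path)}")
```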

Comments

  • disallow stress test
  • exclude help system from robots
  • the next line is a spam bot trap, for grepping the logs. you should _really_ change this to something else...
  • same idea here...
  • but allow htdig to index our doc-tree
  • User-agent: htdig
  • Disallow:
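The last two comment lines are an htdig group that has been commented out; the Groups listing shows no htdig group. An agent identifying as htdig therefore falls back to the `*` group, which disallows `/doc/`. This is the opposite of what the preceding comment intends. The sketch below, again using Python's `urllib.robotparser`, compares the file with and without those two lines enabled. Both bodies are reduced, hypothetical excerpts that keep only the relevant rule.

```python
from urllib.robotparser import RobotFileParser

# Only the "*" rule that matters here; the htdig lines stay commented out.
AS_DEPLOYED = """\
User-agent: *
Disallow: /doc/
# but allow htdig to index our doc-tree
#User-agent: htdig
#Disallow:
"""

# Same excerpt with the htdig group uncommented (hypothetical edit).
WITH_HTDIG_ENABLED = """\
User-agent: *
Disallow: /doc/

User-agent: htdig
Disallow:
"""


def htdig_may_fetch(robots_txt: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether the agent "htdig" may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("htdig", url)


URL = "https://documents-online.net/doc/index.html"
print(htdig_may_fetch(AS_DEPLOYED, URL))         # False
print(htdig_may_fetch(WITH_HTDIG_ENABLED, URL))  # True
```

Removing the leading `#` from both lines (and separating them from the `*` group with a blank line) would be enough to give htdig access to the doc tree.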