Robots Exclusion Standard data for dreamwidth.org

Resource Scan

Scan Details

Site Domain dreamwidth.org
Base Domain dreamwidth.org
Scan Status Ok
Last Scan 2024-05-03T16:55:56+00:00
Next Scan 2024-06-02T16:55:56+00:00

Last Scan

Scanned 2024-05-03T16:55:56+00:00
URL https://www.dreamwidth.org/robots.txt
Domain IPs 52.84.150.45, 52.84.150.52, 52.84.150.61, 52.84.150.63
Response IP 52.84.150.63
Found Yes
Hash 483a5c76a0d854ff53869898ede102aa480be424eddc8f8ba587e66978bc6f16
SimHash ed3d1d428fd9

Groups

*

Rule Path
Disallow /directorysearch
Disallow /latest
Disallow /search
Disallow /tools/tellafriend
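
The rule group above can be checked against concrete URLs with Python's standard urllib.robotparser module. This is a minimal sketch, not the scanner's own method: the robots.txt text is reconstructed from the table (the raw file may differ in comments and whitespace), and the "ExampleBot" user agent and the "/example-page" path are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the rule table above; this is an assumption about
# the file's layout, not a verbatim copy of the scanned robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directorysearch
Disallow: /latest
Disallow: /search
Disallow: /tools/tellafriend
"""

BASE_URL = "https://www.dreamwidth.org"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "/example-page" is a hypothetical path that no rule in the group matches.
for path in ("/directorysearch", "/latest", "/search",
             "/tools/tellafriend", "/example-page"):
    allowed = parser.can_fetch("ExampleBot", BASE_URL + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Disallow rules are path prefixes, so under this sketch a URL such as /search/anything is treated the same way as /search itself.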

Comments

  • Blocked journals aren't listed here because robots.txt files can't be above 50k or so, depending on the spider.
  • Instead, blocked journals have HTML inserted in them which should prevent behaved spiders from indexing it.
  • Note that https://username.dreamwidth.org journals have an autogenerated robots.txt, since it can be small.
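
The comments do not say which HTML is inserted into blocked journals. A common mechanism for this purpose is a robots meta tag such as <meta name="robots" content="noindex">, and the sketch below assumes that form purely for illustration. The class name, function name, and sample page are hypothetical and are not taken from Dreamwidth's markup.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page; the tag shown is an assumed form of the inserted HTML.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>journal entry</body></html>"
)

directives = robots_directives(SAMPLE_PAGE)
print("noindex" in directives)
```

A crawler that honors such a tag would skip indexing any page whose directives include noindex, which is how a per-page signal can stand in for a robots.txt entry when the file would otherwise grow too large.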