lj.rossia.org
robots.txt

Robots Exclusion Standard data for lj.rossia.org

Resource Scan

Scan Details

Site Domain lj.rossia.org
Base Domain rossia.org
Scan Status Ok
Last Scan 2025-12-02T01:22:23+00:00
Next Scan 2026-01-01T01:22:23+00:00

Last Scan

Scanned 2025-12-02T01:22:23+00:00
URL https://lj.rossia.org/robots.txt
Redirect http://lj.rossia.org/robots.txt
Domain IPs 163.172.215.104
Response IP 163.172.215.104
Found Yes
Hash f7e971cef21ac00d52f1a3cb47303e4725d9746237820b0269d5e77b9522f4e2
SimHash e57db8508fb1

Groups

*

Rule Path
Disallow /directory
Disallow /interests
Disallow /tools/tell
Disallow /tools/memadd
Disallow /tools/search.bml
Disallow /friends/
Disallow /interface/
Disallow /translate/
Disallow /comments/
Disallow /numreplies/
Disallow /users/imp_
Disallow /userinfo.bml?user=imp_
Disallow /talk
Disallow /stats/stats.txt
Disallow /create
Disallow /update
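
As a rough illustration of how a crawler might apply the rules above, the sketch below rebuilds them as robots.txt text and checks paths with Python's standard `urllib.robotparser`. The robots.txt text is reassembled from the table, not copied from the live file, and the agent name `ExampleBot` and the helper `is_allowed` are hypothetical names chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Reassembled from the rule table above (an approximation of the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory
Disallow: /interests
Disallow: /tools/tell
Disallow: /tools/memadd
Disallow: /tools/search.bml
Disallow: /friends/
Disallow: /interface/
Disallow: /translate/
Disallow: /comments/
Disallow: /numreplies/
Disallow: /users/imp_
Disallow: /userinfo.bml?user=imp_
Disallow: /talk
Disallow: /stats/stats.txt
Disallow: /create
Disallow: /update
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above.

    `is_allowed` and `ExampleBot` are hypothetical names for illustration.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # The standard-library parser matches rules by simple path prefix.
    return parser.can_fetch(agent, path)
```

Because matching is by prefix, `is_allowed("/users/imp_example")` is refused while `is_allowed("/users/alice")` is permitted. Rules without a trailing slash, such as `/talk`, also cover longer paths that merely start with that string.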

Other Records

Field Value
crawl-delay 1
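
A crawler that honours this record would leave at least one second between requests. The sketch below shows one way to read the value with `urllib.robotparser` and compute the remaining wait. It assumes the `crawl-delay` line belongs to the `*` group (the scan report does not say which group it sits in) and a recent Python 3 release; the helper names and `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Minimal stand-in for the real file; assumes crawl-delay sits in the * group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /create
Crawl-delay: 1
"""


def crawl_delay_seconds(agent: str = "ExampleBot", default: float = 0.0) -> float:
    """Read the crawl delay for `agent`, falling back to `default` if absent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


def seconds_to_wait(last_request: float, now: float, delay: float) -> float:
    """Time still to wait before the next request (never negative)."""
    return max(0.0, delay - (now - last_request))
```

In a fetch loop, the result of `seconds_to_wait` would typically be passed to `time.sleep` before each request.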

Comments

  • Blocked journals aren't listed here because robots.txt files can't be above 50k or so, depending on the spider.
  • Instead, blocked journals have HTML inserted in them which should prevent behaved spiders from indexing it.
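
The comments do not say which HTML is inserted into blocked journals. A common mechanism is a `<meta name="robots">` tag, so the sketch below assumes that form and shows how a crawler might detect it with the standard `html.parser` module. The class `RobotsMetaParser` and the helper `may_index` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for bare attributes; treat as empty.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def may_index(html: str) -> bool:
    """Return False if the page carries a noindex (or none) robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)
```

A page containing `<meta name="robots" content="noindex, nofollow">` would make `may_index` return False, while a page without any robots meta tag is treated as indexable.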

Warnings

  • `host` is not a known field.