lj.ru
robots.txt

Robots Exclusion Standard data for lj.ru

Resource Scan

Scan Details

Site Domain lj.ru
Base Domain lj.ru
Scan Status Ok
Last Scan 2024-09-19T03:56:07+00:00
Next Scan 2024-09-26T03:56:07+00:00

Last Scan

Scanned 2024-09-19T03:56:07+00:00
URL https://lj.ru/robots.txt
Redirect https://www.livejournal.com/robots.txt
Redirect Domain www.livejournal.com
Redirect Base livejournal.com
Domain IPs 81.19.74.40, 81.19.74.41
Redirect IPs 81.19.74.0, 81.19.74.1
Response IP 81.19.74.1
Found Yes
Hash 56c6734a2f3f5cc47831b8561a13a6a1bd1cbc16aea19a0ecaf4993f922b3f01
SimHash 691d8c6205ed
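
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest of the fetched file body. The scan does not state which algorithm it uses, so the following is only a sketch under that assumption; `body_digest` and the sample bytes are hypothetical and will not reproduce the hash listed above.

```python
import hashlib

def body_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Hypothetical sample body, not the real lj.ru file.
sample = b"User-agent: *\nDisallow: /users\n"
print(body_digest(sample))
```

Comparing digests between scans is a cheap way to detect that the file changed at all. The SimHash field is a separate similarity fingerprint and is not covered by this sketch.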

Groups

yandex

Rule Path
Allow /
Disallow /allpics.bml
Disallow /update.bml
Disallow /identity
Disallow /login.bml
Disallow /manage
Disallow /poll
Disallow /profile
Disallow /schools
Disallow /todo
Disallow /tools
Disallow /update.bml
Disallow /userinfo.bml
Disallow /users
Allow /ratings/$
Disallow /ratings
Disallow /syn
Disallow /latest
Disallow /ljtimes
Disallow /talkread
Disallow /inbox
Disallow /misc
Disallow /legal
Disallow /checklistposts
Disallow /away
Disallow /rsearch
Disallow /gsearch

spbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /
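
A crawler obeys only one group: the one whose name matches its own product token, falling back to `*` when none matches. The sketch below shows that selection for the four group names in this scan. It assumes a case-insensitive exact match on the product token (the part before any `/version`); real crawlers may match more loosely, and `select_group` is a hypothetical helper name.

```python
# Group names as listed in this scan.
GROUP_NAMES = ["yandex", "spbot", "ahrefsbot", "*"]

def select_group(product_token, group_names=GROUP_NAMES):
    """Pick the group a crawler should obey (simplified exact-match rule)."""
    # Drop an optional "/version" suffix and compare case-insensitively.
    token = product_token.split("/", 1)[0].strip().lower()
    for name in group_names:
        if name != "*" and name == token:
            return name
    # No specific group matched: use the catch-all group if present.
    return "*" if "*" in group_names else None

print(select_group("AhrefsBot/7.0"))   # ahrefsbot
print(select_group("Googlebot/2.1"))   # *
```

Under this rule, spbot and ahrefsbot see only `Disallow /` and are therefore excluded from the whole site, while any crawler without its own group uses the `*` rules.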

*

Rule Path
Allow /
Disallow /allpics.bml
Disallow /update.bml
Disallow /identity
Disallow /login.bml
Disallow /manage
Disallow /poll
Disallow /profile
Disallow /schools
Disallow /todo
Disallow /tools
Disallow /update.bml
Disallow /userinfo.bml
Disallow /users
Allow /ratings/$
Disallow /ratings
Disallow /syn
Disallow /latest
Disallow /ljtimes
Disallow /talkread
Disallow /inbox
Disallow /misc
Disallow /legal
Disallow /checklistposts
Disallow /away
Disallow /rsearch
Disallow /gsearch
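
The pair `Allow /ratings/$` and `Disallow /ratings` only makes sense under longest-match precedence, where the most specific matching rule wins and a trailing `$` anchors the end of the path. The following is a minimal sketch of that evaluation, following the matching rules described in RFC 9309, over a subset of the rules listed above. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, and individual crawlers may resolve conflicts differently.

```python
import re

# Subset of the rules listed for the `*` group.
RULES = [
    ("allow", "/"),
    ("disallow", "/update.bml"),
    ("disallow", "/users"),
    ("allow", "/ratings/$"),
    ("disallow", "/ratings"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # `*` matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; on a tie, allow wins."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow

print(is_allowed("/ratings/"))     # True: "/ratings/$" is the longest match
print(is_allowed("/ratings/top"))  # False: only "/" and "/ratings" match
print(is_allowed("/users/alice"))  # False
```

Python's standard `urllib.robotparser` is deliberately not used here, because it applies rules in file order and does not interpret `$`, so it would not reproduce the behaviour this rule pair relies on.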

Other Records

Field Value
sitemap https://www.livejournal.com/sitemap.xml

Comments

  • Blocked journals aren't listed here because robots.txt files can't be above 50k or so, depending on the spider.
  • Instead, blocked journals have HTML inserted in them which should prevent behaved spiders from indexing it.
  • Note that http://username.livejournal.com journals have an autogenerated robots.txt, since it can be small.
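
The comments say blocked journals carry HTML that asks spiders not to index them, but they do not say which markup is used. A common mechanism is a robots meta tag, and the sketch below checks a page for one. It assumes that mechanism; `RobotsMetaParser`, `is_noindex`, and the sample HTML are hypothetical and not taken from any LiveJournal page.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def is_noindex(html):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical page head for a blocked journal.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindex(page))  # True
```

Unlike a robots.txt entry, a page-level directive is only seen after the crawler has fetched the page, which is why it works per journal without growing the shared file.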

Warnings

  • `clean-param` is not a known field.
  • `host` is not a known field.
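
`Clean-param` and `Host` are extensions documented by Yandex rather than part of the core Robots Exclusion Protocol, which is why a strict parser reports them. The sketch below shows how such warnings could be derived, assuming the scanner recognises only `user-agent`, `allow`, `disallow`, and `sitemap`. That field list, the `unknown_fields` helper, and the sample file are assumptions; the actual `clean-param` and `host` values in the lj.ru file are not shown in this scan.

```python
# Fields this sketch treats as known (an assumption about the scanner).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

def unknown_fields(robots_txt):
    """Return unrecognised field names in order of first appearance."""
    seen = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in seen:
            seen.append(field)
    return seen

# Hypothetical file contents, for illustration only.
SAMPLE = """\
# sample
User-agent: yandex
Disallow: /users
Clean-param: ref /
Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
"""

for field in unknown_fields(SAMPLE):
    print(f"`{field}` is not a known field.")
```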