livejournal.com
robots.txt

Robots Exclusion Standard data for livejournal.com

Resource Scan

Scan Details

Site Domain livejournal.com
Base Domain livejournal.com
Scan Status Ok
Last Scan 2024-05-04T08:34:33+00:00
Next Scan 2024-05-11T08:34:33+00:00

Last Scan

Scanned 2024-05-04T08:34:33+00:00
URL https://livejournal.com/robots.txt
Redirect https://www.livejournal.com/robots.txt
Redirect Domain www.livejournal.com
Redirect Base livejournal.com
Domain IPs 81.19.74.0, 81.19.74.1
Redirect IPs 81.19.74.0, 81.19.74.1
Response IP 81.19.74.1
Found Yes
Hash 22017ee87b4d7c1b4eee61132f739b65ec252836531160c409097bf51b87fd98
SimHash 491b8c6285cd

Groups

yandex

Rule Path
Allow /
Disallow /directory.bml
Disallow /allpics.bml
Disallow /update.bml
Disallow /identity
Disallow /login.bml
Disallow /manage
Disallow /poll
Disallow /profile
Disallow /schools
Disallow /todo
Disallow /tools
Disallow /update.bml
Disallow /userinfo.bml
Disallow /users
Allow /ratings/$
Disallow /ratings
Disallow /syn
Disallow /latest
Disallow /ljtimes
Disallow /talkread
Disallow /inbox
Disallow /misc
Disallow /legal
Disallow /checklistposts
Disallow /away

spbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

*

Rule Path
Allow /
Disallow /directory.bml
Disallow /allpics.bml
Disallow /update.bml
Disallow /identity
Disallow /login.bml
Disallow /manage
Disallow /poll
Disallow /profile
Disallow /schools
Disallow /todo
Disallow /tools
Disallow /update.bml
Disallow /userinfo.bml
Disallow /users
Allow /ratings/$
Disallow /ratings
Disallow /syn
Disallow /latest
Disallow /ljtimes
Disallow /talkread
Disallow /inbox
Disallow /misc
Disallow /legal
Disallow /checklistposts
Disallow /away
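
The interaction between `Allow /ratings/$` and `Disallow /ratings` in the groups above may be easier to see in code. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie) and the common `*` and `$` pattern extensions. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, the rule list is only a subset of the `*` group, and individual crawlers may resolve conflicts differently.

```python
import re

# A subset of the rules from the `*` group above, in file order.
RULES = [
    ("allow", "/"),
    ("disallow", "/profile"),
    ("disallow", "/users"),
    ("allow", "/ratings/$"),
    ("disallow", "/ratings"),
]


def pattern_to_regex(path):
    """Translate a robots.txt path pattern into a compiled regex.

    `*` matches any run of characters; a trailing `$` anchors the end.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules=RULES):
    """Return True if url_path may be fetched under longest-match precedence."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(url_path):
            is_allow = kind == "allow"
            # Longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed
```

Under these assumptions, `is_allowed("/ratings/")` is true because `/ratings/$` is the longest matching pattern, while `is_allowed("/ratings/users")` is false because only `/` and `/ratings` match and the longer of the two is a Disallow rule.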

Other Records

Field Value
sitemap https://www.livejournal.com/sitemap.xml

Comments

  • Blocked journals aren't listed here because robots.txt files can't be above 50k or so, depending on the spider.
  • Instead, blocked journals have HTML inserted in them which should prevent behaved spiders from indexing it.
  • Note that http://username.livejournal.com journals have an autogenerated robots.txt, since it can be small.
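
The comments do not say which HTML is inserted into blocked journals. One common mechanism is a robots meta tag such as `<meta name="robots" content="noindex">`, and the sketch below shows how a crawler might check for it. Treating the inserted HTML as a robots meta tag is an assumption, and `RobotsMetaParser` and `is_noindex` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

A crawler that honours this convention would fetch the page, call `is_noindex` on the response body, and skip indexing when it returns true.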

Warnings

  • `clean-param` is not a known field.
  • `host` is not a known field.