j31.co.uk
robots.txt

Robots Exclusion Standard data for j31.co.uk

Resource Scan

Scan Details

Site Domain j31.co.uk
Base Domain j31.co.uk
Scan Status Ok
Last Scan 2024-11-11T23:15:36+00:00
Next Scan 2024-11-18T23:15:36+00:00

Last Scan

Scanned 2024-11-11T23:15:36+00:00
URL http://j31.co.uk/robots.txt
Domain IPs 92.204.209.139
Response IP 92.204.209.139
Found Yes
Hash 016819581e1064609f3e33e6020385b867e86a7413fdc14344a761700c4bc581
SimHash 24049d13c4b5

Groups

googlebot

Rule Path
Disallow /thepub/index.php?*%3Bwap
Disallow /thepub/index.php?*%3Bwap2
Disallow /thepub/index.php?*%3Bimode

slurp

Rule Path
Disallow /thepub/
Allow /sitemap.xml$
Allow /robots.txt$
Allow /thepub/index.php$
Allow /thepub/index.php?topic=*.0$
Allow /thepub/index.php?topic=*.*0$
Allow /thepub/index.php?topic=*.*5$
Allow /thepub/index.php?board=*.0$
Allow /thepub/index.php?board=*.*0$
Allow /thepub/index.php?board=*.*5$
Disallow /thepub/index.php?*.msg
Disallow /thepub/index.php?topic=*.msg*0$
Disallow /thepub/index.php?topic=*.msg*5$
Disallow /thepub/index.php?*.new
Disallow /thepub/index.php?*%3B*
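
The slurp group mixes Allow and Disallow rules with `*` wildcards and `$` end anchors, so the outcome for a given URL depends on how a crawler resolves overlapping matches. The following Python sketch evaluates the group under one common convention: the longest matching pattern wins, and Allow wins a tie (as described in RFC 9309). This is an assumption; the crawler this group was written for may have resolved conflicts differently. The names `SLURP_RULES`, `pattern_to_regex` and `is_allowed` are illustrative and not part of the scanned file.

```python
import re

# Rules copied from the slurp group above, in file order.
SLURP_RULES = [
    ("Disallow", "/thepub/"),
    ("Allow", "/sitemap.xml$"),
    ("Allow", "/robots.txt$"),
    ("Allow", "/thepub/index.php$"),
    ("Allow", "/thepub/index.php?topic=*.0$"),
    ("Allow", "/thepub/index.php?topic=*.*0$"),
    ("Allow", "/thepub/index.php?topic=*.*5$"),
    ("Allow", "/thepub/index.php?board=*.0$"),
    ("Allow", "/thepub/index.php?board=*.*0$"),
    ("Allow", "/thepub/index.php?board=*.*5$"),
    ("Disallow", "/thepub/index.php?*.msg"),
    ("Disallow", "/thepub/index.php?topic=*.msg*0$"),
    ("Disallow", "/thepub/index.php?topic=*.msg*5$"),
    ("Disallow", "/thepub/index.php?*.new"),
    ("Disallow", "/thepub/index.php?*%3B*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""), re.DOTALL)


def is_allowed(rules, path):
    """Assumed precedence: longest matching pattern wins, Allow wins ties."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches is allowed by default.
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in (
        "/thepub/index.php?topic=12.0",
        "/thepub/index.php?topic=12.msg340",
        "/thepub/index.php?topic=12.0%3Bwap2",
        "/thepub/Themes/default/style.css",
    ):
        print(path, "allowed" if is_allowed(SLURP_RULES, path) else "blocked")
```

Under this convention the first page of a topic stays crawlable, while per-message links such as `topic=12.msg340` are blocked because the `.msg` Disallow pattern is longer than the `*.*0$` Allow pattern that also matches it.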

twiceler

Rule Path
Disallow /

w3c-checklink

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /thepub/index.php?PHPSESSID

*

Rule Path
Disallow /thepub/attachments/
Disallow /thepub/Packages/
Disallow /thepub/Smileys/
Disallow /thepub/Sources/
Disallow /thepub/Themes/
Disallow /thepub/index.php?action=activate
Disallow /thepub/index.php?action=admin
Disallow /thepub/index.php?action=calendar
Disallow /thepub/index.php?action=emailuser
Disallow /thepub/index.php?action=findmember
Disallow /thepub/index.php?action=help
Disallow /thepub/index.php?action=helpadmin
Disallow /thepub/index.php?action=login
Disallow /thepub/index.php?action=logout
Disallow /thepub/index.php?action=mlist
Disallow /thepub/index.php?action=modifykarma
Disallow /thepub/index.php?action=pm
Disallow /thepub/index.php?action=post
Disallow /thepub/index.php?action=printpage
Disallow /thepub/index.php?action=profile
Disallow /thepub/index.php?action=recent
Disallow /thepub/index.php?action=register
Disallow /thepub/index.php?action=reminder
Disallow /thepub/index.php?action=search
Disallow /thepub/index.php?action=theme
Disallow /thepub/index.php?action=unread
Disallow /thepub/index.php?action=unreadreplies
Disallow /thepub/index.php?action=verificationcode
Disallow /thepub/index.php?action=who
Disallow /thepub/index.php?theme
Disallow /thepub/index.php?action=stats%3Bexpand
Disallow /thepub/index.php?action=stats%3Bcollapse
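
The catch-all group relies on plain prefix matching, which has a side effect worth seeing in practice: a rule such as `action=help` also covers `action=helpadmin`. The sketch below checks a few paths with `urllib.robotparser` from the Python standard library, using a four-rule excerpt of the group rather than the full list. The agent name `ExampleBot` and the helper `can_fetch` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the "*" group above, rewritten in robots.txt syntax.
CATCH_ALL = """\
User-agent: *
Disallow: /thepub/Themes/
Disallow: /thepub/index.php?action=help
Disallow: /thepub/index.php?action=unread
Disallow: /thepub/index.php?theme
"""

parser = RobotFileParser()
parser.parse(CATCH_ALL.splitlines())


def can_fetch(path, agent="ExampleBot"):
    """Return True if the hypothetical agent may fetch the given path."""
    return parser.can_fetch(agent, "http://j31.co.uk" + path)


if __name__ == "__main__":
    for path in (
        "/thepub/index.php?action=helpadmin",
        "/thepub/index.php?action=unreadreplies",
        "/thepub/index.php?theme=2",
        "/thepub/index.php?topic=12.0",
    ):
        print(path, "allowed" if can_fetch(path) else "blocked")
```

Because each Disallow value is a prefix, the separate `helpadmin` and `unreadreplies` lines in the group are already covered by the shorter `help` and `unread` rules. Listing them explicitly is harmless.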

Other Records

Field Value
sitemap http://YOURDOMAINHERE/sitemap.xml

Comments

  • Robots.txt
  • Based on:
  • YouPosted.com Smart Robots v3.05
  • My Sitemap - I don't provide it just for the fun of it
  • Google - Most Important bot
  • Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any
  • urls which it comes across into its index. So we're relying on a meta noindex tag.
  • Don't index mobile versions
  • Yahoo - Too aggressive
  • So limit it as much as possible.
  • Disallow Everything
  • Now allow bits and then disallow bits
  • But don't allow these
  • Anything with a ; disallow
  • Bad bot - Often ignores robots.txt - Waste of bandwidth
  • Despite claiming on their website to be a search engine in development
  • I'm suspicious as to whether they are a harvester pretending to be SE
  • Stop following PHPSESSID's
  • Catch all (remainder)
  • Will be followed by any bots other than ones identified above
  • Uses BASIC robots.txt directives without wildcards, end-anchors etc
  • So Spiders should understand these (including MSNBOT)
  • Default SMF Folders
  • Default SMF Actions
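
The comments note that the catch-all group applies only to bots not named in an earlier group. A minimal Python sketch of that selection step follows. It assumes a case-insensitive substring match of the group name against the User-Agent header, which is a simplification of how real crawlers compare product tokens. The function `select_group` and the sample User-Agent strings are illustrative only.

```python
# Group names as listed in the Groups section, in file order.
GROUPS = [
    "googlebot",
    "slurp",
    "twiceler",
    "w3c-checklink",
    "turnitinbot",
    "mj12bot",
    "*",
]


def select_group(user_agent, groups=GROUPS):
    """Pick the group a crawler would follow (simplified substring match)."""
    ua = user_agent.lower()
    named = [g for g in groups if g != "*" and g in ua]
    if named:
        # Prefer the most specific (longest) matching name.
        return max(named, key=len)
    # Fall back to the catch-all group when no named group matches.
    return "*" if "*" in groups else None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
        "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    ]
    for ua in samples:
        print(select_group(ua), "<-", ua)
```

A crawler that matches a named group follows only that group's rules, so the default SMF folder and action rules in the catch-all group do not apply to googlebot, slurp or mj12bot unless they are repeated there.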