merapahadforum.com
robots.txt

Robots Exclusion Standard data for merapahadforum.com

Resource Scan

Scan Details

Site Domain merapahadforum.com
Base Domain merapahadforum.com
Scan Status Ok
Last Scan 2025-11-21T06:26:42+00:00
Next Scan 2025-11-28T06:26:42+00:00

Last Scan

Scanned 2025-11-21T06:26:42+00:00
URL https://merapahadforum.com/robots.txt
Domain IPs 103.239.138.184
Response IP 103.239.138.184
Found Yes
Hash fa76d05ae5b0a450c08b139099c475c6f7d8b613ac9aca10b867067db7ff8003
SimHash 2410f913c5f5

Groups

googlebot

Rule Path
Allow /index.php?action=kitsitemap%3Bxml
Disallow /*?action*
Disallow /*sort%3D*
Disallow /*msg*
Disallow /?wap2
Disallow /?wap
Disallow /index.php?*%3Bwap
Disallow /index.php?*%3Bwap2
Disallow /index.php?*%3Bimode
Disallow /attachments/
Disallow /Packages/
Disallow /Smileys/
Disallow /Sources/
Disallow /Themes/
Disallow /index.php?theme
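
Several of the googlebot paths above use the `*` wildcard, and the slurp group further down also uses the `$` end anchor. The following is a minimal sketch of how such a pattern can be tested against a URL path, assuming the semantics described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the path). The function names are hypothetical, and the sketch compares raw strings without normalizing percent-encoding such as `%3B`.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def matches(pattern, path):
    """Return True if the pattern matches from the start of the path."""
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*?action*", "/index.php?action=profile"))
print(matches("/*msg*", "/index.php?topic=12.msg345"))
print(matches("/index.php?topic=*.0$", "/index.php?topic=123.15"))
```

Under these assumptions the first two calls report a match and the third does not, because the path does not end in `.0`.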

slurp

Rule Path
Disallow /
Allow /index.php?action=kitsitemap%3Bxml
Allow /kitsitemap.xml$
Allow /robots.txt$
Allow /index.php$
Allow /index.php?topic=*.0$
Allow /index.php?topic=*.*0$
Allow /index.php?topic=*.*5$
Allow /index.php?board=*.0$
Allow /index.php?board=*.*0$
Allow /index.php?board=*.*5$
Disallow /*?action*
Disallow /*sort%3D*
Disallow /*msg*
Disallow /index.php?*.msg
Disallow /index.php?topic=*.msg*0$
Disallow /index.php?topic=*.msg*5$
Disallow /index.php?*.new
Disallow /attachments/
Disallow /Packages/
Disallow /Smileys/
Disallow /Sources/
Disallow /Themes/
Disallow /index.php?theme
Disallow /index.php?*%3B*
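
The slurp group mixes a blanket `Disallow /` with narrower Allow and Disallow rules, so the outcome for a path depends on rule precedence. The sketch below assumes the RFC 9309 convention that the longest matching pattern wins and that Allow wins a tie; whether Slurp itself evaluated rules this way is not stated in the scan data. `SLURP_RULES` copies the rules listed above, and `is_allowed` is a hypothetical helper name.

```python
import re

def _matches(pattern, path):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(p) for p in pattern.split("*"))
    return re.match(body + (r"\Z" if anchored else ""), path) is not None

# Rules for the slurp group, in the order shown in the scan.
SLURP_RULES = [
    ("Disallow", "/"),
    ("Allow", "/index.php?action=kitsitemap%3Bxml"),
    ("Allow", "/kitsitemap.xml$"),
    ("Allow", "/robots.txt$"),
    ("Allow", "/index.php$"),
    ("Allow", "/index.php?topic=*.0$"),
    ("Allow", "/index.php?topic=*.*0$"),
    ("Allow", "/index.php?topic=*.*5$"),
    ("Allow", "/index.php?board=*.0$"),
    ("Allow", "/index.php?board=*.*0$"),
    ("Allow", "/index.php?board=*.*5$"),
    ("Disallow", "/*?action*"),
    ("Disallow", "/*sort%3D*"),
    ("Disallow", "/*msg*"),
    ("Disallow", "/index.php?*.msg"),
    ("Disallow", "/index.php?topic=*.msg*0$"),
    ("Disallow", "/index.php?topic=*.msg*5$"),
    ("Disallow", "/index.php?*.new"),
    ("Disallow", "/attachments/"),
    ("Disallow", "/Packages/"),
    ("Disallow", "/Smileys/"),
    ("Disallow", "/Sources/"),
    ("Disallow", "/Themes/"),
    ("Disallow", "/index.php?theme"),
    ("Disallow", "/index.php?*%3B*"),
]

def is_allowed(path, rules=SLURP_RULES):
    """Longest matching pattern wins; Allow wins ties (assumed precedence)."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if not _matches(pattern, path):
            continue
        allow = kind == "Allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, best_allow = len(pattern), allow
    return best_allow

for path in ("/index.php?topic=123.0", "/index.php?topic=123.msg450", "/Themes/default/style.css"):
    print(path, is_allowed(path))
```

Under this precedence, `/index.php?topic=123.msg450` matches the Allow pattern `/index.php?topic=*.*0$` but is still blocked, because the Disallow pattern `/index.php?topic=*.msg*0$` is longer.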

twiceler

Rule Path
Disallow /

w3c-checklink

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /index.php?PHPSESSID

*

Rule Path
Disallow /attachments/
Disallow /Packages/
Disallow /Smileys/
Disallow /Sources/
Disallow /Themes/
Disallow /index.php?theme
Disallow /*?action*
Disallow /*sort%3D*
Disallow /*msg*
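
The `*` group applies only to crawlers that match none of the named groups. A minimal sketch of that selection step follows, assuming a case-insensitive substring match of the group name against the crawler's User-Agent string; real crawlers match on their own product token, so this is an approximation. `select_group` is a hypothetical helper, and the User-Agent strings in the usage lines are illustrative examples.

```python
# Named groups found in the scan, in file order.
NAMED_GROUPS = [
    "googlebot",
    "slurp",
    "twiceler",
    "w3c-checklink",
    "turnitinbot",
    "mj12bot",
]

def select_group(user_agent, named_groups=NAMED_GROUPS):
    """Return the name of the group a crawler would follow.

    Falls back to the catch-all '*' group when no named group matches.
    """
    ua = user_agent.lower()
    for name in named_groups:
        if name in ua:
            return name
    return "*"

print(select_group("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(select_group("msnbot/2.0b"))
```

With these inputs the first crawler follows the googlebot group and the second falls through to `*`.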

Comments

  • Robots.txt
  • My Sitemap - I don't provide it just for the fun of it
  • http://www.merapahadforum.com/index.php?action=kitsitemap;xml
  • Google - Most Important bot
  • Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any urls which it comes across into its index. So we're relying on a meta noindex tag.
  • Don't index mobile versions
  • Default SMF Actions
  • Yahoo - Too aggressive
  • So limit it as much as possible.
  • Disallow Everything
  • Now allow bits and then disallow bits
  • But don't allow these
  • Default SMF Actions
  • Anything with a ; disallow
  • Bad bot - Often ignores robots.txt - Waste of bandwidth
  • Despite claiming on their website to be a search engine in development I'm suspicious as to whether they are a harvester pretending to be SE
  • Stop following PHPSESSID's
  • Catch all (remainder)
  • Will be followed by any bots other than ones identified above
  • Uses BASIC robots.txt directives without wildcards, end-anchors etc
  • So Spiders should understand these (including MSNBOT)
  • Default SMF Folders
  • Default SMF Actions
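
The Google comments above note that robots.txt only limits crawling and that the site relies on a meta noindex tag to keep pages out of the index. As a rough sketch of how one might check a page for that tag, the following uses Python's standard `html.parser`; the class and function names are hypothetical, and the sample HTML is an illustration, not markup taken from merapahadforum.com.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when the attribute has no value.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def has_noindex(html):
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

A crawler can only see this tag if it is permitted to fetch the page, so a path that is disallowed in robots.txt will not have its noindex tag read.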