merapahadforum.com
robots.txt

Robots Exclusion Standard data for merapahadforum.com

Resource Scan

Scan Details

Site Domain	merapahadforum.com
Base Domain	merapahadforum.com
Scan Status	Ok
Last Scan	2025-11-21T06:26:42+00:00
Next Scan	2025-11-28T06:26:42+00:00

Last Scan

Scanned	2025-11-21T06:26:42+00:00
URL	https://merapahadforum.com/robots.txt
Domain IPs	103.239.138.184
Response IP	103.239.138.184
Found	Yes
Hash	fa76d05ae5b0a450c08b139099c475c6f7d8b613ac9aca10b867067db7ff8003
SimHash	2410f913c5f5
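
The Hash value above is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The report does not name the algorithm or say exactly which bytes are hashed, so the sketch below assumes SHA-256 over the raw response body. `body_fingerprint` and `has_changed` are hypothetical helper names. Comparing fingerprints from two scans is one simple way to tell whether the file changed.

```python
import hashlib


def body_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """True when a freshly fetched body no longer matches the stored hash."""
    return body_fingerprint(body) != previous_hash.lower()
```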

Groups

googlebot

Rule	Path
Allow	/index.php?action=kitsitemap%3Bxml
Disallow	/*?action*
Disallow	/*sort%3D*
Disallow	/*msg*
Disallow	/?wap2
Disallow	/?wap
Disallow	/index.php?*%3Bwap
Disallow	/index.php?*%3Bwap2
Disallow	/index.php?*%3Bimode
Disallow	/attachments/
Disallow	/Packages/
Disallow	/Smileys/
Disallow	/Sources/
Disallow	/Themes/
Disallow	/index.php?theme
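
The rule paths in this report show `;` in its percent-encoded form `%3B`, while the sitemap URL quoted in the Comments section uses a literal `;`. Whether the encoding comes from the original file or from the scanner is not stated. The sketch below, using Python's standard `urllib.parse`, shows how the two forms map onto each other; `encode_rule_path` and `decode_rule_path` are hypothetical helper names, and the choice of characters left unescaped is an assumption made so that wildcards survive.

```python
from urllib.parse import quote, unquote


def encode_rule_path(path: str) -> str:
    """Percent-encode a rule path, leaving URL delimiters and the
    robots.txt metacharacters '*' and '$' untouched."""
    return quote(path, safe="/?=*$")


def decode_rule_path(path: str) -> str:
    """Turn percent-escapes such as %3B back into literal characters."""
    return unquote(path)
```

Under these assumptions, `encode_rule_path("/index.php?action=kitsitemap;xml")` gives the form shown in the Allow rule, and `decode_rule_path("/index.php?*%3Bwap2")` gives `/index.php?*;wap2`.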

slurp

Rule	Path
Disallow	/
Allow	/index.php?action=kitsitemap%3Bxml
Allow	/kitsitemap.xml$
Allow	/robots.txt$
Allow	/index.php$
Allow	/index.php?topic=*.0$
Allow	/index.php?topic=*.*0$
Allow	/index.php?topic=*.*5$
Allow	/index.php?board=*.0$
Allow	/index.php?board=*.*0$
Allow	/index.php?board=*.*5$
Disallow	/*?action*
Disallow	/*sort%3D*
Disallow	/*msg*
Disallow	/index.php?*.msg
Disallow	/index.php?topic=*.msg*0$
Disallow	/index.php?topic=*.msg*5$
Disallow	/index.php?*.new
Disallow	/attachments/
Disallow	/Packages/
Disallow	/Smileys/
Disallow	/Sources/
Disallow	/Themes/
Disallow	/index.php?theme
Disallow	/index.php?*%3B*
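
The slurp group disallows everything, then re-allows selected URLs with `*` wildcards and `$` end-anchors, then disallows some of those again. How that resolves depends on the crawler's precedence rule. The sketch below assumes the longest-match rule described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie); crawlers may have applied a different precedence when this file was written. `pattern_to_regex`, `is_allowed`, and `SLURP_RULES` are hypothetical names, and `SLURP_RULES` holds only a subset of the group.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' pins the end of
    the URL, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (verb, pattern) pairs.

    The longest matching pattern decides; Allow wins ties.
    A path that matches no rule is allowed.
    """
    best_key, best_verb = None, "Allow"
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), verb == "Allow")
            if best_key is None or key > best_key:
                best_key, best_verb = key, verb
    return best_verb == "Allow"


# Subset of the slurp group, for illustration only.
SLURP_RULES = [
    ("Disallow", "/"),
    ("Allow", "/index.php$"),
    ("Allow", "/index.php?topic=*.0$"),
    ("Disallow", "/*msg*"),
    ("Disallow", "/index.php?topic=*.msg*0$"),
    ("Disallow", "/index.php?*%3B*"),
]
```

With this subset and this precedence, `/index.php?topic=12.0` is allowed, while `/index.php?topic=12.msg340` and `/Themes/default/style.css` are not.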

twiceler

Rule	Path
Disallow	/

w3c-checklink

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/index.php?PHPSESSID

*

Rule	Path
Disallow	/attachments/
Disallow	/Packages/
Disallow	/Smileys/
Disallow	/Sources/
Disallow	/Themes/
Disallow	/index.php?theme
Disallow	/*?action*
Disallow	/*sort%3D*
Disallow	/*msg*

Comments

Robots.txt
My Sitemap - I don't provide it just for the fun of it
http://www.merapahadforum.com/index.php?action=kitsitemap;xml
Google - Most Important bot
Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any
urls which it comes across into its index. So we're relying on a meta noindex tag.
Don't index mobile versions
Default SMF Actions
Yahoo - Too aggressive
So limit it as much as possible.
Disallow Everything
Now allow bits and then disallow bits
But don't allow these
Default SMF Actions
Anything with a ; disallow
Bad bot - Often ignores robots.txt - Waste of bandwidth
Despite claiming on their website to be a search engine in development
I'm suspicious as to whether they are a harvester pretending to be SE
Stop following PHPSESSID's
Catch all (remainder)
Will be followed by any bots other than ones identified above
Uses BASIC robots.txt directives without wildcards, end-anchors etc
So Spiders should understand these (including MSNBOT)
Default SMF Folders
Default SMF Actions
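
The comments note that the catch-all group "will be followed by any bots other than ones identified above". A minimal sketch of that selection step follows. It assumes a case-insensitive substring match of the group name against the crawler's User-Agent string, which is a simplification of the product-token matching in RFC 9309; `select_group` is a hypothetical helper, and the User-Agent strings in the usage note are examples, not data from this scan.

```python
# Group names as listed in this report.
GROUPS = ["googlebot", "slurp", "twiceler", "w3c-checklink",
          "turnitinbot", "mj12bot", "*"]


def select_group(user_agent: str, groups=GROUPS) -> str:
    """Pick the group a crawler should obey.

    A named group wins when its name appears in the User-Agent
    (case-insensitive); the longest such name is preferred.
    Otherwise the crawler falls back to the '*' group.
    """
    ua = user_agent.lower()
    named = [g for g in groups if g != "*" and g in ua]
    return max(named, key=len) if named else "*"
```

For example, a User-Agent containing `Googlebot/2.1` selects the `googlebot` group, while one containing `bingbot/2.0` matches no named group and falls through to `*`.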
