whichbudget.com
robots.txt

Robots Exclusion Standard data for whichbudget.com

Resource Scan

Scan Details

Site Domain whichbudget.com
Base Domain whichbudget.com
Scan Status Ok
Last Scan 2024-09-25T11:33:32+00:00
Next Scan 2024-10-02T11:33:32+00:00

Last Scan

Scanned 2024-09-25T11:33:32+00:00
URL https://whichbudget.com/robots.txt
Redirect http://www.whichbudget.com/robots.txt
Redirect Domain www.whichbudget.com
Redirect Base whichbudget.com
Domain IPs 68.233.44.23
Redirect IPs 68.233.44.23
Response IP 68.233.44.23
Found Yes
Hash b68b355394b549123ddc52a16a8fa72a391d1eb76c7862208072cb6f5e18a26f
SimHash 28149d08c5f5
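
The report does not say which algorithm produced the Hash field. Its 64 hexadecimal digits are consistent with a SHA-256 digest, so the following is a minimal sketch under that assumption, showing how a scanner might detect that a robots.txt body changed between scans. The function names `content_hash` and `has_changed` are hypothetical, and the sample body is illustrative, not the real file.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched robots.txt body.

    Assumption: the report's Hash field is SHA-256 over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a newly fetched body."""
    return content_hash(body) != previous_hash


# Illustrative body only; not the actual whichbudget.com file.
sample = b"User-agent: *\nDisallow: /\n"
digest = content_hash(sample)
print(len(digest))                   # 64 hex digits
print(has_changed(digest, sample))   # False
```

An exact digest only signals that some byte changed. The SimHash field presumably serves near-duplicate detection, which this sketch does not attempt.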

Groups

mj12bot/v1.4.5

Rule Path
Disallow /

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-mobile

Rule Path
Disallow

msnbot

Rule Path
Disallow

slurp

Rule Path
Disallow

teoma

Rule Path
Disallow

twiceler

Rule Path
Disallow

gigabot

Rule Path
Disallow

scrubby

Rule Path
Disallow

robozilla

Rule Path
Disallow

nutch

Rule Path
Disallow

ia_archiver

Rule Path
Disallow

baiduspider

Rule Path
Disallow

naverbot

Rule Path
Disallow

yeti

Rule Path
Disallow

yahoo-mmcrawler

Rule Path
Disallow

psbot

Rule Path
Disallow

asterias

Rule Path
Disallow

yahoo-blogs/v3.9

Rule Path
Disallow

*

Rule Path
Disallow /
Disallow /lib/
Disallow /ads/
Disallow /adm/
Disallow /office/
Disallow /wbdev/
Disallow /r/
Disallow /rx/
Disallow /stats/
Disallow /webmail/
Disallow /phpCollab/
Disallow /beta/
Disallow /plugins/
Disallow /modules/
Disallow /templates/
Disallow /cmd/
Disallow /deeplink/
Disallow /goodbye/

*

Rule Path
Disallow /

Other Records

Field Value
crawl-delay 600
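
In the groups above, a Disallow rule with an empty path places no restriction on the named crawler, while `Disallow /` blocks the whole site. The sketch below illustrates that difference with Python's standard `urllib.robotparser`. The `SAMPLE` text is a hypothetical excerpt reconstructed from this report, not the live file, and `examplebot` is a made-up crawler name standing in for any agent without its own group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt reconstructed from the groups in this report;
# the live file may differ in casing, order, and comments.
SAMPLE = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE)
home = "http://www.whichbudget.com/"

# Empty Disallow: googlebot may fetch anything.
print(parser.can_fetch("googlebot", home))    # True
# No dedicated group: the "*" group applies and blocks everything.
print(parser.can_fetch("examplebot", home))   # False
```

Other parsers may resolve overlapping or repeated groups differently, so this shows one library's reading rather than how any particular search engine behaves.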

googlebot
googlebot-image
mediapartners-google
msnbot
msnbot-media
slurp
yahoo-blogs
yahoo-mmcrawler

Rule Path
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /sites/
Disallow /themes/
Disallow /cmd/
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /admin/
Disallow /comment/reply/
Disallow /contact/
Disallow /logout/
Disallow /node/add/
Disallow /search/
Disallow /opensearch/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /?q=admin%2F
Disallow /?q=comment%2Freply%2F
Disallow /?q=contact%2F
Disallow /?q=logout%2F
Disallow /?q=node%2Fadd%2F
Disallow /?q=search%2F
Disallow /?q=user%2Fpassword%2F
Disallow /?q=user%2Fregister%2F
Disallow /?q=user%2Flogin%2F

Other Records

Field Value
crawl-delay 600
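
The group above lists several user agents that share one rule set and a crawl-delay of 600. As a rough sketch of how a crawler could read such a group, the example below again uses `urllib.robotparser`. The `SAMPLE` text is a shortened, hypothetical reconstruction of the group, the `/flights` path is invented for illustration, and the unit of crawl-delay (commonly treated as seconds) is an assumption because the report does not state it.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical reconstruction of the multi-agent group;
# only a few of the reported agents and paths are included.
SAMPLE = """\
User-agent: googlebot
User-agent: msnbot
User-agent: slurp
Crawl-delay: 600
Disallow: /admin/
Disallow: /search/
"""

group_parser = RobotFileParser()
group_parser.parse(SAMPLE.splitlines())

base = "http://www.whichbudget.com"

# Every agent named in the group gets the same delay value.
print(group_parser.crawl_delay("msnbot"))                       # 600
# Disallow paths are matched as prefixes of the URL path.
print(group_parser.can_fetch("slurp", base + "/admin/settings"))  # False
# A path with no matching rule is allowed (invented example path).
print(group_parser.can_fetch("slurp", base + "/flights"))         # True
```

Crawl-delay is not honored by every crawler, so a site should not rely on it as a rate limit.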

Comments

  • robots.txt generated at http://www.mcanerin.com
  • $Id: robots.txt,v 1.9.2.1 2008/12/10 20:12:19 goba Exp $
  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts
  • of your site by web crawlers and spiders run by sites like Yahoo!
  • and Google. By telling these "robots" where not to go on your site,
  • you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/wc/robots.html
  • For syntax checking, see:
  • http://www.sxw.org.uk/computing/robots/check.html
  • disallow all
  • but allow only important bots
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
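
The comments above note that robots.txt is only used when it sits at the root of the host. A small sketch of that rule follows: it derives the robots.txt location for any page URL with the standard `urllib.parse` module. The helper name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host of page_url.

    The scheme and host are kept; path, query, and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A file under /site/ would be ignored; crawlers look at the root instead.
print(robots_url("http://example.com/site/page.html"))
# http://example.com/robots.txt
```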