idahopotato.com
robots.txt

Robots Exclusion Standard data for idahopotato.com

Resource Scan

Scan Details

Site Domain idahopotato.com
Base Domain idahopotato.com
Scan Status Ok
Last Scan 2025-03-20T02:51:30+00:00
Next Scan 2025-04-19T02:51:30+00:00

Last Scan

Scanned 2025-03-20T02:51:30+00:00
URL https://idahopotato.com/robots.txt
Domain IPs 216.92.208.246
Response IP 216.92.208.246
Found Yes
Hash f824de53ebfdd2b0cdf6f6148741d5efd15e1893fd58d1963f38518d46aec034
SimHash 70d21ddc4d78
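
The Hash field makes it possible to compare consecutive scans without storing the full file. The following is a minimal sketch, assuming the 64-character hex value is a SHA-256 digest of the raw response body. The report does not name the algorithm, and the scanner may normalize the body first, so a recomputed digest is not guaranteed to match the value above. `fingerprint` and `has_changed` are hypothetical helper names.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt body (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """True when the freshly fetched body no longer matches the stored digest."""
    return fingerprint(body) != previous_hash.lower()


# Illustrative body only; this is not the actual idahopotato.com file.
sample = b"User-agent: *\nDisallow: /tag/\n"
stored = fingerprint(sample)

print(has_changed(stored, sample))
print(has_changed(stored, sample + b"Disallow: /gallery/show/\n"))
```

An exact digest only answers "identical or not"; the separate SimHash field presumably serves to measure how similar two versions are, which this sketch does not attempt.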

Groups

*

Rule Path
Disallow /sp-revision
Disallow /tag/
Disallow /recipes/tag/
Disallow /gallery/show/
Disallow /dr-potato/tag/

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

ahrefsbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

pingdom

Rule Path
Disallow /

xovibot

Rule Path
Disallow /

synapse

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

riddler

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

genieo

Rule Path
Disallow /

yandex

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

mozilla/5.0 (compatible; worldwebheritage.org/1.0; +crawl@worldwebheritage.org)

Rule Path
Disallow /

site24x7

Rule Path
Disallow /

seoscanners.net

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

spbot

Rule Path
Disallow /

Other Records

Field Value
sitemap http://idahopotato.com/sitemap.xml
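
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a hand-reconstructed subset of the rules listed in this report: the original file's exact casing, ordering, and remaining groups are not reproduced, so `ROBOTS_TXT` is an approximation. `Googlebot` stands in for any agent that has no dedicated group, and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a subset of the groups in this report.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sp-revision
Disallow: /tag/
Disallow: /recipes/tag/
Disallow: /gallery/show/
Disallow: /dr-potato/tag/

User-agent: Slurp
Crawl-delay: 10

User-agent: AhrefsBot
Disallow: /

Sitemap: http://idahopotato.com/sitemap.xml
"""

BASE = "https://idahopotato.com"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An agent without its own group falls back to the "*" group.
print(parser.can_fetch("Googlebot", BASE + "/tag/fries"))
print(parser.can_fetch("Googlebot", BASE + "/recipes/"))

# Slurp has its own group with no Disallow rules, only a crawl delay.
print(parser.can_fetch("Slurp", BASE + "/tag/fries"))
print(parser.crawl_delay("Slurp"))

# AhrefsBot is blocked from the whole site.
print(parser.can_fetch("AhrefsBot", BASE + "/"))

# site_maps() requires Python 3.8 or later.
print(parser.site_maps())
```

Because a crawler uses the most specific group that names it, the `slurp` group replaces the `*` group for that agent rather than adding to it. That is why the report shows all paths as allowed for `slurp`.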

Comments

  • Yahoo Slurp crawler
  • URL: http://help.yahoo.com/help/us/ysearch/slurp
  • IP: 68.180.228.120, 68.180.229.187
  • Purpose: Indexing for yahoo search
  • Impact: Very heavy load on the site.
  • Action: Don't disallow, but ask it to keep the pace down
  • AHrefs crawler
  • URL: https://ahrefs.com/robot
  • IP: 151.80.31.138
  • Purpose: SEO tool
  • Impact: Very heavy load on the site
  • Action: Disallow from indexing site
  • Bing crawler
  • URL: http://www.bing.com/bingbot.htm
  • IP:
  • Purpose: Bing search
  • Impact: Very heavy load on the site
  • Action: None
  • BingPreview crawler
  • URL: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0
  • IP:
  • Purpose: Bing snapshot of pages
  • Impact: Low load on the site
  • Action: None
  • Siteimprove Linkcheck crawler
  • URL:
  • IP:
  • Purpose: Siteimprove linkcheck
  • Impact: Heavy load on the site
  • Action: None
  • Siteimprove response
  • URL:
  • IP:
  • Purpose: Siteimprove response
  • Impact: Puts heavy load on the site
  • Action: None
  • Google crawler
  • URL: http://www.google.com/bot.html
  • IP:
  • Purpose: Google search
  • Impact: Puts heavy load on the site
  • Action: None
  • Netmester+Automated+Download
  • URL:
  • IP:
  • Purpose: ?
  • Impact: Puts heavy load on the site
  • Action: None
  • Siteimprove search
  • URL:
  • IP:
  • Purpose: Siteimprove search
  • Impact: Puts heavy load on the site
  • Action: None
  • SemrushBot crawler
  • URL: http://www.semrush.com/bot.html
  • IP: 46.229.164.98
  • Purpose: SEO tool
  • Impact: Puts medium load on the site
  • Action: Disallow from indexing site
  • Disallow: /
  • User-agent: SemrushBot
  • Disallow: /
  • Eniro crawler
  • URL: ECCP/1.2.1+(productlists@eniro.com)
  • IP: 80.69.225.169
  • Purpose: Eniro search engine
  • Impact: Puts medium load on the site.
  • Action: none
  • opensiteexplorer crawler
  • URL: http://www.opensiteexplorer.org/dotbot
  • IP: 208.115.113.88
  • Purpose: SEO tool
  • Impact: Puts medium load on the site
  • Action: Disallow from indexing site
  • Pingdom crawler
  • URL: http://www.pingdom.com/
  • IP: Many - so suspicious (tool to check responses from around the world, which explains the large number of IPs)
  • Purpose: Website monitoring tool - there are versions 1.4 and 2.0
  • Impact: Puts medium load on the site
  • Action: Disallow from indexing site - but more should probably be done
  • XoviBot crawler
  • URL: http://www.xovibot.net/
  • IP: Many - so suspicious
  • Purpose: SEO tool
  • Impact: Puts medium load on the site
  • Action: Disallow from indexing site - but more should probably be done
  • Synapse crawler
  • URL: Mozilla/4.0+(compatible;+Synapse)
  • IP: Many - so suspicious
  • Purpose: Used as an agent for viewstate attacks
  • Impact: Puts medium load on the site.
  • Action: Disallow from indexing site. Should probably be blocked in code
  • BLEXBot crawler
  • URL: http://webmeup-crawler.com/
  • IP:
  • Purpose: SEO tool
  • Impact: Puts medium load on the site.
  • Action: Disallow from indexing site
  • Riddler crawler
  • URL: http://riddler.io/about
  • IP:
  • Purpose: commercial hostname tool
  • Impact: Puts medium load on the site.
  • Action: Disallow from indexing site
  • trendiction crawler
  • URL: http://www.trendiction.de/bot
  • IP:
  • Purpose: social media analysis tool
  • Impact: Puts medium load on the site.
  • Action: Disallow from indexing site
  • Archive.org crawler
  • URL: http://archive.org/details/archive.org_bot
  • IP:
  • Purpose: Website time machine - ok
  • Impact: Puts medium load on the site.
  • Action: None
  • Genieo crawler
  • URL: http://www.genieo.com/webfilter.html
  • IP:
  • Purpose: Genieo is the provider of a recommendation engine and personal homepage.
  • Impact: Puts medium load on the site.
  • Action: Disallow from indexing site
  • TurnitinBot crawler
  • URL: https://turnitin.com/robot/crawlerinfo.html
  • IP:
  • Purpose: This robot collects content from the Internet for the sole purpose of helping educational institutions prevent plagiarism.
  • Impact: Puts medium load on the site.
  • Action: None
  • Yandex (Russian search engine)
  • URL: https://yandex.com/support/webmaster/controlling-robot/robots-txt.xml
  • IP:
  • Purpose:
  • Impact:
  • Action: Disallow from indexing site
  • Majestic12 crawler
  • URL: http://www.majestic12.co.uk/bot.php
  • IP:
  • Purpose:
  • Impact:
  • Action: Disallow from indexing site
  • Exabot crawler
  • URL: http://www.exabot.com/go/robot
  • IP:
  • Purpose:
  • Impact:
  • Action: Disallow from indexing site
  • worldwebheritage.org crawler
  • URL: http://worldwebheritage.org (doesn't work anymore)
  • IP:
  • Purpose:
  • Impact:
  • Action: Disallow from indexing site
  • Site24x7 crawler
  • URL: https://www.site24x7.com
  • IP: 79.99.1.106
  • Purpose: Response monitoring
  • Impact:
  • Action: Disallow from indexing site
  • seoscanners.net crawler
  • URL: https://www.site24x7.com
  • IP: 79.99.1.106
  • Purpose: SEO scanning, though the domain is just registered with GoDaddy
  • Impact:
  • Action: Disallow from indexing site
  • Baiduspider crawler
  • URL: http://www.baidu.com/search/spider.html
  • IP: Many different addresses
  • Purpose: Baidu (Chinese) search engine
  • Impact:
  • Action: Disallow from indexing site
  • spbot crawler
  • URL: http://OpenLinkProfiler.org/bot
  • IP: Many different addresses
  • Purpose: Finds who links to whom
  • Impact:
  • Action: Disallow from indexing site
  • ia_archiver crawler
  • URL: Archive.org
  • IP: Wayback machine (and Alexa)
  • Purpose: Saves old versions of sites
  • Impact: Limited impact
  • Action: No action
  • User-agent: ia_archiver
  • Disallow: /
  • msnbot-media crawler
  • URL: http://search.msn.com/msnbot.htm
  • IP: 40.77.167.13
  • Purpose: MSN/Bing media crawler
  • Impact: Limited impact
  • Action: No action
  • User-agent: msnbot-media
  • Disallow: /