idahopotato.com
robots.txt

Robots Exclusion Standard data for idahopotato.com

Resource Scan

Scan Details

Site Domain	idahopotato.com
Base Domain	idahopotato.com
Scan Status	Ok
Last Scan	2025-03-20T02:51:30+00:00
Next Scan	2025-04-19T02:51:30+00:00

Last Scan

Scanned	2025-03-20T02:51:30+00:00
URL	https://idahopotato.com/robots.txt
Domain IPs	216.92.208.246
Response IP	216.92.208.246
Found	Yes
Hash	f824de53ebfdd2b0cdf6f6148741d5efd15e1893fd58d1963f38518d46aec034
SimHash	70d21ddc4d78

Groups

*

Rule	Path
Disallow	/sp-revision
Disallow	/tag/
Disallow	/recipes/tag/
Disallow	/gallery/show/
Disallow	/dr-potato/tag/
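The * group above can be rebuilt as a robots.txt fragment and checked with Python's standard urllib.robotparser. This is a minimal sketch. The fragment is reconstructed from the scan table, so the live file may differ in spacing, comments, or rule order. The crawler name ExampleBot is hypothetical.

Disallow values are prefix matches on the URL path. That is why /recipes/tag/ and /dr-potato/tag/ are listed separately: /tag/ only matches paths that begin with /tag/.

Python's parser applies the first matching rule, whereas RFC 9309 specifies the longest match. For a Disallow-only group like this one, both approaches give the same result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" group from the scan table.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sp-revision
Disallow: /tag/
Disallow: /recipes/tag/
Disallow: /gallery/show/
Disallow: /dr-potato/tag/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines and marks the file as read,
# so can_fetch() works without any network request.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up crawler with no group of its own,
# so the "*" group applies to it.
for path in ["/tag/russet", "/recipes/tag/baked", "/recipes/baked-potato",
             "/sp-revision-42", "/gallery/"]:
    print(path, parser.can_fetch("ExampleBot", path))
```

The /sp-revision rule has no trailing slash. It therefore also blocks paths such as /sp-revision-42 or /sp-revisions. By contrast, /gallery/ itself stays crawlable, because only paths under /gallery/show/ are disallowed.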

slurp

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10
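The slurp group contains only a crawl-delay record and no Disallow lines. A crawler that matches a named group ignores the * group entirely. This is why the scan reports "All paths allowed" for slurp, even though the * group blocks /tag/ and similar paths.

Crawl-delay is a non-standard extension, and whether a crawler honors it depends on the crawler. The minimal sketch below uses urllib.robotparser. It assumes the slurp group looks as reconstructed here, and it shortens the * group to one rule for brevity. ExampleBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: a shortened "*" group plus the "slurp" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/

User-agent: slurp
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Slurp matches its own group, which has no Disallow rules,
# so the "*" rules do not apply to it.
print(parser.can_fetch("Slurp", "/tag/russet"))       # True
print(parser.can_fetch("ExampleBot", "/tag/russet"))  # False

# crawl_delay() returns the matching group's Crawl-delay in seconds.
print(parser.crawl_delay("Slurp"))       # 10
print(parser.crawl_delay("ExampleBot"))  # None
```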

ahrefsbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

pingdom

Rule	Path
Disallow	/

xovibot

Rule	Path
Disallow	/

synapse

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

riddler

Rule	Path
Disallow	/

trendictionbot

Rule	Path
Disallow	/

genieo

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

mozilla/5.0 (compatible; worldwebheritage.org/1.0; +crawl@worldwebheritage.org)

Rule	Path
Disallow	/

site24x7

Rule	Path
Disallow	/

seoscanners.net

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

spbot

Rule	Path
Disallow	/
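Each of the groups above blocks a single crawler from the whole site with "Disallow: /". The sketch below checks a few of them with urllib.robotparser. Only a subset of the groups is reconstructed, and ExampleBot is hypothetical.

The sketch also shows a caveat for the worldwebheritage.org group. Python's parser reduces the crawler's name to its product token (the text before the first "/"). It then checks whether each group's User-agent value is a substring of that token. A group keyed on a full browser-style string therefore does not match that crawler, which falls back to the * group. Other parsers may match differently; this illustrates only the standard-library behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a subset of the single-crawler groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/

User-agent: ahrefsbot
Disallow: /

User-agent: mj12bot
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: mozilla/5.0 (compatible; worldwebheritage.org/1.0; +crawl@worldwebheritage.org)
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Named crawlers are blocked from every path, including the home page.
for agent in ["AhrefsBot", "MJ12bot", "Baiduspider"]:
    print(agent, parser.can_fetch(agent, "/"))  # False for each

# A crawler without its own group only gets the "*" rules.
print(parser.can_fetch("ExampleBot", "/"))  # True

# The full-UA group is not matched: the caller's name is reduced to "mozilla",
# and the long group value is not a substring of it, so "*" applies instead.
wwh_agent = "Mozilla/5.0 (compatible; worldwebheritage.org/1.0; +crawl@worldwebheritage.org)"
print(parser.can_fetch(wwh_agent, "/"))  # True
```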

Other Records

Field	Value
sitemap	http://idahopotato.com/sitemap.xml
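The sitemap record is independent of any user-agent group. With Python 3.8 or later, urllib.robotparser exposes it through site_maps(). This is a minimal sketch, assuming the file contains the sitemap line reconstructed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: one group plus the sitemap record from the scan.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/

Sitemap: http://idahopotato.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if there are none.
print(parser.site_maps())  # ['http://idahopotato.com/sitemap.xml']
```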

Comments

Yahoo Slurp crawler
URL: http://help.yahoo.com/help/us/ysearch/slurp
IP: 68.180.228.120, 68.180.229.187
Purpose: Indexing for yahoo search
Impact: Very heavy load on the site.
Action: Don't disallow, but ask it to keep the pace down
AHrefs crawler
URL: https://ahrefs.com/robot
IP: 151.80.31.138
Purpose: SEO tool
Impact: Very heavy load on the site
Action: Disallow from indexing site
Bing crawler
URL: http://www.bing.com/bingbot.htm
IP:
Purpose: Bing search
Impact: Very heavy load on the site
Action: None
BingPreview crawler
URL: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0
IP:
Purpose: Bing snapshot of pages
Impact: Low load on the site
Action: None
Siteimprove Linkcheck crawler
URL:
IP:
Purpose: Siteimprove linkcheck
Impact: Heavy load on the site
Action: None
Siteimprove response
URL:
IP:
Purpose: Siteimprove response
Impact: Puts heavy load on the site
Action: None
Google crawler
URL: http://www.google.com/bot.html
IP:
Purpose: Google search
Impact: Puts heavy load on the site
Action: None
Netmester+Automated+Download
URL:
IP:
Purpose: ?
Impact: Puts heavy load on the site
Action: None
Siteimprove search
URL:
IP:
Purpose: Siteimprove search
Impact: Puts heavy load on the site
Action: None
SemrushBot crawler
URL: http://www.semrush.com/bot.html
IP: 46.229.164.98
Purpose: SEO tool
Impact: Puts medium load on the site
Action: Disallow from indexing site
Disallow: /
User-agent: SemrushBot
Disallow: /
Eniro crawler
URL: ECCP/1.2.1+(productlists@eniro.com)
IP: 80.69.225.169
Purpose: Eniro search engine
Impact: Puts medium load on the site.
Action: none
opensiteexplorer crawler
URL: http://www.opensiteexplorer.org/dotbot
IP: 208.115.113.88
Purpose: SEO tool
Impact: Puts medium load on the site
Action: Disallow from indexing site
Pingdom crawler
URL: http://www.pingdom.com/
IP: Many - so suspicious (tool to check responses from around the world, which explains a lot of IPs)
Purpose: website monitoring tool - there is a version 1.4 and a 2.0
Impact: Puts medium load on the site
Action: Disallow from indexing site - but more should probably be done
XoviBot crawler
URL: http://www.xovibot.net/
IP: Many - so suspicious
Purpose: SEO tool
Impact: Puts medium load on the site
Action: Disallow from indexing site - but more should probably be done
Synapse crawler
URL: Mozilla/4.0+(compatible;+Synapse)
IP: Many - so suspicious
Purpose: Used as an agent for viewstate attacks
Impact: Puts medium load on the site.
Action: Disallow from indexing site. Should probably be blocked in code
BLEXBot crawler
URL: http://webmeup-crawler.com/
IP:
Purpose: SEO tool
Impact: Puts medium load on the site.
Action: Disallow from indexing site
Riddler crawler
URL: http://riddler.io/about
IP:
Purpose: commercial hostname tool
Impact: Puts medium load on the site.
Action: Disallow from indexing site
trendiction crawler
URL: http://www.trendiction.de/bot
IP:
Purpose: social media analysis tool
Impact: Puts medium load on the site.
Action: Disallow from indexing site
Archive.org crawler
URL: http://archive.org/details/archive.org_bot
IP:
Purpose: Website time machine - ok
Impact: Puts medium load on the site.
Action: None
Genieo crawler
URL: http://www.genieo.com/webfilter.html
IP:
Purpose: Genieo is the provider of a recommendation engine and personal homepage.
Impact: Puts medium load on the site.
Action: Disallow from indexing site
TurnitinBot crawler
URL: https://turnitin.com/robot/crawlerinfo.html
IP:
Purpose: This robot collects content from the Internet for the sole purpose of helping educational institutions prevent plagiarism.
Impact: Puts medium load on the site.
Action: None
Yandex (Russian search engine)
URL: https://yandex.com/support/webmaster/controlling-robot/robots-txt.xml
IP:
Purpose:
Impact:
Action: Disallow from indexing site
Majestic12 crawler
URL: http://www.majestic12.co.uk/bot.php
IP:
Purpose:
Impact:
Action: Disallow from indexing site
Exabot crawler
URL: http://www.exabot.com/go/robot
IP:
Purpose:
Impact:
Action: Disallow from indexing site
worldwebheritage.org crawler
URL: http://worldwebheritage.org (doesn't work anymore)
IP:
Purpose:
Impact:
Action: Disallow from indexing site
Site24x7 crawler
URL: https://www.site24x7.com
IP: 79.99.1.106
Purpose: Response monitoring
Impact:
Action: Disallow from indexing site
seoscanners.net crawler
URL: https://www.site24x7.com
IP: 79.99.1.106
Purpose: SEO scanning, though the domain is just registered with GoDaddy
Impact:
Action: Disallow from indexing site
Baiduspider crawler
URL: http://www.baidu.com/search/spider.html
IP: Many different
Purpose: Baidu (Chinese) search engine
Impact:
Action: Disallow from indexing site
spbot crawler
URL: http://OpenLinkProfiler.org/bot
IP: Many different
Purpose: Finds who links to whom
Impact:
Action: Disallow from indexing site
ia_archiver crawler
URL: Archive.org
IP: Wayback Machine (and Alexa)
Purpose: Saves old versions of sites
Impact: Limited impact
Action: No action
User-agent: ia_archiver
Disallow: /
msnbot-media crawler
URL: http://search.msn.com/msnbot.htm
IP: 40.77.167.13
Purpose: MSN/Bing media crawler
Impact: Limited impact
Action: No action
User-agent: msnbot-media
Disallow: /
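Several entries above note that a robots.txt rule alone may not be enough. These include Pingdom and XoviBot, and Synapse, which "should probably be blocked in code". Robots.txt is advisory: it does not stop a crawler that ignores it.

The minimal sketch below shows server-side blocking as a Python WSGI middleware. The site's actual server stack is not stated, so Python/WSGI is used only for illustration. The names block_user_agents, BLOCKED_TOKENS, hello_app, and status_for are hypothetical, as is the choice of a 403 response. User-Agent strings are easy to spoof, so this only stops clients that identify themselves honestly.

```python
from typing import Callable, Iterable

# Hypothetical list of lower-case substrings to refuse; "synapse" matches
# the "Mozilla/4.0+(compatible;+Synapse)" agent noted in the comments.
BLOCKED_TOKENS = ("synapse", "xovibot", "pingdom")


def block_user_agents(app: Callable, blocked: Iterable[str] = BLOCKED_TOKENS) -> Callable:
    """Wrap a WSGI app; answer 403 when the User-Agent contains a blocked token."""
    tokens = tuple(t.lower() for t in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not blocked: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    # Stand-in for the real site application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_user_agents(hello_app)


def status_for(user_agent: str) -> str:
    """Call the wrapped app with a minimal environ and return its status line."""
    captured = []
    app({"HTTP_USER_AGENT": user_agent}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_for("Mozilla/4.0+(compatible;+Synapse)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (ExampleBrowser)"))       # 200 OK
```

Matching on substrings keeps the list short. However, a token that is too generic could also catch legitimate visitors, so each entry should be specific to one crawler.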
