robhandgraaf.nl
robots.txt

Robots Exclusion Standard data for robhandgraaf.nl

Resource Scan

Scan Details

Site Domain robhandgraaf.nl
Base Domain robhandgraaf.nl
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-03-01T08:47:29+00:00
Next Scan 2024-05-30T08:47:29+00:00

Last Successful Scan

Scanned 2021-07-20T03:19:50+00:00
URL https://robhandgraaf.nl/robots.txt
Redirect https://www.robhandgraaf.nl/robots.txt
Redirect Domain www.robhandgraaf.nl
Redirect Base robhandgraaf.nl
Found Yes
Hash ba87b3d96fca524572f2a2b6624ca39d81d272617163002b46e89ee833c036a1
SimHash 4147214b65b2

Groups

semrushbot

Rule Path
Disallow /

semrushbot-sa

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

youdaobot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

adsbot-google

Rule Path
Disallow /js/

alphaseobot

Rule Path
Disallow /

siteexplorer

Rule Path
Disallow /

sitesucker

Rule Path
Disallow /

openindexspider

Rule Path
Disallow /

booglebot

Rule Path
Disallow /

backlinkcrawler

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

netestate ne crawler (+http://www.website-datenbank.de/)

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

hubspot crawler

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

mail.ru_bot

Rule Path
Disallow /

mail.ru

Rule Path
Disallow /

serpstatbot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

megaindex.com

Rule Path
Disallow /

bingbot
*

Rule Path
Allow /
Disallow /about/
Disallow /products/
Disallow /projects/
Disallow /contact/
Disallow /about/
Disallow /contact.html
Disallow /about.html
Disallow /contact-us.html
Disallow /service.html
Disallow /service/
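
The final group above (bingbot and *) mixes a blanket Allow with several narrower Disallow rules, so the outcome depends on which rule takes precedence. The sketch below shows one way to evaluate it, assuming the longest-match precedence described in RFC 9309 (the most specific matching path wins, and Allow wins a tie). The names `CATCH_ALL_RULES` and `is_allowed` are hypothetical and are not part of the scan output. Plain prefix matching is enough here because none of the listed paths use `*` or `$`.

```python
# Rules copied from the bingbot / * group in the report, in listed order.
# CATCH_ALL_RULES and is_allowed are illustrative names (assumptions).
CATCH_ALL_RULES = [
    ("allow", "/"),
    ("disallow", "/about/"),
    ("disallow", "/products/"),
    ("disallow", "/projects/"),
    ("disallow", "/contact/"),
    ("disallow", "/about/"),
    ("disallow", "/contact.html"),
    ("disallow", "/about.html"),
    ("disallow", "/contact-us.html"),
    ("disallow", "/service.html"),
    ("disallow", "/service/"),
]


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    Assumes longest-match precedence: the matching rule with the
    longest path decides, and Allow wins when lengths are equal.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for candidate in ("/", "/index.html", "/about/team", "/about", "/service.html"):
        print(candidate, is_allowed(candidate, CATCH_ALL_RULES))
```

Under this assumption, `/about/team` and `/service.html` are blocked, while `/`, `/index.html`, and `/about` (no trailing slash) remain crawlable. Parsers that apply the first matching rule instead would let `Allow /` override every Disallow in the group, so results can differ between crawlers.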

Comments

  • -----------------------------------------------------------
  • robots.txt for http(s)://tinycms.xyz, last refresh 2019/09/30
  • -----------------------------------------------------------
  • not all bots below may obey robots.txt in general
  • or specific rules, respectively
  • cat /home/wwwlogs/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head -20
  • -----------------------------------------------------------
  • semrush bot
  • ahrefs bot
  • moz bot
  • Wayback Machine
  • https://velen.io/
  • Baiduspider
  • Block SoGou
  • Block Youdao
  • Yandex
  • AdsBot
  • http://alphaseobot.com/bot.html
  • http://siteexplorer.info/about.html
  • http://www.sitesucker.us/mac/limitations.html
  • https://www.openindex.io/saas/about-our-spider/
  • http://www.backlinktest.com/crawler.html
  • http://napoveda.seznam.cz/
  • http://www.website-datenbank.de
  • Block netEstate NE Crawler (+http://www.website-datenbank.de/)
  • Block BlexBot
  • https://megaindex.com/crawler
  • ------------
  • not exclude
  • ------------

Warnings

  • `crawl-delay` is not a known field.
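
A warning like the one above could plausibly come from a check that compares each field name against a fixed set. The sketch below is a minimal illustration of that idea, not the scanner's actual logic. The set `KNOWN_FIELDS`, the function `unknown_fields`, and the `SAMPLE` text are all assumptions: the original file is not reproduced in this report, so the position and value of the `Crawl-delay` line are made up for the example.

```python
# Field names treated as recognized. This set is an assumption:
# RFC 9309 defines user-agent, allow, and disallow; sitemap is a
# widely supported extra record.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def unknown_fields(text):
    """Return unrecognized field names in first-seen order, lowercased."""
    found = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


# Hypothetical fragment; the Crawl-delay value and placement are invented.
SAMPLE = """\
# not exclude
User-agent: bingbot
User-agent: *
Crawl-delay: 10
Allow: /
Disallow: /about/
"""

if __name__ == "__main__":
    for name in unknown_fields(SAMPLE):
        print(f"`{name}` is not a known field.")
```

Some crawlers do honor `Crawl-delay` as a non-standard extension, so a warning of this kind signals that the field falls outside the recognized set, not that every crawler will ignore it.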