cnfreevpn.com
robots.txt

Robots Exclusion Standard data for cnfreevpn.com

Resource Scan

Scan Details

Site Domain cnfreevpn.com
Base Domain cnfreevpn.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-08-09T19:13:14+00:00
Next Scan 2024-11-07T19:13:14+00:00

Last Successful Scan

Scanned 2023-01-16T23:24:09+00:00
URL https://cnfreevpn.com/robots.txt
Domain IPs 104.21.55.29, 172.67.170.85, 2606:4700:3033::6815:371d, 2606:4700:3037::ac43:aa55
Response IP 104.21.55.29
Found Yes
Hash 796104b47a3beb70390cfae9e290bbb8893b976bf09a2ec7e57ced9acfbfa4e2
SimHash 161051c9e4d6

Groups

*

Rule Path
Disallow /cgi-bin/

googlebot

Rule Path
Disallow

browsershots

Rule Path
Disallow

funwebproducts

Rule Path
Disallow /

googledocs

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

sbooksnet

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

bubing

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
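
The groups above can be exercised with Python's standard-library robots.txt parser. The fragment below is reconstructed from a few of the scanned groups (the live file could no longer be fetched, so treat it as illustrative); it shows why googlebot's bare `Disallow` grants full access while `Disallow /` blocks a bot entirely.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan data above; illustrative, not the live file.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: googlebot
Disallow:

User-agent: wget
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# An empty Disallow means "allow everything"; "Disallow: /" blocks the site.
print(rp.can_fetch("googlebot", "https://cnfreevpn.com/cgi-bin/x"))  # True
print(rp.can_fetch("wget", "https://cnfreevpn.com/"))                # False
```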

Other Records

Field Value
sitemap https://www.calculatorsoup.com/sitemap.xml

Comments

  • applies to all robots disallow
  • 2019-02-22 remove
  • Disallow: /search.php
  • block Mediapartners from search.php 2017-03-12 because they try many search queries
  • 2019-02-22 remove
  • User-agent: Mediapartners-Google
  • Allow: /
  • Disallow: /search.php
  • do not believe this is respected
  • From Wiki
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
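
The comments above ask crawlers for exactly one courtesy: a delay between hits (what wget's `--wait` option provides). A minimal sketch of that rate limiting, with hypothetical names (`polite_crawl`, `fetch` are illustrative, not from any library):

```python
import time

# Minimal rate-limiting sketch: pause between successive requests, as the
# site's comments request. `fetch` is a caller-supplied function; the
# injectable `sleep` makes the delay behavior easy to test.
def polite_crawl(urls, fetch, wait=2.0, sleep=time.sleep):
    results = []
    for i, url in enumerate(urls):
        if i:
            sleep(wait)  # delay between hits, not before the first one
        results.append(fetch(url))
    return results
```

In a real crawler the same pause would sit between HTTP requests, after a robots.txt check such as the one sketched earlier.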