gmyz.dorzeczy.pl
robots.txt

Robots Exclusion Standard data for gmyz.dorzeczy.pl

Resource Scan

Scan Details

Site Domain gmyz.dorzeczy.pl
Base Domain dorzeczy.pl
Scan Status Ok
Last Scan 2024-10-22T12:49:39+00:00
Next Scan 2024-11-21T12:49:39+00:00

Last Scan

Scanned 2024-10-22T12:49:39+00:00
URL https://gmyz.dorzeczy.pl/robots.txt
Domain IPs 104.22.46.154, 104.22.47.154, 172.67.26.134, 2606:4700:10::6816:2e9a, 2606:4700:10::6816:2f9a, 2606:4700:10::ac43:1a86
Response IP 104.22.47.154
Found Yes
Hash 90aa5a827c0b2708af30193d4dca49ce3e85aee64e8b52dd75af418efde48209
SimHash e43051c9e6f7

Groups

*

Rule Path
Disallow /szukaj/
Disallow /wyszukaj/
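The catch-all group above can be checked with Python's standard urllib.robotparser module. This is a minimal sketch that assumes the group is written in the raw file as the usual "User-agent: *" line followed by the two Disallow lines. The scan shows only the parsed rules, not the raw text. ExampleBot is a hypothetical crawler name with no group of its own, so it falls back to the "*" rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed raw form of the catch-all group reported by the scan.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /szukaj/",
    "Disallow: /wyszukaj/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse() also marks the rules as loaded

# "ExampleBot" is hypothetical; it has no group, so the "*" group applies.
for path in ("/", "/szukaj/?q=test", "/wyszukaj/", "/artykul/123"):
    url = "https://gmyz.dorzeczy.pl" + path
    print(path, parser.can_fetch("ExampleBot", url))

# Prints:
# / True
# /szukaj/?q=test False
# /wyszukaj/ False
# /artykul/123 True
```

Disallow values are matched as prefixes from the start of the path, so /szukaj/ does not also cover /wyszukaj/. That is why both search paths need their own line.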

mediapartners-google

Rule Path
Disallow

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-mobile

Rule Path
Disallow

googlebot-news

Rule Path
Disallow

googlebot-video

Rule Path
Disallow

adsbot-google

Rule Path
Disallow

googlebot_nauxeo

Rule Path
Disallow

twitterbot

Rule Path
Disallow

applebot

Rule Path
Disallow

ouestfrancebot

Rule Path
Disallow

taboolabot

Rule Path
Disallow

proximic

Rule Path
Disallow

upday

Rule Path
Disallow

bingbot

Rule Path
Disallow
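Each named group above has a Disallow rule with an empty path, which means nothing is disallowed for that crawler. Because a crawler obeys the group that names it rather than the "*" group, these bots may also fetch the search paths blocked for everyone else. This is a minimal sketch using urllib.robotparser. It assumes the raw file writes the googlebot group as "User-agent: Googlebot" followed by a bare "Disallow:". ExampleBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Assumed raw form: the catch-all group plus one named group (googlebot)
# whose Disallow value is empty.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /szukaj/",
    "Disallow: /wyszukaj/",
    "",
    "User-agent: Googlebot",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

search_url = "https://gmyz.dorzeczy.pl/szukaj/"
# Googlebot matches its own group; an empty Disallow allows every path.
print(parser.can_fetch("Googlebot/2.1", search_url))   # True
# The hypothetical ExampleBot has no group and falls back to "*".
print(parser.can_fetch("ExampleBot/1.0", search_url))  # False
```

Python's parser compares only the product token before the first "/" and matches it by substring. A full browser-style user-agent string such as "Mozilla/5.0 (compatible; Googlebot/2.1)" would therefore not match this group in robotparser, even though Google's own crawler does.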

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fast

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
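The groups from ubicrawler through webreaper use "Disallow /". Every path starts with "/", so these crawlers are excluded from the whole site. The sketch below assumes the raw file writes two of these groups as "User-agent: wget" and "User-agent: HTTrack", each followed by "Disallow: /". ExampleBot is a hypothetical crawler that is not listed.

```python
from urllib.robotparser import RobotFileParser

# Assumed raw form of two of the site-wide blocking groups.
ROBOTS_LINES = [
    "User-agent: wget",
    "Disallow: /",
    "",
    "User-agent: HTTrack",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

for agent in ("Wget/1.21.4", "HTTrack", "ExampleBot"):
    print(agent, parser.can_fetch(agent, "https://gmyz.dorzeczy.pl/"))

# Prints:
# Wget/1.21.4 False
# HTTrack False
# ExampleBot True  (no matching group and no "*" group in this excerpt)
```

These rules are advisory only. As the comments below note, some of the listed clients do not read robots.txt at all, so a block here only affects crawlers that choose to obey it.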

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2
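The crawl-delay value of 2 asks crawlers to wait two seconds between requests. Crawl-delay is not part of RFC 9309, so support varies by crawler. This is a minimal sketch that assumes the record is written as "Crawl-delay: 2" inside a "User-agent: *" group. ExampleBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Assumed raw form of the catch-all record that carries the crawl-delay value.
ROBOTS_LINES = [
    "User-agent: *",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# "ExampleBot" is a hypothetical crawler with no group of its own.
delay = parser.crawl_delay("ExampleBot")
print(delay)  # 2

# With one request every `delay` seconds, fetching N pages takes at least
# (N - 1) * delay seconds from the first request to the last.
pages = 100
min_span = (pages - 1) * delay
print(min_span)  # 198
```

The scan lists several "*" groups. Python's robotparser keeps only the first "User-agent: *" group it reads. If the real file puts Crawl-delay in a later "*" group, robotparser would therefore report None for it, while a parser that merges groups would report 2.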

*

Rule Path
Disallow

Comments

  • disable at search level
  • Allowed search engines directives
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
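The comments above ask recursive downloaders to space out their requests, for example with wget's --wait option. The function below is a minimal sketch of the same idea for a custom crawler: it pauses a fixed number of seconds between fetches. polite_crawl is a hypothetical helper, and the fetch callback is a stand-in supplied by the caller. The sleep function can be injected so the spacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, List


def polite_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], None],
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    `fetch` is a hypothetical callback supplied by the caller; this sketch
    only handles the spacing between hits, similar to wget's --wait option.
    """
    fetched = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait between hits, never before the first one
        fetch(url)
        fetched.append(url)
    return fetched


# Usage: record the pauses instead of sleeping (URLs are hypothetical).
pauses = []
done = polite_crawl(
    ["https://gmyz.dorzeczy.pl/a", "https://gmyz.dorzeczy.pl/b", "https://gmyz.dorzeczy.pl/c"],
    fetch=lambda url: None,
    delay=2,
    sleep=pauses.append,
)
print(len(done), pauses)  # 3 [2, 2]
```

A fixed pause does not count the time each fetch takes, so the real gap between requests is at least the delay, never less. That errs on the side the site owner is asking for.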