wedkuje.pl
robots.txt

Robots Exclusion Standard data for wedkuje.pl

Resource Scan

Scan Details

Site Domain	wedkuje.pl
Base Domain	wedkuje.pl
Scan Status	Ok
Last Scan	2024-11-15T05:01:58+00:00
Next Scan	2024-11-22T05:01:58+00:00

Last Scan

Scanned	2024-11-15T05:01:58+00:00
URL	https://wedkuje.pl/robots.txt
Domain IPs	104.21.43.40, 172.67.218.225, 2606:4700:3033::ac43:dae1, 2606:4700:3037::6815:2b28
Response IP	104.21.43.40
Found	Yes
Hash	8a8c3fcbec7df86feb6cf3831fcb4a5c7941f82d96c744b31b2e1b89e2653521
SimHash	95f879f3aee7
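
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm, so SHA-256 is an assumption here. Under that assumption, a minimal sketch of how a scanner might detect that robots.txt changed between two scans could look like this (the function names `body_digest` and `has_changed` are hypothetical):

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched robots.txt body.

    Assumption: the report's Hash field is SHA-256 over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_digest: str) -> bool:
    """Compare a fresh body against the digest stored from an earlier scan."""
    return body_digest(body) != previous_digest.lower()
```

An exact digest only signals that something changed. The SimHash field presumably serves the complementary purpose of estimating how similar two versions are, but the report does not describe how it is computed.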

Groups

sitecheck.internetseer.com

Rule	Path
Disallow	/

zealbot

Rule	Path
Disallow	/

msiecrawler

Rule	Path
Disallow	/

sitesnagger

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

fetch

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

webzip

Rule	Path
Disallow	/

linko

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

xenu

Rule	Path
Disallow	/

larbin

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

zyborg

Rule	Path
Disallow	/

download ninja

Rule	Path
Disallow	/

grub-client

Rule	Path
Disallow	/

k2spider

Rule	Path
Disallow	/

cliqzbot

Rule	Path
Disallow	/

buck/2.2

Rule	Path
Disallow	/

grapeshot

Rule	Path
Disallow	/

npbot

Rule	Path
Disallow	/

bubing

Rule	Path
Disallow	/

webreaper

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/boot/
Disallow	/_rdzen/
Disallow	/office/
Disallow	/office_new/
Disallow	/_my/
Disallow	/_lib/
Disallow	/_img/
Disallow	/_demon_new/
Disallow	/_hide/
Disallow	/_css/
Disallow	/_blog_cms/
Disallow	/_sklep/
Disallow	/dokumenty/
Disallow	/img/
Disallow	/jquery/
Disallow	/js/
Disallow	/mailing/
Disallow	/newsletter/
Disallow	/pliki/
Disallow	/owiska/
Disallow	/_shared/
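
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The scan lists parsed groups rather than the raw file, so the `ROBOTS_TXT` text below is an assumed reconstruction of a small subset, and the helper names `build_parser` and `is_allowed`, the agent `ExampleBot`, and the path `/img/logo.png` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of a subset of the file. The exact raw syntax
# (casing, ordering, comments) is not shown in the scan report.
ROBOTS_TXT = """\
User-agent: httrack
Disallow: /

User-agent: *
Disallow: /boot/
Disallow: /img/
Disallow: /js/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on wedkuje.pl."""
    return parser.can_fetch(agent, "https://wedkuje.pl" + path)


parser = build_parser(ROBOTS_TXT)
for agent, path in [
    ("httrack", "/"),                # named group: whole site disallowed
    ("ExampleBot", "/"),             # falls through to the * group
    ("ExampleBot", "/img/logo.png"), # under a path disallowed for *
]:
    print(agent, path, is_allowed(parser, agent, path))
```

A crawler matching one of the named groups is governed by that group alone, so the path-specific rules under `*` apply only to agents that match no named group.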

Comments

The 'grub' distributed client has been *very* poorly behaved.
Doesn't follow robots.txt anyway, but...
Hits many times per second, not acceptable
http://www.nameprotect.com/botinfo.html
A capture bot, downloads gazillions of pages with no public benefit
http://www.webreaper.net/
