historia.wprost.pl
robots.txt

Robots Exclusion Standard data for historia.wprost.pl

Resource Scan

Scan Details

Site Domain historia.wprost.pl
Base Domain wprost.pl
Scan Status Ok
Last Scan 2024-10-22T17:48:08+00:00
Next Scan 2024-11-21T17:48:08+00:00

Last Scan

Scanned 2024-10-22T17:48:08+00:00
URL https://historia.wprost.pl/robots.txt
Domain IPs 104.22.36.159, 104.22.37.159, 172.67.11.47, 2606:4700:10::6816:249f, 2606:4700:10::6816:259f, 2606:4700:10::ac43:b2f
Response IP 172.67.11.47
Found Yes
Hash b234f07eec7bab9420fd51a0b72e225eebe99149ef397212d20e47ac4ec99f0b
SimHash e43051d9e7f7

Groups

*

Rule Path
Disallow /wyszukaj/

mediapartners-google

Rule Path
Disallow

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-mobile

Rule Path
Disallow

googlebot-news

Rule Path
Disallow

googlebot-video

Rule Path
Disallow

adsbot-google

Rule Path
Disallow

googlebot_nauxeo

Rule Path
Disallow

twitterbot

Rule Path
Disallow

applebot

Rule Path
Disallow

ouestfrancebot

Rule Path
Disallow

taboolabot

Rule Path
Disallow

proximic

Rule Path
Disallow

upday

Rule Path
Disallow

bingbot

Rule Path
Disallow

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fast

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2

*

Rule Path
Disallow

Other Records

Field Value
sitemap https://historia.wprost.pl/sitemap/sections
sitemap https://historia.wprost.pl/sitemap/news
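
The groups and records listed above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. It does not use the site's actual file: `ROBOTS_EXCERPT` is a hypothetical reconstruction of only a few of the scanned groups, and the three `*` groups are merged into one, which is an assumption about how a crawler would combine them. The agent name `examplebot` and the helper `load_rules` are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a few groups from the scan above.
# Assumption: the separate "*" groups are merged into a single group here.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /wyszukaj/
Crawl-delay: 2

User-agent: googlebot
Disallow:

User-agent: wget
Disallow: /

Sitemap: https://historia.wprost.pl/sitemap/sections
Sitemap: https://historia.wprost.pl/sitemap/news
"""

BASE = "https://historia.wprost.pl"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_EXCERPT)

if __name__ == "__main__":
    # An empty Disallow value (googlebot) blocks nothing;
    # "Disallow: /" (wget) blocks the whole site.
    for agent in ("googlebot", "wget", "examplebot"):
        for path in ("/", "/wyszukaj/"):
            print(agent, path, rules.can_fetch(agent, BASE + path))
    print("crawl delay for examplebot:", rules.crawl_delay("examplebot"))
    print("sitemaps:", rules.site_maps())
```

Under these assumptions, an agent with no group of its own falls back to the `*` group, so it may fetch everything except `/wyszukaj/` and is asked to wait 2 seconds between requests.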

Comments

  • disable at search level
  • for biznes.wprost.pl
  • Allowed search engines directives
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/