realwire.com
robots.txt

Robots Exclusion Standard data for realwire.com

Resource Scan

Scan Details

Site Domain realwire.com
Base Domain realwire.com
Scan Status Ok
Last Scan 2024-06-18T13:21:15+00:00
Next Scan 2024-07-18T13:21:15+00:00

Last Scan

Scanned 2024-06-18T13:21:15+00:00
URL https://realwire.com/robots.txt
Redirect https://www.realwire.com/robots.txt
Redirect Domain www.realwire.com
Redirect Base realwire.com
Domain IPs 92.53.244.93
Redirect IPs 92.53.244.93
Response IP 92.53.244.93
Found Yes
Hash aef378d84f18f69ed646ba137a038e3e2ae167395722d826a6538a3c80977f77
SimHash 7432cf3ddff5

Groups

nlcrawler/1.0+(+http://northernlight.com/)
nlcrawler

Rule Path
Disallow /

*

Rule Path
Disallow /clients/

Other Records

Field Value
crawl-delay 3

vegebot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 4

vegi bot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 4

yandex

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 4

twitterbot/1.0

Rule Path
Allow /
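
The groups above can be checked programmatically. The following is a minimal sketch that uses Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a reconstruction from the groups listed in this report, not a copy of the live file, and `ExampleBot` is a hypothetical user agent chosen only to exercise the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Assumption: reconstructed from the groups in this report. The live
# robots.txt may differ in ordering, spacing, and comments.
ROBOTS_TXT = """\
User-agent: nlcrawler/1.0+(+http://northernlight.com/)
User-agent: nlcrawler
Disallow: /

User-agent: *
Disallow: /clients/
Crawl-delay: 3

User-agent: vegebot
Crawl-delay: 4

User-agent: vegi bot
Crawl-delay: 4

User-agent: yandex
Crawl-delay: 4

User-agent: twitterbot/1.0
Allow: /
"""

BASE = "https://www.realwire.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# nlcrawler is disallowed everywhere.
print(parser.can_fetch("nlcrawler", BASE + "/"))

# An agent with no group of its own falls back to the "*" group:
# /clients/ is disallowed, other paths are allowed, delay is 3 seconds.
print(parser.can_fetch("ExampleBot", BASE + "/clients/"))
print(parser.can_fetch("ExampleBot", BASE + "/news/"))
print(parser.crawl_delay("ExampleBot"))

# vegebot and yandex have no path rules, only a 4-second crawl delay.
print(parser.can_fetch("vegebot", BASE + "/clients/"))
print(parser.crawl_delay("yandex"))
```

Note that `urllib.robotparser` matches a group by testing whether the group's name is a substring of the requesting agent's product token, so other parsers may resolve the versioned names (`nlcrawler/1.0+...`, `twitterbot/1.0`) differently.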

Comments

  • nlcrawler are very aggressive, creating multiple sessions
  • nlcrawler has been blocked by IP (38.106.112.254) as they ignored the following instruction
  • Wait 3 seconds between successive requests, to keep within the 200 requests per 10 minute window
  • upped to 4 seconds as vegbot still triggering (150 requests)
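
The figures in these comments follow from simple arithmetic on a 10-minute (600-second) window. The sketch below assumes that a crawler honouring the delay sends at most one request per delay interval, and that the "(150 requests)" note refers to the per-window count at a 4-second delay; the function name `max_requests_per_window` is hypothetical.

```python
def max_requests_per_window(window_seconds: int, delay_seconds: int) -> int:
    """Upper bound on requests in a window when one request is sent per delay interval."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return window_seconds // delay_seconds


WINDOW_SECONDS = 10 * 60  # the 10-minute window mentioned in the comments

for delay in (3, 4):
    limit = max_requests_per_window(WINDOW_SECONDS, delay)
    print(f"crawl-delay {delay}s -> at most {limit} requests per 10 minutes")
```

Under these assumptions, a 3-second delay sits exactly at the 200-request limit, while a 4-second delay leaves headroom at 150 requests.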