databases.lovd.nl
robots.txt

Robots Exclusion Standard data for databases.lovd.nl

Resource Scan

Scan Details

Site Domain databases.lovd.nl
Base Domain lovd.nl
Scan Status Ok
Last Scan 2025-10-15T11:28:49+00:00
Next Scan 2025-11-14T11:28:49+00:00

Last Scan

Scanned 2025-10-15T11:28:49+00:00
URL https://databases.lovd.nl/robots.txt
Domain IPs 145.88.210.19
Response IP 145.88.210.19
Found Yes
Hash c728ab4a01121dcf98bd431e255edf8451a95c5e86ce99da44d30e2a9fe7a854
SimHash aa1e51510272
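
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched robots.txt body (an assumption; the scanner's exact hashing scheme is not stated here). A minimal Python sketch, under that assumption, to reproduce such a digest:

    import hashlib
    import urllib.request

    # Fetch the same URL recorded in the scan and hash the raw response body.
    with urllib.request.urlopen("https://databases.lovd.nl/robots.txt") as resp:
        body = resp.read()

    # If the file is unchanged since the last scan, and SHA-256 over the raw
    # body is indeed what the scanner records, this should print
    # c728ab4a01121dcf98bd431e255edf8451a95c5e86ce99da44d30e2a9fe7a854.
    print(hashlib.sha256(body).hexdigest())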

Groups

mj12bot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

owler

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

awariorssbot
awariosmartbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

terracotta

Rule Path
Disallow /
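
Taken together, the groups above block each named crawler from every path, leave all other agents unrestricted with a 5-second crawl delay, and give the two Awario bots a slower 10-second delay. Below is a minimal sketch of how these rules evaluate, using Python's urllib.robotparser against a reconstruction of the directives (in-file ordering and comments may differ from the live file; the example URL path and the unlisted agent name are arbitrary illustrations):

    from urllib.robotparser import RobotFileParser

    # The eight SEO/archival crawlers plus terracotta are blocked outright,
    # reconstructed here from the groups listed above.
    blocked = ["mj12bot", "semrushbot", "petalbot", "turnitinbot",
               "ccbot", "owler", "amazonbot", "blexbot", "terracotta"]
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Everyone else: allowed, but asked to wait 5 seconds between requests.
    lines += ["User-agent: *", "Crawl-delay: 5", ""]
    # The two Awario bots share a 10-second delay of their own.
    lines += ["User-agent: awariorssbot", "User-agent: awariosmartbot",
              "Crawl-delay: 10", ""]

    rp = RobotFileParser()
    rp.parse(lines)

    print(rp.can_fetch("mj12bot", "https://databases.lovd.nl/"))     # False
    print(rp.can_fetch("examplebot", "https://databases.lovd.nl/"))  # True
    print(rp.crawl_delay("examplebot"))      # 5 (falls through to '*')
    print(rp.crawl_delay("awariosmartbot"))  # 10

Agents not listed by name fall through to the '*' group, which here imposes only the crawl delay.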

Comments

  • Because it causes HTTP 406 errors everywhere.
  • Because it causes HTTP 406 errors everywhere.
  • This bot reads, but ignores the robots.txt file.
  • Because it's being an idiot and it ignores the BASE HREF tag.
  • Has no use crawling our site but causes screen scraping warnings.
  • Nope. Just nope. Downloads everything and then lets others use it without restriction.
  • Buggy. Doesn't understand what a BASE HREF is.
  • Doesn't support crawl-delay. OK, leave us alone, then.
  • Repeats requests to the same pages and downloads lots of variant data that isn't useful for the purpose of the bot.
  • Slow down, boys.
  • Slow these down even more, since they don't follow the 'User-agent: *' rule.
  • Annoying crawler; sends immense numbers of HEAD requests using a UA other than its own, which annoys my scraping detection scripts.