dbr.gbi-bogor.org
robots.txt

Robots Exclusion Standard data for dbr.gbi-bogor.org

Resource Scan

Scan Details

Site Domain dbr.gbi-bogor.org
Base Domain gbi-bogor.org
Scan Status Ok
Last Scan 2025-11-28T09:31:18+00:00
Next Scan 2025-12-28T09:31:18+00:00

Last Scan

Scanned 2025-11-28T09:31:18+00:00
URL https://dbr.gbi-bogor.org/robots.txt
Domain IPs 198.204.243.242
Response IP 198.204.243.242
Found Yes
Hash c35ee07b09e08d41dcc38d4dc66985e22b0e3bd99a47e6a3a69a730f661adcbb
SimHash 49141153ccf7
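
The Hash value is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below assumes SHA-256 over the raw response body; the function name `fingerprint` is hypothetical. It shows how a fetched robots.txt body could be compared against a recorded hash to detect changes between scans.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    Assumption: the scan's "Hash" field is SHA-256 of the raw bytes;
    the report itself does not say which algorithm is used.
    """
    return hashlib.sha256(body).hexdigest()

def has_changed(body: bytes, recorded_hash: str) -> bool:
    # Compare case-insensitively, since hex digests may be upper- or lowercase.
    return fingerprint(body) != recorded_hash.lower()
```

If the assumption holds, passing the bytes served at the scanned URL to `has_changed` together with the Hash value above would return `False` until the file is edited.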

Groups

webreaper

Rule Path
Disallow /

*

Rule Path
Disallow /wiki/User%3A*
Disallow /wiki/Pengguna%3A*
Disallow /wiki/Special%3A*
Disallow /wiki/Istimewa*
Disallow /wiki/Templat%3A*
Disallow /wiki/Internal%3A*
Disallow /o/index.php?
Disallow /org/
Disallow /wt/index.php?
Disallow /wtest/
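
To illustrate how the two groups above might be applied, here is a minimal sketch of a matcher. It is not a full Robots Exclusion Protocol implementation: it assumes `*` in a rule matches any run of characters, that rules are prefix matches, and that no percent-encoding normalization is done (so `%3A` and `:` are treated as different). The names `GROUPS`, `rule_to_regex`, and `is_allowed` are hypothetical. Python's standard `urllib.robotparser` is deliberately not used here because it does not expand `*` wildcards in paths.

```python
import re

# Disallow rules per group, copied from the scan above.
GROUPS = {
    "webreaper": ["/"],
    "*": [
        "/wiki/User%3A*",
        "/wiki/Pengguna%3A*",
        "/wiki/Special%3A*",
        "/wiki/Istimewa*",
        "/wiki/Templat%3A*",
        "/wiki/Internal%3A*",
        "/o/index.php?",
        "/org/",
        "/wt/index.php?",
        "/wtest/",
    ],
}

def rule_to_regex(rule: str) -> re.Pattern:
    # Escape literal parts and let each '*' match any run of characters.
    # re.match anchors at the start only, which gives prefix matching.
    return re.compile(".*".join(re.escape(part) for part in rule.split("*")))

def is_allowed(user_agent: str, path: str) -> bool:
    ua = user_agent.lower()
    # Use a named group if its token appears in the user agent, else the '*' group.
    rules = next(
        (r for name, r in GROUPS.items() if name != "*" and name in ua),
        GROUPS["*"],
    )
    return not any(rule_to_regex(rule).match(path) for rule in rules)

print(is_allowed("WebReaper/1.0", "/wiki/Main_Page"))    # blocked by "Disallow /"
print(is_allowed("ExampleBot", "/wiki/Main_Page"))       # no rule matches
print(is_allowed("ExampleBot", "/wiki/User%3AExample"))  # matches /wiki/User%3A*
```

Under these assumptions, article pages stay crawlable for ordinary bots, while user, special, template, and internal namespaces and the `index.php?` query endpoints are excluded, and `webreaper` is excluded from the whole site.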

Comments

  • Taken from Source: http://en.wikipedia.org/robots.txt
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Don't allow the wayback-maschine to index user-pages
  • User-agent: ia_archiver
  • Disallow: /wiki/User
  • Disallow: /wiki/Benutzer
  • Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please.
  • Inktomi's "Slurp" can read a minimum delay between hits; if your bot supports such a thing using the 'Crawl-delay' or another instruction, please let us know.
  • Disallow: /w/