nightlies.apache.org
robots.txt

Robots Exclusion Standard data for nightlies.apache.org

Resource Scan

Scan Details

Site Domain nightlies.apache.org
Base Domain apache.org
Scan Status Ok
Last Scan 2025-03-03T10:02:32+00:00
Next Scan 2025-04-02T10:02:32+00:00

Last Scan

Scanned 2025-03-03T10:02:32+00:00
URL https://nightlies.apache.org/robots.txt
Domain IPs 2a01:4f9:4a:23ec::2, 95.217.87.228
Response IP 95.217.87.228
Found Yes
Hash 056733415abb87cdc34e95f2fa29427f32fb81026763a9bbe4af8083ba0c021b
SimHash 2c39ced2dfd5

Groups

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-mobile

Rule Path
Disallow

msnbot

Rule Path
Disallow

slurp

Rule Path
Disallow

nutch

Rule Path
Disallow

ia_archiver

Rule Path
Disallow

baiduspider

Rule Path
Disallow /

yahoo-mmcrawler

Rule Path
Disallow

psbot

Rule Path
Disallow

yahoo-blogs/v3.9

Rule Path
Disallow

*

Rule Path
Disallow /
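
As a rough illustration of how a crawler might interpret the groups above, the sketch below rebuilds an approximate robots.txt from the scan data and queries it with Python's standard urllib.robotparser. The reconstructed file layout is an assumption (the scan lists parsed groups, not the raw file), and the agent name "examplebot" is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# (user-agent, disallow path) pairs in the order the scan lists them.
# An empty path means "nothing is disallowed"; "/" blocks the whole site.
GROUPS = [
    ("googlebot", ""),
    ("googlebot-image", ""),
    ("googlebot-mobile", ""),
    ("msnbot", ""),
    ("slurp", ""),
    ("nutch", ""),
    ("ia_archiver", ""),
    ("baiduspider", "/"),
    ("yahoo-mmcrawler", ""),
    ("psbot", ""),
    ("yahoo-blogs/v3.9", ""),
    ("*", "/"),
]


def build_robots_txt() -> str:
    """Rebuild an approximate robots.txt (assumed layout, not the raw file)."""
    lines = []
    for agent, path in GROUPS:
        lines += [f"User-agent: {agent}", f"Disallow: {path}".rstrip(), ""]
    return "\n".join(lines)


def make_parser() -> RobotFileParser:
    """Parse the reconstructed text without touching the network."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt().splitlines())
    return parser


parser = make_parser()
URL = "https://nightlies.apache.org/"
for agent in ("googlebot", "baiduspider", "examplebot"):
    # "examplebot" is a made-up agent that only matches the "*" group.
    print(agent, parser.can_fetch(agent, URL))
```

Under these assumptions, only agents with an empty Disallow path are permitted; any agent without its own group falls through to the "*" group and is blocked.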

Other Records

Field Value
crawl-delay 2

Comments

  • Bot operators will need to contact root@apache.org in order to be explicitly allowed.
  • Bots that do not respect crawl-delay instructions are not permitted.
  • Default action: don't allow. Global crawl-delay is set to two seconds.
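
The crawl-delay requirement in the comments above can be sketched as a small throttle that an explicitly allowed bot might call before each request. This is a minimal sketch, not a prescribed implementation: the class name CrawlThrottle and its injectable clock and sleep parameters are hypothetical, and only the two-second value comes from the scan.

```python
import time
from typing import Callable, Optional

CRAWL_DELAY_SECONDS = 2.0  # value reported under "Other Records"


class CrawlThrottle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(
        self,
        delay: float = CRAWL_DELAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Block until the delay has elapsed; return the seconds slept."""
        pause = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            pause = max(0.0, self.delay - elapsed)
        if pause > 0:
            self._sleep(pause)
        # Record the time of this request for the next call.
        self._last_request = self._clock()
        return pause
```

A crawler would call wait() immediately before each fetch; the first call returns at once, and later calls sleep only for whatever remains of the two-second gap.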