issues.apache.org
robots.txt

Robots Exclusion Standard data for issues.apache.org

Resource Scan

Scan Details

Site Domain issues.apache.org
Base Domain apache.org
Scan Status Ok
Last Scan 2025-02-24T02:58:26+00:00
Next Scan 2025-03-26T02:58:26+00:00

Last Scan

Scanned 2025-02-24T02:58:26+00:00
URL https://issues.apache.org/robots.txt
Domain IPs 168.119.33.54, 2a01:4f8:242:1f49::2
Response IP 168.119.33.54
Found Yes
Hash 056733415abb87cdc34e95f2fa29427f32fb81026763a9bbe4af8083ba0c021b
SimHash 2c39ced2dfd5

Groups

googlebot

Rule Path
Disallow

googlebot-image

Rule Path
Disallow

googlebot-mobile

Rule Path
Disallow

msnbot

Rule Path
Disallow

slurp

Rule Path
Disallow

nutch

Rule Path
Disallow

ia_archiver

Rule Path
Disallow

baiduspider

Rule Path
Disallow /

yahoo-mmcrawler

Rule Path
Disallow

psbot

Rule Path
Disallow

yahoo-blogs/v3.9

Rule Path
Disallow

*

Rule Path
Disallow /
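
To illustrate how the groups above are read, here is a minimal sketch using Python's standard `urllib.robotparser`. A `Disallow` rule with an empty path permits everything for that agent, while `Disallow /` blocks the whole site. The robots.txt text is a hypothetical reconstruction from the listing (the real file's layout, ordering, and comments may differ), and `build_robots_txt` and `is_allowed` are illustrative names, not part of any scan tool.

```python
from urllib.robotparser import RobotFileParser

# Agents listed above with an empty Disallow path (i.e. nothing is disallowed).
# "yahoo-blogs/v3.9" is included for completeness; Python's parser compares
# only the part of the requesting agent's name before "/", so that entry may
# not match the way other parsers would match it.
ALLOWED_AGENTS = [
    "googlebot", "googlebot-image", "googlebot-mobile", "msnbot", "slurp",
    "nutch", "ia_archiver", "yahoo-mmcrawler", "psbot", "yahoo-blogs/v3.9",
]


def build_robots_txt() -> str:
    """Rebuild an assumed robots.txt from the scanned groups."""
    lines = []
    for agent in ALLOWED_AGENTS:
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    lines += ["User-agent: baiduspider", "Disallow: /", ""]
    lines += ["User-agent: *", "Disallow: /", ""]
    return "\n".join(lines)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the reconstructed rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt().splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Baiduspider", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://issues.apache.org/"))
```

Under these assumptions, an explicitly listed agent such as `Googlebot` is allowed, `Baiduspider` is blocked by its own group, and any unlisted agent falls through to the `*` group and is blocked.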

Other Records

Field Value
crawl-delay 2
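
As a rough sketch of what honoring this record could look like on the crawler side, the throttle below waits until at least two seconds have passed since the previous request. The class name `PoliteThrottle` and its injectable `clock` and `sleep` parameters are assumptions made for illustration and testing; the scan itself does not describe any particular client.

```python
import time

CRAWL_DELAY_SECONDS = 2.0  # value of the crawl-delay record above


class PoliteThrottle:
    """Enforce a minimum gap between successive requests (hypothetical helper)."""

    def __init__(self, delay=CRAWL_DELAY_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep just long enough to respect the delay; return seconds slept."""
        pause = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            pause = max(0.0, self.delay - elapsed)
            if pause > 0:
                self._sleep(pause)
        self._last = self._clock()
        return pause
```

A crawler would call `wait()` immediately before each request to the host, so the first request goes out at once and later ones are spaced at least two seconds apart.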

Comments

  • Bot operators will need to contact root@apache.org in order to be explicitly allowed.
  • Bots that do not respect crawl-delay instructions are not permitted.
  • Default action: don't allow. Global crawl-delay is set to two seconds.