aerospaceweb.org
robots.txt

Robots Exclusion Standard data for aerospaceweb.org

Resource Scan

Scan Details

Site Domain aerospaceweb.org
Base Domain aerospaceweb.org
Scan Status Ok
Last Scan 2025-10-10T16:37:55+00:00
Next Scan 2025-11-09T16:37:55+00:00
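
The two timestamps above imply a rescan interval, which can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the report itself does not state how the next scan time is derived.

```python
from datetime import datetime

# Timestamps copied from the Scan Details fields above.
last_scan = datetime.fromisoformat("2025-10-10T16:37:55+00:00")
next_scan = datetime.fromisoformat("2025-11-09T16:37:55+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)  # 30
```

For this record the gap is exactly 30 days; whether that interval is fixed for every site is not something the report says.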

Last Scan

Scanned 2025-10-10T16:37:55+00:00
URL https://aerospaceweb.org/robots.txt
Redirect https://aerospaceweb.org/robots_ssl.txt
Domain IPs 50.87.234.5
Response IP 50.87.234.5
Found Yes
Hash 9fa41a4b77676f7d5537e149d9b2023f6be4a001b2f1da52a7abddfb4e1daa7c
SimHash 2805fa0965e3
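
The Hash field is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. Assuming SHA-256 over the raw response body, a change check could be sketched as follows. The function names `content_fingerprint` and `has_changed` are hypothetical, and exactly which bytes the scanner hashes is also an assumption.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Return a hex SHA-256 digest of the raw robots.txt bytes.

    Assumption: the scanner's Hash field is SHA-256; the report does not say so.
    """
    return hashlib.sha256(body).hexdigest()

def has_changed(body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched body against a previously recorded digest."""
    return content_fingerprint(body) != recorded_hash.lower()

# Illustrative body only; this is NOT the real file content.
sample = b"User-agent: *\nDisallow: /cgi-bin/\n"
digest = content_fingerprint(sample)
print(len(digest))
print(has_changed(sample, digest))
```

The SimHash field is a separate, shorter similarity fingerprint and is not reproduced by this sketch.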

Groups

*

Rule Path
Disallow /about/copyright.shtml
Disallow /about/formsubmit.shtml
Disallow /affiliates/
Disallow /cgi-bin/
Disallow /errors/
Disallow /jeff/

ia_archiver

Rule Path
Disallow /

psbot

Rule Path
Disallow /
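
The groups above can be evaluated with Python's standard `urllib.robotparser`. The sketch below rebuilds the rules from the tables in this report rather than fetching the live file, so it reflects only what the scan recorded. The user agent name `ExampleBot` and the path `/cgi-bin/search` are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules transcribed from the Groups tables in this report.
ROBOTS_LINES = """\
User-agent: *
Disallow: /about/copyright.shtml
Disallow: /about/formsubmit.shtml
Disallow: /affiliates/
Disallow: /cgi-bin/
Disallow: /errors/
Disallow: /jeff/

User-agent: ia_archiver
Disallow: /

User-agent: psbot
Disallow: /
""".splitlines()

def build_parser(lines):
    """Parse robots.txt lines without any network access."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

parser = build_parser(ROBOTS_LINES)
BASE = "https://aerospaceweb.org"

# A generic crawler falls under the "*" group.
print(parser.can_fetch("ExampleBot", BASE + "/"))
print(parser.can_fetch("ExampleBot", BASE + "/cgi-bin/search"))

# ia_archiver and psbot are blocked from the whole site.
print(parser.can_fetch("ia_archiver", BASE + "/"))
print(parser.can_fetch("psbot", BASE + "/"))
```

Under these rules a generic crawler may fetch the site root but not anything under `/cgi-bin/`, while `ia_archiver` and `psbot` are denied everywhere.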

Comments

  • Settings for all search engines
  • following wildcard lines may not work with all crawlers
  • Disallow: /*.gif$
  • Disallow: /*.jpg$
  • Disallow: /*.mp3$
  • Disallow: /*.wav$
  • Disallow: /*.doc$
  • Disallow: /*.pdf$
  • Disallow: /*.txt$
  • Google image search
  • User-agent: Googlebot-Image
  • Disallow: /
  • Internet Archive
  • Picsearch
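
The wildcard lines listed in the comments (for example `Disallow: /*.gif$`) are commented out in the file, so they are not active rules. For crawlers that do support them, `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the URL path. A minimal sketch of that matching, using the hypothetical helpers `robots_pattern_to_regex` and `path_matches`, might look like this. Python's own `urllib.robotparser` does plain prefix matching and does not interpret these wildcards, and the sample paths are made up.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the end.
    Without '$', the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + (r"\Z" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the pattern."""
    # re.match anchors at the start of the path, as robots rules do.
    return robots_pattern_to_regex(pattern).match(path) is not None

print(path_matches("/*.gif$", "/images/logo.gif"))
print(path_matches("/*.gif$", "/images/logo.gif.bak"))
print(path_matches("/cgi-bin/", "/cgi-bin/search"))
```

The first check matches because the path ends in `.gif`; the second does not because of the `$` anchor; the third shows that a pattern without wildcards still works as an ordinary prefix rule.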