sourceware.org
robots.txt

Robots Exclusion Standard data for sourceware.org

Resource Scan

Scan Details

Site Domain sourceware.org
Base Domain sourceware.org
Scan Status Ok
Last Scan 2025-10-25T18:08:38+00:00
Next Scan 2025-11-24T18:08:38+00:00

Last Scan

Scanned 2025-10-25T18:08:38+00:00
URL https://sourceware.org/robots.txt
Domain IPs 2620:52:3:1:0:246e:9693:128c, 8.43.85.97
Response IP 8.43.85.97
Found Yes
Hash 27449ac926e5f5a59ef487a467e215771f4a6eaaf719db5e66e34f7d1291cf8f
SimHash 88fe63d812c6

Groups

*

Rule Path
Disallow /cygwin/snapshots
Disallow /cygwin/packages/
Disallow /cgi-bin/
Disallow /cgi/
Disallow /git/
Disallow /cgit/
Disallow /viewvc
Disallow /viewcvs
Disallow /bugzilla/buglist.cgi
Disallow /bugzilla//buglist.cgi
Disallow /bugzilla/show_bug.cgi*ctype%3Dxml
Disallow /bugzilla/attachment.cgi
Disallow /bugzilla/showdependencygraph.cgi
Disallow /bugzilla/showdependencytree.cgi
Disallow /bugzilla/enter_bug.cgi
Disallow /bugzilla/show_activity.cgi
Disallow /*/wiki/*?action=*
Disallow /*/wiki/*?diffs=*
Disallow /*/wiki/*?highlight=*
Disallow /*/wiki/*?calparms=*

Other Records

Field Value
crawl-delay 60
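
Several rules in the `*` group use `*` as a wildcard inside the path (for example `/*/wiki/*?action=*`). The sketch below shows one way such patterns could be evaluated, assuming the common convention that `*` matches any run of characters, a trailing `$` anchors the end of the path, and everything else is a prefix match. The function names `robots_pattern_to_regex` and `is_disallowed` and the sample paths are hypothetical, and percent-encoding normalization is deliberately left out.

```python
import re
from typing import Iterable


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; all other characters are literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def is_disallowed(path: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


# Wildcard rules copied from the '*' group listed above.
WILDCARD_RULES = [
    "/bugzilla/show_bug.cgi*ctype%3Dxml",
    "/*/wiki/*?action=*",
    "/*/wiki/*?diffs=*",
    "/*/wiki/*?highlight=*",
    "/*/wiki/*?calparms=*",
]

# Illustrative paths only; they are not taken from the scan report.
edit_page = is_disallowed("/glibc/wiki/HomePage?action=edit", WILDCARD_RULES)
plain_page = is_disallowed("/glibc/wiki/HomePage", WILDCARD_RULES)
xml_bug = is_disallowed("/bugzilla/show_bug.cgi?id=1&ctype%3Dxml", WILDCARD_RULES)
print(edit_page, plain_page, xml_bug)
```

Under these assumptions a wiki page is blocked only when it carries one of the listed query parameters; the plain page URL is not matched by any of the wildcard rules.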

test crawler

Rule Path
Disallow /

digext

Rule Path
Disallow /

digext

Rule Path
Disallow /ml/

cw crawler

Rule Path
Disallow /

scooter

Rule Path
Disallow /ml/
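
As a rough illustration of how a crawler might apply the groups listed above, the following sketch feeds a hand-transcribed subset of the rules to Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is an assumption reconstructed from this listing, not the verbatim file (so it will not reproduce the hash shown earlier), the position of the `Crawl-delay` line within the `*` group is a guess, and `examplebot` is a hypothetical user agent. The wildcard rules are omitted because the standard-library parser matches plain path prefixes only.

```python
from urllib.robotparser import RobotFileParser

# Hand-transcribed subset of the groups in this report; NOT the verbatim file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /git/
Disallow: /bugzilla/attachment.cgi
Crawl-delay: 60

User-agent: scooter
Disallow: /ml/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://sourceware.org"

# 'examplebot' (hypothetical) has no group of its own, so the '*' group applies.
generic_git = parser.can_fetch("examplebot", BASE + "/git/glibc.git")
generic_home = parser.can_fetch("examplebot", BASE + "/glibc/")
generic_delay = parser.crawl_delay("examplebot")

# 'scooter' has a dedicated group, which is used instead of the '*' group.
scooter_ml = parser.can_fetch("scooter", BASE + "/ml/")
scooter_git = parser.can_fetch("scooter", BASE + "/git/glibc.git")
scooter_delay = parser.crawl_delay("scooter")

print(generic_git, generic_home, generic_delay)
print(scooter_ml, scooter_git, scooter_delay)
```

Note that a crawler with its own group follows only that group: in this sketch `scooter` is barred from `/ml/` but is not subject to the `/git/` rule or the crawl delay from the `*` group.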

Comments

  • contact sourcemaster@sourceware.org for questions.
  • see https://www.robotstxt.org/robotstxt.html for information about the file format.
  • Rogue robot blacklist