repo1.dso.mil
robots.txt

Robots Exclusion Standard data for repo1.dso.mil

Resource Scan

Scan Details

Site Domain repo1.dso.mil
Base Domain dso.mil
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-03-04T17:07:15+00:00
Next Scan 2024-06-02T17:07:15+00:00

Last Successful Scan

Scanned 2023-02-02T12:54:06+00:00
URL https://repo1.dso.mil/robots.txt
Domain IPs 15.205.173.153
Response IP 15.205.173.153
Found Yes
Hash 6e281ebbf54733e6c454fc8541c95ea28154a27bb7e9a000765df86dad2a08c2
SimHash c69299536377

Groups

*

Rule Path
Disallow /autocomplete/users
Disallow /autocomplete/projects
Disallow /search
Disallow /admin
Disallow /profile
Disallow /dashboard
Disallow /users
Disallow /api/v*
Disallow /help
Disallow /s/
Disallow /-/profile
Disallow /-/ide/
Disallow /-/experiment
Allow /users/sign_in
Allow /users/sign_up
Allow /users/*/snippets
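
The Allow rows in this group overlap with the Disallow /users row, so the result for a given path depends on how a crawler resolves the conflict. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (ties go to Allow), with `*` treated as a wildcard and a trailing `$` as an end anchor. The helper names `rule_to_regex` and `is_allowed` are hypothetical, only a few rows of the group are included, and crawlers that implement only the original standard may behave differently.

```python
import re

def rule_to_regex(path):
    """Translate a robots.txt path rule into a compiled regex (hypothetical helper).

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL path; everything else is literal.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Return True if url_path may be fetched under the given (verb, path) rules.

    Assumption: the matching rule with the longest path wins, and an
    Allow rule wins a tie. A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for verb, pattern in rules:
        if rule_to_regex(pattern).match(url_path):
            is_allow = verb.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# A few rows copied from the group above.
RULES = [
    ("Disallow", "/users"),
    ("Disallow", "/api/v*"),
    ("Allow", "/users/sign_in"),
    ("Allow", "/users/sign_up"),
    ("Allow", "/users/*/snippets"),
]

for path in ["/users/sign_in", "/users/alice", "/users/alice/snippets", "/api/v4/projects"]:
    print(path, is_allowed(RULES, path))
```

Under these assumptions the sign-in page and per-user snippets stay crawlable because their Allow paths are longer than `/users`, while other paths under `/users` remain blocked.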

*

Rule Path
Disallow /*/new
Disallow /*/edit
Disallow /*/raw
Disallow /*/realtime_changes

*

Rule Path
Disallow /groups/*/analytics
Disallow /groups/*/contribution_analytics
Disallow /groups/*/group_members
Disallow /groups/*/-/saml/sso

*

Rule Path
Disallow /*/*.git$
Disallow /*/archive/
Disallow /*/repository/archive*
Disallow /*/activity
Disallow /*/blame
Disallow /*/commits
Disallow /*/commit
Disallow /*/commit/*.patch
Disallow /*/commit/*.diff
Disallow /*/compare
Disallow /*/network
Disallow /*/graphs
Disallow /*/merge_requests/*.patch
Disallow /*/merge_requests/*.diff
Disallow /*/merge_requests/*/diffs
Disallow /*/deploy_keys
Disallow /*/hooks
Disallow /*/services
Disallow /*/protected_branches
Disallow /*/uploads/
Disallow /*/project_members
Disallow /*/settings
Disallow /*/-/import
Disallow /*/-/environments
Disallow /*/-/jobs
Disallow /*/-/requirements_management
Disallow /*/-/pipelines
Disallow /*/-/pipeline_schedules
Disallow /*/-/dependencies
Disallow /*/-/licenses
Disallow /*/-/metrics
Disallow /*/-/incidents
Disallow /*/-/value_stream_analytics
Disallow /*/-/analytics
Disallow /*/insights
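
The scan lists four separate groups that all target the `*` user agent. A parser following RFC 9309 would combine groups that name the same user agent into one rule set. The sketch below illustrates that merge; the function name `parse_groups` is hypothetical, and `SAMPLE` is a short illustrative excerpt assembled from rules listed above, not the scanned file verbatim.

```python
from collections import defaultdict

def parse_groups(text):
    """Parse robots.txt text into {user_agent: [(verb, path), ...]} (hypothetical helper).

    Groups naming the same user agent are merged in file order.
    Fields other than User-Agent, Allow, and Disallow are ignored.
    """
    groups = defaultdict(list)
    current_agents = []
    reading_agents = False  # True while consecutive User-Agent lines are being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
                reading_agents = True
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return dict(groups)

SAMPLE = """\
# Global routes
User-Agent: *
Disallow: /search
Allow: /users/sign_in

# Generic resource routes like new, edit, raw
User-Agent: *
Disallow: /*/new
Disallow: /*/edit
"""

merged = parse_groups(SAMPLE)
print(merged["*"])
```

Merging first and matching afterwards keeps precedence decisions independent of which group a rule happened to appear in.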

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
  • Add a 1 second delay between successive requests to the same server, limits resources used by crawler
  • Only some crawlers respect this setting, e.g. Googlebot does not
  • Crawl-delay: 1
  • Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application
  • Global routes
  • Restrict allowed routes to avoid very ugly search results
  • Generic resource routes like new, edit, raw
  • This will block routes like:
      - /projects/new
      - /gitlab-org/gitlab-foss/issues/123/-/edit
  • Group details
  • Project details
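
The comment about generic resource routes names two paths that the `/*/new` and `/*/edit` rules are meant to block. A quick way to check such claims is a glob match with Python's `fnmatch` module, sketched below. This is an approximation: `fnmatchcase` also gives `?` and `[...]` special meaning, which robots.txt does not, so it is only suitable for patterns like these that use `*` alone. The function name `blocked_by_generic_rule` is hypothetical.

```python
from fnmatch import fnmatchcase

# The generic resource rules from the scan.
GENERIC_RULES = ["/*/new", "/*/edit", "/*/raw", "/*/realtime_changes"]

def blocked_by_generic_rule(url_path):
    """Return True if url_path matches any generic rule (hypothetical helper).

    Appending '*' to each rule turns the glob into a prefix match,
    since robots.txt rules match any path that starts with the pattern.
    """
    return any(fnmatchcase(url_path, rule + "*") for rule in GENERIC_RULES)

for path in ["/projects/new", "/gitlab-org/gitlab-foss/issues/123/-/edit", "/gitlab-org/gitlab-foss"]:
    print(path, blocked_by_generic_rule(path))
```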