Robots Exclusion Standard data for git.doit.wisc.edu

Resource Scan

Scan Details

Site Domain git.doit.wisc.edu
Base Domain wisc.edu
Scan Status Ok
Last Scan 2024-11-03T09:09:49+00:00
Next Scan 2024-11-17T09:09:49+00:00
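The two scan timestamps are exactly 14 days apart, which suggests (but does not confirm) a fixed two-week rescan cycle. A quick check with Python's standard `datetime` module, assuming the interval stays constant for later scans:

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details above (ISO 8601 with a UTC offset).
last_scan = datetime.fromisoformat("2024-11-03T09:09:49+00:00")
next_scan = datetime.fromisoformat("2024-11-17T09:09:49+00:00")

interval = next_scan - last_scan
print(interval)                         # 14 days, 0:00:00
print(interval == timedelta(days=14))   # True

# Assumption: the scanner reuses the same interval for the scan after next.
print((next_scan + interval).isoformat())  # 2024-12-01T09:09:49+00:00
```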

Last Scan

Scanned 2024-11-03T09:09:49+00:00
URL https://git.doit.wisc.edu/robots.txt
Domain IPs 128.104.31.113
Response IP 128.104.31.113
Found Yes
Hash 590b616f2026264c6d87c5a5db32a9c1a4126dacf517bcdeb3ab6f58f4c9c43a
SimHash 4e0699636377
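The `Hash` value is 64 hexadecimal characters, which is consistent with a SHA-256 digest. Assuming it is the SHA-256 of the raw response body (the scanner does not say so, and it may normalize the text before hashing), the sketch below refetches the file and compares digests. `fetch_robots` and `matches_reported` are hypothetical helper names, not part of any scanner API. The SimHash value is not reproduced because its algorithm and parameters are not stated.

```python
import hashlib
import urllib.request

# Value copied from the "Hash" field above.
REPORTED_HASH = "590b616f2026264c6d87c5a5db32a9c1a4126dacf517bcdeb3ab6f58f4c9c43a"

def robots_sha256(body: bytes) -> str:
    """Hex SHA-256 of the raw bytes (assumed to be what the scanner hashed)."""
    return hashlib.sha256(body).hexdigest()

def fetch_robots(url: str = "https://git.doit.wisc.edu/robots.txt") -> bytes:
    """Download the file as raw bytes, without decoding or normalizing it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def matches_reported(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """True if the body hashes to the reported digest (hex compared in lowercase)."""
    return robots_sha256(body) == reported.lower()
```

Calling `matches_reported(fetch_robots())` performs the comparison. A `False` result may only mean that the file changed after the 2024-11-03 scan, or that the SHA-256 assumption is wrong.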

Groups

User-agent: *

Rule Path
Disallow /autocomplete/users
Disallow /autocomplete/projects
Disallow /search
Disallow /admin
Disallow /profile
Disallow /dashboard
Disallow /users
Disallow /api/v*
Disallow /help
Disallow /s/
Disallow /-/profile
Disallow /-/profile/
Disallow /-/user_settings/
Disallow /-/ide/
Disallow /-/experiment
Allow /users/sign_in
Allow /users/sign_up
Allow /users/*/snippets
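This group disallows `/users` but allows `/users/sign_in`, `/users/sign_up` and `/users/*/snippets`. Under RFC 9309, a crawler resolves such overlaps by using the matching rule with the longest path and prefers `Allow` on a tie, so the order of the lines does not matter. Python's `urllib.robotparser` instead applies the first matching rule in file order and does not interpret `*` as a wildcard. The sketch below therefore implements the RFC behaviour directly. `GLOBAL_RULES` is a hand-copied subset of the table above, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path pattern), e.g. ("Disallow", "/users")

def pattern_matches(pattern: str, path: str) -> bool:
    """'*' matches any run of characters (including '/'); a trailing '$' anchors
    the end of the path. Without '$', a rule matches any path starting with it."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    match = re.fullmatch if anchored else re.match
    return match(regex, path) is not None

def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern_matches(pattern, path):
            continue
        longer = len(pattern) > best_len
        tie_allow = len(pattern) == best_len and directive == "Allow"
        if longer or tie_allow:
            best_len, allowed = len(pattern), directive == "Allow"
    return allowed

# Subset of the global group above.
GLOBAL_RULES: List[Rule] = [
    ("Disallow", "/users"),
    ("Disallow", "/api/v*"),
    ("Allow", "/users/sign_in"),
    ("Allow", "/users/sign_up"),
    ("Allow", "/users/*/snippets"),
]

print(is_allowed("/users/sign_in", GLOBAL_RULES))        # True
print(is_allowed("/users/jdoe", GLOBAL_RULES))           # False
print(is_allowed("/users/jdoe/snippets", GLOBAL_RULES))  # True
print(is_allowed("/api/v4/projects", GLOBAL_RULES))      # False
print(is_allowed("/explore", GLOBAL_RULES))              # True
```

The example user name `jdoe` is hypothetical. `/users/jdoe/snippets` is allowed because `/users/*/snippets` is a longer matching pattern than `/users`.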

User-agent: *

Rule Path
Disallow /*/new
Disallow /*/edit
Disallow /*/raw
Disallow /*/realtime_changes
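Because `*` also matches `/`, a pattern such as `/*/edit` applies at any depth. Because rules match as prefixes, it also catches paths whose final segment merely begins with the keyword. The following standalone sketch implements RFC 9309-style wildcard matching together with a hypothetical `blocked_by_generic` helper. It checks the two examples given in the file's comments, along with a few hypothetical paths.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    # '*' matches any characters including '/'; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    match = re.fullmatch if anchored else re.match
    return match(regex, path) is not None

GENERIC_RULES = ["/*/new", "/*/edit", "/*/raw", "/*/realtime_changes"]

def blocked_by_generic(path: str) -> bool:
    return any(pattern_matches(p, path) for p in GENERIC_RULES)

print(blocked_by_generic("/projects/new"))                              # True
print(blocked_by_generic("/gitlab-org/gitlab-foss/issues/123/-/edit"))  # True
print(blocked_by_generic("/group/project/-/raw/main/README.md"))        # True
print(blocked_by_generic("/group/newsletter"))                          # True ('/new' prefix)
print(blocked_by_generic("/group/project/-/issues"))                    # False
```

One side effect of prefix matching is that a project whose name starts with `new`, `edit` or `raw` (such as the hypothetical `/group/newsletter`) is excluded along with the intended routes.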

User-agent: *

Rule Path
Disallow /groups/*/-/analytics
Disallow /groups/*/-/analytics/
Disallow /groups/*/-/insights/
Disallow /groups/*/-/issues_analytics
Disallow /groups/*/-/contribution_analytics
Disallow /groups/*/-/group_members
Disallow /groups/*/-/saml/
Disallow /groups/*/-/saml_group_links
Disallow /groups/*/-/settings/
Disallow /groups/*/-/billings
Disallow /groups/*/-/hooks
Disallow /groups/*/-/projects
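A rule without `$` already matches every path that begins with it. For that reason, a `Disallow` whose pattern starts with another `Disallow` pattern adds nothing, provided no competing `Allow` rule could override the shorter one. For example, `/groups/*/-/analytics/` is already covered by `/groups/*/-/analytics`. The following hypothetical lint helper reports such literal-prefix duplicates. It does not detect overlaps that only arise through wildcard expansion.

```python
from typing import List, Tuple

def redundant_disallows(patterns: List[str]) -> List[Tuple[str, str]]:
    """Return (redundant, covered_by) pairs among Disallow patterns.

    B is reported when some other pattern A has no trailing '$' and B begins
    with A literally: any path matching B then also starts with a match for A.
    Only meaningful when no Allow rule in the group could override A.
    """
    found = []
    for b in patterns:
        for a in patterns:
            if a != b and not a.endswith("$") and b.startswith(a):
                found.append((b, a))
                break
    return found

GROUP_RULES = [
    "/groups/*/-/analytics", "/groups/*/-/analytics/", "/groups/*/-/insights/",
    "/groups/*/-/issues_analytics", "/groups/*/-/contribution_analytics",
    "/groups/*/-/group_members", "/groups/*/-/saml/", "/groups/*/-/saml_group_links",
    "/groups/*/-/settings/", "/groups/*/-/billings", "/groups/*/-/hooks",
    "/groups/*/-/projects",
]
print(redundant_disallows(GROUP_RULES))
# [('/groups/*/-/analytics/', '/groups/*/-/analytics')]
```

The same check applied to the project rules below would flag `/*/-/commits/` (covered by `/*/-/commit`), `/*/-/pipelines/`, `/*/-/analytics/` and `/*/-/terraform_module_registry` (covered by `/*/-/terraform`).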

User-agent: *

Rule Path
Disallow /*/*.git$
Disallow /*/archive/
Disallow /*/repository/archive*
Disallow /*/activity
Disallow /*/-/project_members
Disallow /*/-/blame/
Disallow /*/-/branches
Disallow /*/-/commits/
Disallow /*/-/commit
Disallow /*/commit/*.patch
Disallow /*/commit/*.diff
Disallow /*/-/compare/
Disallow /*/-/network/
Disallow /*/path_locks
Disallow /*/merge_requests/*.patch
Disallow /*/merge_requests/*.diff
Disallow /*/merge_requests/*/diffs
Disallow /*/services
Disallow /*/uploads/
Disallow /*/-/import
Disallow /*/-/requirements_management/
Disallow /*/-/pipelines
Disallow /*/-/pipeline_schedules
Disallow /*/-/jobs
Disallow /*/-/ci/
Disallow /*/-/quality/
Disallow /*/-/licenses
Disallow /*/-/security/
Disallow /*/-/dependencies
Disallow /*/-/audit_events
Disallow /*/-/on_demand_scans
Disallow /*/-/feature_flags
Disallow /*/-/ml/
Disallow /*/-/environments
Disallow /*/-/clusters
Disallow /*/-/terraform
Disallow /*/-/terraform_module_registry
Disallow /*/-/*/configuration
Disallow /*/-/error_tracking
Disallow /*/-/metrics
Disallow /*/-/alert_management
Disallow /*/-/incidents
Disallow /*/-/oncall_schedules
Disallow /*/-/escalation_policies
Disallow /*/-/*/service_desk
Disallow /*/-/analytics
Disallow /*/-/analytics/
Disallow /*/-/value_stream_analytics
Disallow /*/-/graphs/
Disallow /*/insights/
Disallow /*/-/pipelines/
Disallow /*/-/settings/
Disallow /*/-/hooks
Disallow /*/-/usage_quotas
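Two details of the project rules are easy to misread. First, the `$` in `/*/*.git$` anchors the match at the end of the path, so the rule blocks a URL that ends in `.git` but not one that merely contains `.git` in the middle. Second, the trailing `*` in `/*/repository/archive*` changes nothing, because RFC 9309 treats a trailing wildcard as equivalent to a plain prefix. The checks below use a self-contained wildcard matcher written for illustration. The example paths are hypothetical.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    # '*' matches any characters including '/'; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    match = re.fullmatch if anchored else re.match
    return match(regex, path) is not None

checks = [
    ("/*/*.git$", "/group/project.git", True),
    ("/*/*.git$", "/group/project.git/info/refs", False),  # '.git' not at the end
    ("/*/repository/archive*", "/group/project/repository/archive.zip", True),
    ("/*/repository/archive", "/group/project/repository/archive.zip", True),
]
for pattern, path, expected in checks:
    assert pattern_matches(pattern, path) == expected, (pattern, path)
print("all checks passed")
```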

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site, uncomment the next two lines:
    - User-Agent: *
    - Disallow: /
  • Add a 1-second delay between successive requests to the same server; this limits the resources used by the crawler
  • Only some crawlers respect this setting; Googlebot, for example, does not
    - Crawl-delay: 1
  • Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application
  • Global routes
  • Restrict allowed routes to avoid very ugly search results
  • Generic resource routes like new, edit, raw
  • This will block routes like:
    - /projects/new
    - /gitlab-org/gitlab-foss/issues/123/-/edit
  • Group details
  • Project details
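Because `Crawl-delay: 1` is listed among the comments rather than the rules, it appears to be commented out and not in effect. As the comment notes, some crawlers ignore the directive anyway. A crawler that chose to space its requests one second apart per host could do so roughly as follows. `PerHostDelay` is a hypothetical helper. The clock and sleep functions are injectable so that the logic can be tested without real waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit

class PerHostDelay:
    """Enforce a minimum gap (in seconds) between requests to the same host."""

    def __init__(self, delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of its last request

    def wait(self, url: str) -> float:
        """Block until the URL's host may be requested again; return seconds slept."""
        host = urlsplit(url).netloc
        slept = 0.0
        last = self._last.get(host)
        if last is not None:
            remaining = last + self.delay - self._clock()
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last[host] = self._clock()  # record when this request may proceed
        return slept
```

A crawler would call `limiter.wait(url)` immediately before each request. Requests to different hosts are tracked independently, so only repeat visits to the same server are slowed down.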