gitlab.cern.ch
robots.txt

Robots Exclusion Standard data for gitlab.cern.ch

Resource Scan

Scan Details

Site Domain gitlab.cern.ch
Base Domain cern.ch
Scan Status Ok
Last Scan 2024-06-09T03:57:46+00:00
Next Scan 2024-06-23T03:57:46+00:00

Last Scan

Scanned 2024-06-09T03:57:46+00:00
URL https://gitlab.cern.ch/robots.txt
Domain IPs 188.185.11.106, 188.185.15.101, 188.185.15.160, 188.185.22.120, 188.185.25.125, 188.185.25.203, 188.185.31.211, 188.185.34.83, 188.185.35.37, 2001:1458:d00:63::100:377, 2001:1458:d00:64::100:2c3, 2001:1458:d00:64::100:ac, 2001:1458:d00:66::100:39c, 2001:1458:d00:67::100:313, 2001:1458:d00:67::100:327, 2001:1458:d00:68::100:18, 2001:1458:d00:69::100:1fb, 2001:1458:d00:69::100:d0
Response IP 188.185.11.106
Found Yes
Hash 1710ebf9c5d856930b85b68c01860c6558c20ce3be0196d211e84136d18ce9ed
SimHash c61299536377
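The scan records a content hash so that changes between scans can be detected. As a minimal sketch, assuming the Hash field is the SHA-256 hex digest of the fetched robots.txt body (an assumption about this scanner; its exact normalization is not documented here), such a hash can be reproduced with the standard library:

```python
import hashlib
import urllib.request

# Sketch: recompute a SHA-256 content hash for a robots.txt body.
# Assumption: the scan's "Hash" field is the SHA-256 hex digest of the
# raw response bytes (not stated in the report above).
def robots_hash(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return hashlib.sha256(body).hexdigest()

# Offline demonstration on a fixed byte string:
digest = hashlib.sha256(b"User-Agent: *\nDisallow: /\n").hexdigest()
print(len(digest))  # 64 hex characters, the same width as the Hash above
```

Comparing successive digests is enough to decide whether the file changed between the Last Scan and Next Scan dates.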

Groups

*

Rule Path
Disallow /autocomplete/users
Disallow /autocomplete/projects
Disallow /search
Disallow /admin
Disallow /profile
Disallow /dashboard
Disallow /users
Disallow /api/v*
Disallow /help
Disallow /s/
Disallow /-/profile
Disallow /-/user_settings/profile
Disallow /-/ide/
Disallow /-/experiment
Allow /users/sign_in
Allow /users/sign_up
Allow /users/*/snippets

*

Rule Path
Disallow /*/new
Disallow /*/edit
Disallow /*/raw
Disallow /*/realtime_changes

*

Rule Path
Disallow /groups/*/analytics
Disallow /groups/*/contribution_analytics
Disallow /groups/*/group_members
Disallow /groups/*/-/saml/sso

*

Rule Path
Disallow /*/*.git$
Disallow /*/archive/
Disallow /*/repository/archive*
Disallow /*/activity
Disallow /*/blame
Disallow /*/commits
Disallow /*/commit
Disallow /*/commit/*.patch
Disallow /*/commit/*.diff
Disallow /*/compare
Disallow /*/network
Disallow /*/graphs
Disallow /*/merge_requests/*.patch
Disallow /*/merge_requests/*.diff
Disallow /*/merge_requests/*/diffs
Disallow /*/deploy_keys
Disallow /*/hooks
Disallow /*/services
Disallow /*/protected_branches
Disallow /*/uploads/
Disallow /*/project_members
Disallow /*/settings
Disallow /*/-/import
Disallow /*/-/environments
Disallow /*/-/jobs
Disallow /*/-/requirements_management
Disallow /*/-/pipelines
Disallow /*/-/pipeline_schedules
Disallow /*/-/dependencies
Disallow /*/-/licenses
Disallow /*/-/metrics
Disallow /*/-/incidents
Disallow /*/-/value_stream_analytics
Disallow /*/-/analytics
Disallow /*/insights
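The groups above can be exercised with Python's standard-library parser. One caveat: `urllib.robotparser` matches Disallow/Allow values as plain path prefixes and does not implement the `*` and `$` wildcards used in many of the rules above, so this sketch feeds it only wildcard-free rules; the tested URLs are hypothetical examples, not taken from the scan.

```python
from urllib.robotparser import RobotFileParser

# A sketch using only wildcard-free rules from the groups above.
# Note: urllib.robotparser treats rule values as path prefixes and
# does not support the * and $ wildcards seen in the other rules.
rules = """\
User-agent: *
Disallow: /search
Disallow: /admin
Disallow: /dashboard
Allow: /users/sign_in
Allow: /users/sign_up
Disallow: /users
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "https://gitlab.cern.ch"
print(rp.can_fetch("*", base + "/users/sign_in"))  # True: explicit Allow
print(rp.can_fetch("*", base + "/search"))         # False
print(rp.can_fetch("*", base + "/users"))          # False
```

Because Python's parser applies the first matching rule in file order, the `Allow: /users/sign_in` line must precede `Disallow: /users`, which is exactly how the first group above is ordered.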

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
  • Add a 1 second delay between successive requests to the same server, limits resources used by crawler
  • Only some crawlers respect this setting, e.g. Googlebot does not
  • Crawl-delay: 1
  • Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb,
  • https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application
  • Global routes
  • Restrict allowed routes to avoid very ugly search results
  • Generic resource routes like new, edit, raw
  • This will block routes like:
  • - /projects/new
  • - /gitlab-org/gitlab-foss/issues/123/-/edit
  • Group details
  • Project details
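The Crawl-delay comment above is advisory, and as noted, some crawlers (Googlebot among them) ignore it. A polite crawler that chooses to honor it can read the value with the standard parser and pace its requests; a minimal sketch, where the one-second value and the `/admin` rule simply mirror the directives discussed above:

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser

# Sketch of honoring an advisory Crawl-delay directive.
rp = RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: 1
Disallow: /admin
""".splitlines())

delay = rp.crawl_delay("*")  # None when the directive is absent
print(delay)  # 1

def polite_pause(last_request: float, delay: Optional[float]) -> None:
    """Sleep just long enough to keep `delay` seconds between requests
    to the same server; a no-op when no Crawl-delay was declared."""
    if delay is not None:
        remaining = delay - (time.monotonic() - last_request)
        if remaining > 0:
            time.sleep(remaining)
```

Calling `polite_pause(time.monotonic(), delay)` before each fetch spaces requests by at least the declared delay without sleeping longer than necessary.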