gogit.univ-orleans.fr
robots.txt

Robots Exclusion Standard data for gogit.univ-orleans.fr

Resource Scan

Scan Details

Site Domain gogit.univ-orleans.fr
Base Domain univ-orleans.fr
Scan Status Ok
Last Scan 2024-10-02T19:23:11+00:00
Next Scan 2024-10-16T19:23:11+00:00

Last Scan

Scanned 2024-10-02T19:23:11+00:00
URL https://gogit.univ-orleans.fr/robots.txt
Domain IPs 194.167.30.150
Response IP 194.167.30.150
Found Yes
Hash dd57731817fc004fc04e547e944d7b931deefb7bc2aeda188402f439645ca596
SimHash 02b21d5d0176
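
The Hash field is a SHA-256 digest of the robots.txt body, so change detection between scans reduces to comparing two digests; the SimHash is a locality-sensitive fingerprint whose value drifts only slightly under small edits, useful for judging how much changed rather than whether anything changed. Below is a minimal re-check of the recorded digest with the Python standard library, assuming the scanner hashes the raw response bytes with no normalisation (the report does not say):

  import hashlib
  import urllib.request

  URL = "https://gogit.univ-orleans.fr/robots.txt"
  # Digest recorded by the 2024-10-02 scan above.
  RECORDED = "dd57731817fc004fc04e547e944d7b931deefb7bc2aeda188402f439645ca596"

  # Fetch the live file and hash the raw response bytes (assumption:
  # the scanner hashes the body byte-for-byte, with no normalisation).
  with urllib.request.urlopen(URL) as resp:
      body = resp.read()

  digest = hashlib.sha256(body).hexdigest()
  print("recorded:", RECORDED)
  print("current :", digest)
  print("changed since last scan" if digest != RECORDED else "unchanged")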

Groups

User-agent: *

Rule     Path
Disallow /autocomplete/users
Disallow /search
Disallow /api
Disallow /admin
Disallow /profile
Disallow /dashboard
Disallow /projects/new
Disallow /groups/new
Disallow /groups/*/edit
Disallow /users
Disallow /help
Allow    /users/sign_in

User-agent: *

Rule     Path
Disallow /s/
Disallow /snippets/new
Disallow /snippets/*/edit
Disallow /snippets/*/raw

User-agent: *

Rule     Path
Disallow /*/*.git
Disallow /*/*/fork/new
Disallow /*/*/repository/archive*
Disallow /*/*/activity
Disallow /*/*/new
Disallow /*/*/edit
Disallow /*/*/raw
Disallow /*/*/blame
Disallow /*/*/commits/*/*
Disallow /*/*/commit/*.patch
Disallow /*/*/commit/*.diff
Disallow /*/*/compare
Disallow /*/*/branches/new
Disallow /*/*/tags/new
Disallow /*/*/network
Disallow /*/*/graphs
Disallow /*/*/milestones/new
Disallow /*/*/milestones/*/edit
Disallow /*/*/issues/new
Disallow /*/*/issues/*/edit
Disallow /*/*/merge_requests/new
Disallow /*/*/merge_requests/*.patch
Disallow /*/*/merge_requests/*.diff
Disallow /*/*/merge_requests/*/edit
Disallow /*/*/merge_requests/*/diffs
Disallow /*/*/project_members/import
Disallow /*/*/labels/new
Disallow /*/*/labels/*/edit
Disallow /*/*/wikis/*/edit
Disallow /*/*/snippets/new
Disallow /*/*/snippets/*/edit
Disallow /*/*/snippets/*/raw
Disallow /*/*/deploy_keys
Disallow /*/*/hooks
Disallow /*/*/services
Disallow /*/*/protected_branches
Disallow /*/*/uploads/
Disallow /*/-/group_members
Disallow /*/project_members
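
These rule paths rely on the wildcard extension popularised by Googlebot: * matches any run of characters, the longest matching rule wins, and Allow beats Disallow on ties. Python's stdlib urllib.robotparser does plain prefix matching only and ignores *, so the sketch below hand-rolls the evaluation; the rule subset, precedence convention, and test paths are illustrative assumptions, not the scanner's own logic:

  import re

  # A few of the rules above (kind, path pattern); not the full set.
  RULES = [
      ("Allow", "/users/sign_in"),
      ("Disallow", "/users"),
      ("Disallow", "/api"),
      ("Disallow", "/*/*.git"),
      ("Disallow", "/*/*/commit/*.patch"),
  ]

  def rule_to_regex(rule):
      # '*' matches any character run; a trailing '$' (unused above)
      # would anchor the match at the end of the path.
      anchored = rule.endswith("$")
      body = rule[:-1] if anchored else rule
      pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
      return re.compile("^" + pattern + ("$" if anchored else ""))

  def allowed(path):
      # Longest matching rule wins; an Allow beats a Disallow of equal
      # length (the Googlebot convention -- an assumption here).
      verdict, best = True, -1
      for kind, rule in RULES:
          if not rule_to_regex(rule).match(path):
              continue
          if len(rule) > best or (len(rule) == best and kind == "Allow"):
              verdict, best = (kind == "Allow"), len(rule)
      return verdict

  for p in ["/users/sign_in", "/users/alice", "/team/app.git", "/about"]:
      print(p, "->", "allowed" if allowed(p) else "disallowed")

For real crawling, a maintained robots.txt parser that implements the full wildcard semantics is a safer choice than a hand-rolled matcher like this.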

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site, uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
  • Add a 1 second delay between successive requests to the same server; this limits the resources used by the crawler
  • Only some crawlers respect this setting; Googlebot, for example, does not
  • Crawl-delay: 1
  • Based on details in https://gitlab.com/gitlab-org/gitlab-ce/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab-ce/blob/master/spec/routing, and using application
  • Only specifically allow the Sign In page to avoid very ugly search results
  • Global snippets
  • Project details
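
As the comments note, Crawl-delay: 1 ships commented out, so it never takes effect here; a polite crawler can still check for the directive and fall back to its own pause. A short sketch using the standard library's urllib.robotparser, where crawl_delay() returns None when the directive is absent; the 1.0 s fallback, the ExampleBot user agent, and the URLs are this sketch's own choices, and the stdlib parser ignores the wildcard rules discussed above:

  import time
  import urllib.robotparser

  rp = urllib.robotparser.RobotFileParser()
  rp.set_url("https://gogit.univ-orleans.fr/robots.txt")
  rp.read()

  # crawl_delay() returns None when no Crawl-delay applies, as is the
  # case here since the directive is commented out in the scanned file.
  delay = rp.crawl_delay("*") or 1.0

  # ExampleBot and the URL list are hypothetical, for illustration only.
  for url in ("https://gogit.univ-orleans.fr/explore",
              "https://gogit.univ-orleans.fr/api/v4/projects"):
      if rp.can_fetch("ExampleBot", url):
          print("fetching", url)
      else:
          print("skipping", url)
      time.sleep(delay)  # pause between successive requests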