pm.opentechiz.com robots.txt

Robots Exclusion Standard data for pm.opentechiz.com

Resource Scan

Scan Details

Site Domain pm.opentechiz.com
Base Domain opentechiz.com
Scan Status Ok
Last Scan 2024-10-11T08:42:58+00:00
Next Scan 2024-10-25T08:42:58+00:00

Last Scan

Scanned 2024-10-11T08:42:58+00:00
URL https://pm.opentechiz.com/robots.txt
Domain IPs 45.32.125.3
Response IP 45.32.125.3
Found Yes
Hash d653b009fb89f7af1e0000a3c119976bfd30c3160ccbc52089ae5bff1a859bdd
SimHash 02321c1d4176
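
The Hash field is 64 hexadecimal characters, consistent with a SHA-256 digest of the fetched file. A minimal Python sketch to reproduce it, assuming (the report does not state its algorithm) that the digest is taken over the raw response body:

import hashlib
import urllib.request

# Fetch the same file the scanner recorded on 2024-10-11.
url = "https://pm.opentechiz.com/robots.txt"
with urllib.request.urlopen(url) as resp:
    body = resp.read()

# 64 hex characters in the report suggest SHA-256; this is an
# assumption, since the scanner does not name its hash algorithm.
print(hashlib.sha256(body).hexdigest())
# Matches the recorded hash only if the file is unchanged since the scan:
# d653b009fb89f7af1e0000a3c119976bfd30c3160ccbc52089ae5bff1a859bdd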

Groups

*

Rule Path
Disallow /autocomplete/users
Disallow /search
Disallow /api
Disallow /admin
Disallow /profile
Disallow /dashboard
Disallow /projects/new
Disallow /groups/new
Disallow /groups/*/edit
Disallow /users
Disallow /help
Allow /users/sign_in

*

Rule Path
Disallow /s/
Disallow /snippets/new
Disallow /snippets/*/edit
Disallow /snippets/*/raw

*

Rule Path
Disallow /*/*.git
Disallow /*/*/fork/new
Disallow /*/*/repository/archive*
Disallow /*/*/activity
Disallow /*/*/new
Disallow /*/*/edit
Disallow /*/*/raw
Disallow /*/*/blame
Disallow /*/*/commits/*/*
Disallow /*/*/commit/*.patch
Disallow /*/*/commit/*.diff
Disallow /*/*/compare
Disallow /*/*/branches/new
Disallow /*/*/tags/new
Disallow /*/*/network
Disallow /*/*/graphs
Disallow /*/*/milestones/new
Disallow /*/*/milestones/*/edit
Disallow /*/*/issues/new
Disallow /*/*/issues/*/edit
Disallow /*/*/-/merge_requests/new
Disallow /*/*/-/merge_requests/*.patch
Disallow /*/*/-/merge_requests/*.diff
Disallow /*/*/-/merge_requests/*/edit
Disallow /*/*/-/merge_requests/*/diffs
Disallow /*/*/project_members/import
Disallow /*/*/labels/new
Disallow /*/*/labels/*/edit
Disallow /*/*/wikis/*/edit
Disallow /*/*/snippets/new
Disallow /*/*/snippets/*/edit
Disallow /*/*/snippets/*/raw
Disallow /*/*/deploy_keys
Disallow /*/*/hooks
Disallow /*/*/services
Disallow /*/*/protected_branches
Disallow /*/*/uploads/
Disallow /*/-/group_members
Disallow /*/project_members
Disallow /groups/*/-/contribution_analytics
Disallow /groups/*/-/analytics
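
The three * groups above belong to a single User-agent: * record in the underlying file, split by the scanner at comment boundaries (the "Global snippets" and "Project details" labels in the Comments section below). The rules lean heavily on * wildcards, which the original Robots Exclusion Standard leaves undefined but Google's robots.txt specification interprets as "match any run of characters", with the longest matching pattern taking precedence and Allow winning ties; that is how Allow /users/sign_in outranks Disallow /users. A sketch of that matching model over a handful of the patterns above, assuming Google-style semantics (Python's standard urllib.robotparser does first-match prefix matching and treats * literally, so it cannot evaluate rules like /*/*.git):

import re

def rule_regex(pattern: str) -> re.Pattern:
    # Google-style translation: '*' matches any character run, a
    # trailing '$' anchors the end, everything else is literal.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile("^" + body + ("$" if anchored else ""))

RULES = [  # a subset of the rules listed above
    ("Disallow", "/*/*.git"),
    ("Disallow", "/*/*/commit/*.patch"),
    ("Allow", "/users/sign_in"),
    ("Disallow", "/users"),
]

def allowed(path: str) -> bool:
    # Longest matching pattern wins; Allow beats Disallow on ties.
    best_len, verdict = -1, True  # no matching rule means allowed
    for rule, pattern in RULES:
        if rule_regex(pattern).match(path) and (
            len(pattern) > best_len
            or (len(pattern) == best_len and rule == "Allow")
        ):
            best_len, verdict = len(pattern), rule == "Allow"
    return verdict

print(allowed("/group/project.git"))   # False: /*/*.git matches
print(allowed("/users/sign_in"))       # True: Allow outranks Disallow /users
print(allowed("/group/project/wiki"))  # True: nothing in this subset matches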

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-Agent: *
  • Disallow: /
  • Add a 1 second delay between successive requests to the same server, limits resources used by crawler
  • Only some crawlers respect this setting, e.g. Googlebot does not
  • Crawl-delay: 1
  • Based on details in https://gitlab.com/gitlab-org/gitlab/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab/blob/master/spec/routing, and using application
  • Only specifically allow the Sign In page to avoid very ugly search results
  • Global snippets
  • Project details
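
Since Crawl-delay: 1 is commented out, parsers report no delay for this file; a polite crawler might fall back to the 1 second the comment suggests. A sketch with the standard library (ExampleBot is a placeholder user agent; note the wildcard caveat above, as the stdlib parser prefix-matches and will not apply the project-path rules containing *):

import time
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://pm.opentechiz.com/robots.txt")
rp.read()

# Crawl-delay is commented out, so crawl_delay() returns None; fall
# back to the 1 second the file's comment recommends.
delay = rp.crawl_delay("*") or 1.0

for url in ("https://pm.opentechiz.com/search",   # disallowed for all agents
            "https://pm.opentechiz.com/"):        # matches no rule: allowed
    print(url, rp.can_fetch("ExampleBot", url))
    time.sleep(delay)  # space out successive requests to the same server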