aceproject.org
robots.txt

Robots Exclusion Standard data for aceproject.org

Resource Scan

Scan Details

Site Domain aceproject.org
Base Domain aceproject.org
Scan Status Ok
Last Scan 2024-09-26T03:19:15+00:00
Next Scan 2024-10-26T03:19:15+00:00

Last Scan

Scanned 2024-09-26T03:19:15+00:00
URL https://aceproject.org/robots.txt
Domain IPs 104.21.9.110, 172.67.159.195, 2606:4700:3031::6815:96e, 2606:4700:3036::ac43:9fc3
Response IP 172.67.159.195
Found Yes
Hash 21c59bea50b0d12ccdd928ae10fa092020acb5fbf62a0ce2c20fb1a94751c845
SimHash 1e71ae634ef4

Groups

ahrefsbot

Rule Path
Disallow /

scirus-crawler

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

blekkobot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /*-ru/
Disallow /*-fr/
Disallow /*-ar/
Disallow /*-es/
Disallow /*-sw/
Disallow /today/
Disallow /electoral-advice
Disallow /main/
Disallow /Members/
Disallow /author/
Disallow /epic-
Disallow /ero-
Disallow /*CDCountry
Disallow /*CDTable
Disallow /*CDMap
Disallow /*CDBarChart
Disallow /*CDChart
Disallow /*open-flash-chart
Disallow /*world_swf
Disallow /*?searchterm
Disallow /*SearchableText
Disallow /*mail_password_form
Disallow /*sendto_form$
Disallow /*folder_factories$
Disallow /*portal_factory$
Disallow /*search$
Disallow /*login_form
Disallow /*search_materials_results
Disallow /*onePage
Disallow /ace-en/pdf/

Other Records

Field Value
crawl-delay 90

yandex

Rule Path
Disallow /*-en/
Disallow /*-fr/
Disallow /*-ar/
Disallow /*-es/
Disallow /*-sw/
Disallow /today/
Disallow /electoral-advice
Disallow /main/
Disallow /Members/
Disallow /author/
Disallow /epic-
Disallow /ero-
Disallow /*CDCountry
Disallow /*CDTable
Disallow /*CDMap
Disallow /*CDBarChart
Disallow /*CDChart
Disallow /*open-flash-chart
Disallow /*world_swf
Disallow /*?searchterm
Disallow /*SearchableText
Disallow /*mail_password_form
Disallow /*sendto_form$
Disallow /*folder_factories$
Disallow /*portal_factory$
Disallow /*search$
Disallow /*login_form
Disallow /*search_materials_results
Disallow /ace-en/pdf/

Other Records

Field Value
crawl-delay 90

bnf.fr_bot

Rule Path
Disallow /

showyoubot

Rule Path
Disallow /

suggybot

Rule Path
Disallow /

speedy

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

discobot

Rule Path
Disallow /

discoverybot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

jikespider

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

*

Rule Path
Disallow /epic-en/cmfepic/
Disallow /epic-en/research/
Disallow /epic-en/countries/
Disallow /epic-fr/countries/
Disallow /epic-es/countries/
Disallow /epic-ar/countries/
Disallow /epic-ru/countries/
Disallow /regions-en/archive/
Disallow /author/
Disallow /today-ole/
Disallow /today/feature-articles/portal_factory
Disallow /electoral-advice/ace-workspace/
Disallow /electoral-advice/ace-workspace/
Disallow /regions-en/archive/
Disallow /electoral-advice/dop?
Disallow /translatorsTemplate
Disallow /electoral-advice/ace-workspace/
Disallow /ace-en/pdf/

Other Records

Field Value
crawl-delay 30
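
The crawl-delay and Disallow records listed above can be read programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a small hand-written excerpt (`ROBOTS_EXCERPT`, a reconstruction for illustration, not the fetched file) and a made-up agent name, `examplebot`. `urllib.robotparser` matches paths by plain prefix and does not interpret `*` or `$`, so only wildcard-free rules are used here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modelled on the groups in this report; the real file
# is at https://aceproject.org/robots.txt and may differ in layout.
ROBOTS_EXCERPT = """
User-agent: *
Crawl-delay: 30
Disallow: /author/
Disallow: /ace-en/pdf/

User-agent: ahrefsbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# "examplebot" is an invented agent with no group of its own,
# so it falls back to the "*" group.
print(parser.crawl_delay("examplebot"))
print(parser.can_fetch("examplebot", "/author/smith"))
print(parser.can_fetch("examplebot", "/ace-en/"))

# ahrefsbot has a dedicated group that blocks the whole site
# and declares no crawl-delay.
print(parser.can_fetch("ahrefsbot", "/ace-en/"))
print(parser.crawl_delay("ahrefsbot"))
```

A crawler that honours these records would wait the reported number of seconds between requests and skip any path for which `can_fetch` returns `False`.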

*

Rule Path
Disallow /*/topics/onePage
Disallow /ace-es/topics/va/onePage
Disallow /ace-ru/*/onePage
Disallow /ace-sw/*/onePage
Disallow /ace-fr/*/onePage
Disallow /ace-ar/*/onePage
Disallow /main/*/onePage
Disallow /*sendto_form$
Disallow /*folder_factories$
Disallow /*portal_factory$
Disallow /*search$
Disallow /*login_form
Disallow /*mail_password_form
Disallow /ace-*search_materials_results
Disallow /regions-*search_materials_results
Disallow /regions-en/*CDCountry
Disallow /regions-en/*CDTable
Disallow /regions-en/*CDMap
Disallow /regions-en/*CDChart
Disallow /regions-en/*CDBarChart
Disallow /regions-en/*open-flash-chart
Disallow /regions-en/*world_swf
Disallow /epic-en/feedback
Disallow /epic-es/feedback
Disallow /epic-fr/feedback
Disallow /epic-en/en/CD
Disallow /epic-en/es/CD
Disallow /epic-en/fr/CD
Disallow /epic-en/ar/CD
Disallow /epic-en/ru/CD
Disallow /epic-fr/en/CD
Disallow /epic-fr/es/CD
Disallow /epic-fr/fr/CD
Disallow /epic-fr/ar/CD
Disallow /epic-fr/ru/CD
Disallow /epic-es/en/CD
Disallow /epic-es/es/CD
Disallow /epic-es/fr/CD
Disallow /epic-es/ar/CD
Disallow /epic-es/ru/CD
Disallow /CD
Disallow /epic-*open-flash-chart
Disallow /epic-*world_swf
Disallow /ero-*country%3D
Disallow /*ero-*/index_html?filter
Disallow /ero-es
Disallow /ero-fr
Disallow /ero-ar
Disallow /ero-ru
Disallow /ace-*CDCountry
Disallow /ace-en/ero-
Disallow /ace-es/ero-
Disallow /ace-ar/ero-
Disallow /ace-ru/ero-
Disallow /ace-fr/ero-
Disallow /ace-*CDTable
Disallow /ace-*CDMap
Disallow /ace-*CDBarChart
Disallow /ace-*CDChart
Disallow /ace-*open-flash-chart
Disallow /ace-*world_swf
Disallow /*?searchterm
Disallow /*SearchableText
Disallow /*mail_password_form
Disallow /*%26mission%3D
Disallow /en/*
Disallow /fr/*
Disallow /es/*
Disallow /ru/*
Disallow /ar/*
Disallow /ace-en/pdf/

Comments

  • Define access-restrictions for robots/spiders
  • http://www.robotstxt.org/wc/norobots.html
  • Yandex bot
  • French natl. library
  • video indexing service
  • CCbot not obeying wildcards
  • ezooms not obeying wildcards
  • MJ12bot not obeying wildcards
  • discobot not obeying wildcards
  • discobot not obeying wildcards
  • Exabot not obeying wildcards
  • archive.org_bot not obeying wildcards
  • JikeSpider not obeying wildcards
  • WBSearchBot not obeying wildcards
  • Gigabot not obeying wildcards
  • By default we allow robots to access all areas of our site
  • already accessible to anonymous users
  • Add Googlebot-specific syntax extension to exclude forms
  • that are repeated for each piece of content in the site
  • the wildcard is only supported by Googlebot
  • http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
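
The comments above describe the wildcard as a syntax extension, and many rules in this report also end in `$`. The following is a minimal sketch of how such patterns are commonly interpreted, assuming `*` matches any run of characters and a trailing `$` anchors the end of the URL path. The function names `robots_pattern_to_regex` and `is_disallowed` are hypothetical, and a given crawler may implement matching differently.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' pins the match to the end of the path; without '$'
    the pattern acts as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, patterns):
    """Return True if any pattern matches the given URL path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)

# Three rules taken from the groups listed in this report.
rules = ["/*sendto_form$", "/*login_form", "/ace-en/pdf/"]

for path in [
    "/ace-en/topics/sendto_form",      # ends with sendto_form
    "/ace-en/topics/sendto_form?x=1",  # '$' rule no longer applies
    "/today/login_form?came_from=x",   # prefix-style wildcard rule
    "/ace-en/pdf/guide.pdf",           # plain prefix rule
    "/ace-en/topics/",                 # matches none of the rules
]:
    print(path, is_disallowed(path, rules))
```

Under these assumptions, a crawler that ignores wildcards would treat `/*sendto_form$` as a literal path and never match it, which is consistent with the report's comments about bots "not obeying wildcards" being given a blanket `Disallow /` instead.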

Warnings

  • 2 invalid lines.