rh.recruteur.pro
robots.txt

Robots Exclusion Standard data for rh.recruteur.pro

Resource Scan

Scan Details

Site Domain rh.recruteur.pro
Base Domain recruteur.pro
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-08-13T10:50:13+00:00
Next Scan 2024-11-11T10:50:13+00:00

Last Successful Scan

Scanned 2023-03-29T05:11:20+00:00
URL https://rh.recruteur.pro/robots.txt
Domain IPs 178.33.237.163
Response IP 178.33.237.163
Found Yes
Hash 2893a9f8d88f2d61d15cd796f7769c57a426e1f35cc58e1e527f60094b7bfcb3
SimHash 6a1e94504531
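
The 64-hex-digit Hash matches the length of a SHA-256 digest, so it is presumably SHA-256 of the fetched robots.txt body (an assumption; the report does not name the algorithm). A minimal Python sketch of reproducing such a digest:

    import hashlib

    # Assumption: the scanner's Hash field is SHA-256 of the raw
    # robots.txt response body; the report does not say so explicitly.
    with open("robots.txt", "rb") as f:
        body = f.read()
    print(hashlib.sha256(body).hexdigest())  # 64 hex characters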

Groups

googlebot

Rule Path
Disallow /*?*
Disallow /*folder_factories$
Disallow /*send_as_pdf*
Disallow /*download_as_pdf*
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/
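
The googlebot group relies on Google's wildcard extensions: * matches any run of characters and a trailing $ anchors the end of the path, so /*?* blocks every URL carrying a query string. A minimal Python sketch of this matching (rule_to_regex is a hypothetical helper, not part of any library):

    import re

    def rule_to_regex(path):
        # Google-style robots.txt wildcards: '*' matches any run of
        # characters; a trailing '$' anchors the end of the URL path.
        anchored = path.endswith("$")
        if anchored:
            path = path[:-1]
        pattern = ".*".join(re.escape(part) for part in path.split("*"))
        return re.compile("^" + pattern + ("$" if anchored else ""))

    # "/*?*" blocks any URL that carries a query string:
    rule = rule_to_regex("/*?*")
    print(bool(rule.match("/offres?set_language=fr")))  # True  -> disallowed
    print(bool(rule.match("/offres/emploi")))           # False -> allowed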

mediapartners-google

Rule Path
Disallow
Allow /
Disallow /*folder_factories$
Disallow /*send_as_pdf*
Disallow /*download_as_pdf*
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/

yahoo! slurp

Rule Path
Disallow /*?*
Disallow /*folder_factories$
Disallow /*send_as_pdf*
Disallow /*download_as_pdf*
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/

Other Records

Field Value
crawl-delay 10
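
A crawl-delay of 10 asks the crawler to wait ten seconds between requests. Python's standard urllib.robotparser exposes this field; a sketch that parses the recorded rules offline (the live fetch currently fails), bearing in mind that this parser does not understand the wildcard rules above:

    from urllib.robotparser import RobotFileParser

    # Parse the recorded rules offline; urllib.robotparser supports
    # crawl-delay but treats rule paths literally (no wildcards).
    rfp = RobotFileParser()
    rfp.parse([
        "User-agent: Yahoo! Slurp",
        "Disallow: /newsletter/",
        "Crawl-delay: 10",
    ])
    print(rfp.crawl_delay("Yahoo! Slurp"))  # 10
    print(rfp.can_fetch("Yahoo! Slurp",
                        "https://rh.recruteur.pro/newsletter/latest"))  # False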

bingbot

Rule Path
Disallow /*?*
Disallow /*folder_factories$
Disallow /*send_as_pdf*
Disallow /*download_as_pdf*
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/

Other Records

Field Value
crawl-delay 10

baiduspider

Rule Path
Disallow /*?*
Disallow /*folder_factories$
Disallow /*send_as_pdf*
Disallow /*download_as_pdf*
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/

Other Records

Field Value
crawl-delay 10

msnbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

seokicks

Rule Path
Disallow /

discobot

Rule Path
Disallow /

blekkobot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

sistrix crawler

Rule Path
Disallow /

uptimerobot/2.0

Rule Path
Disallow /

ezooms robot

Rule Path
Disallow /

perl lwp

Rule Path
Disallow /

netestate ne crawler

Rule Path
Disallow /

wiseguys robot

Rule Path
Disallow /

turnitin robot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

babya discoverer

Rule Path
Disallow /

*

Rule Path
Disallow /parametrages/
Disallow /newsletter/
Disallow /abonnez-vous/
Disallow /don-en-ligne/
Disallow /portal_checkouttool/
Disallow /Members/

Other Records

Field Value
crawl-delay 10

Other Records

Field Value
sitemap https://rh.recruteur.pro/sitemap.xml.gz
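
The same standard parser exposes Sitemap records on Python 3.8+; a short sketch:

    from urllib.robotparser import RobotFileParser

    # site_maps() (Python 3.8+) returns all Sitemap records, or None.
    rfp = RobotFileParser()
    rfp.parse([
        "User-agent: *",
        "Disallow: /parametrages/",
        "Sitemap: https://rh.recruteur.pro/sitemap.xml.gz",
    ])
    print(rfp.site_maps())  # ['https://rh.recruteur.pro/sitemap.xml.gz']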

Comments

  • Define access-restrictions for robots/spiders
  • http://www.robotstxt.org/wc/norobots.html
  • see http://opensourcehacker.com/2009/08/07/seo-tips-query-strings-multiple-languages-forms-and-other-content-management-system-issues/
  • Googlebot supports wildcard patterns (* and $) in its syntax
  • Block all URLs that include query strings (the ? pattern): contentish objects expose query strings only for actions or status reports, which might confuse search results.
  • This will also block ?set_language
  • Allow Adsense bot on entire site
  • Block MJ12bot as it is just noise
  • Block Ahrefs
  • Block Sogou
  • Block SEOkicks
  • SEOkicks
  • Discoveryengine.com
  • Blekkobot
  • Block BlexBot
  • Block SISTRIX
  • Block Uptime robot
  • Block Ezooms Robot
  • Block Perl LWP
  • Block netEstate NE Crawler
  • Block WiseGuys Robot
  • Block Turnitin Robot
  • Exabot
  • Yandex
  • Babya Discoverer
  • Directories
  • Request-rate: defines the pages/seconds crawl ratio; 1/20 means 1 page every 20 seconds (see the sketch after this list).
  • Crawl-delay: defines how many seconds to wait after each successful crawl.
  • Visit-time: defines between which hours pages may be crawled. Example: 0100-0330 means pages will be indexed between 01:00 and 03:30 GMT.
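
Request-rate and Visit-time are nonstandard fields (flagged in the Warnings below). urllib.robotparser has understood Request-rate since Python 3.6, while Visit-time must be read by hand; a minimal sketch under those assumptions:

    from urllib.robotparser import RobotFileParser

    lines = [
        "User-agent: *",
        "Request-rate: 1/20",     # 1 page per 20 seconds
        "Visit-time: 0100-0330",  # crawl only 01:00-03:30 GMT
    ]

    # request_rate() parses Request-rate into a (requests, seconds) tuple.
    rfp = RobotFileParser()
    rfp.parse(lines)
    rate = rfp.request_rate("*")
    print(rate.requests, rate.seconds)  # 1 20

    # Visit-time has no parser support, so extract it manually.
    for line in lines:
        field, _, value = line.partition(":")
        if field.strip().lower() == "visit-time":
            start, end = value.strip().split("-")
            print(start, end)  # 0100 0330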

Warnings

  • 2 invalid lines.
  • `request-rate` is not a known field.
  • `visit-time` is not a known field.