site.vagas.com.br
robots.txt

Robots Exclusion Standard data for site.vagas.com.br

Resource Scan

Scan Details

Site Domain site.vagas.com.br
Base Domain vagas.com.br
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-08-31T13:09:47+00:00
Next Scan 2024-11-29T13:09:47+00:00

Last Successful Scan

Scanned 2022-09-26T10:25:19+00:00
URL https://site.vagas.com.br/robots.txt
Response IP 104.16.60.29, 104.16.61.29
Found Yes
Hash 788f9b87e44867eea44c23be023ac5f126354b1dc1fca7d6dc596ced1e84c25d
SimHash be1673d9eef6

Groups

*

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

download

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

msnbot-newsblogs

Rule Path
Disallow /

msnbot-products

Rule Path
Disallow /

msnbot-media

Rule Path
Disallow /

*

Rule Path
Disallow /Componentes_VBV/
Disallow /Demo/
Disallow /PromocaoAcontece/
Disallow /Acontece/
Disallow /css/
Disallow /DocCli/
Disallow /DocExt/
Disallow /EditorCli/
Disallow /HtmlCli/
Disallow /HLCAPerfis/
Disallow /img/
Disallow /js/
Disallow /Library/
Disallow /News/
Disallow /Rel/
Disallow /RelDwld/
Disallow /Scripts/
Disallow /symantec/
Disallow /xml/
Disallow /MsgSessaoCancelada.asp
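
The groups above amount to a blanket `Disallow: /` for each named agent, plus a path-restricted catch-all group. As a minimal sketch, the catch-all rule can be checked with Python's standard-library `urllib.robotparser`; the rule text is transcribed from this scan, and `SomeCrawler` is a hypothetical agent name (note the live file may differ, since the most recent scan failed):

```python
from urllib.robotparser import RobotFileParser

# Catch-all group transcribed from the last successful scan of
# https://site.vagas.com.br/robots.txt
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Any agent matched by the catch-all group is blocked from every path.
print(rp.can_fetch("SomeCrawler", "https://site.vagas.com.br/vagas/123"))  # False
```

Because the `*` group disallows the root, `can_fetch` returns `False` for every URL on the site for agents without a more specific group.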

Comments

  • The sections commented in English were based on Wikipedia's robots.txt
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • User-agent: DOC
  • Disallow: /
  • User-agent: Zao
  • Disallow: /
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • User-agent: sitecheck.internetseer.com
  • Disallow: /
  • User-agent: Zealbot
  • Disallow: /
  • User-agent: MSIECrawler
  • Disallow: /
  • User-agent: SiteSnagger
  • Disallow: /
  • User-agent: WebStripper
  • Disallow: /
  • User-agent: WebCopier
  • Disallow: /
  • User-agent: Fetch
  • Disallow: /
  • The agent name below contained a blank space
  • User-agent: Offline Explorer
  • User-agent: Offline
  • Disallow: /
  • User-agent: Teleport
  • Disallow: /
  • User-agent: WebZIP
  • Disallow: /
  • User-agent: linko
  • Disallow: /
  • User-agent: HTTrack
  • Disallow: /
  • User-agent: Microsoft.URL.Control
  • Disallow: /
  • User-agent: Xenu
  • Disallow: /
  • User-agent: larbin
  • Disallow: /
  • User-agent: libwww
  • Disallow: /
  • User-agent: ZyBORG
  • Disallow: /
  • The agent name below contained a blank space
  • User-agent: Download Ninja
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Unwanted robots from Google and Microsoft.
  • All other robots - restricted access.
  • xml sitemap address
  • Sitemap: http://www.vagas.com.br/sitemap.aspx
  • Sitemap: http://vagas.com.br/profissoes/sitemap_index.xml