site.vagas.com.br
robots.txt

Robots Exclusion Standard data for site.vagas.com.br

Resource Scan

Scan Details

Site Domain site.vagas.com.br
Base Domain vagas.com.br
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-08-31T13:09:47+00:00
Next Scan 2024-11-29T13:09:47+00:00

Last Successful Scan

Scanned 2022-09-26T10:25:19+00:00
URL https://site.vagas.com.br/robots.txt
Response IP 104.16.60.29, 104.16.61.29
Found Yes
Hash 788f9b87e44867eea44c23be023ac5f126354b1dc1fca7d6dc596ced1e84c25d
SimHash be1673d9eef6

Groups

*

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

download

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

msnbot-newsblogs

Rule Path
Disallow /

msnbot-products

Rule Path
Disallow /

msnbot-media

Rule Path
Disallow /

*

Rule Path
Disallow /Componentes_VBV/
Disallow /Demo/
Disallow /PromocaoAcontece/
Disallow /Acontece/
Disallow /css/
Disallow /DocCli/
Disallow /DocExt/
Disallow /EditorCli/
Disallow /HtmlCli/
Disallow /HLCAPerfis/
Disallow /img/
Disallow /js/
Disallow /Library/
Disallow /News/
Disallow /Rel/
Disallow /RelDwld/
Disallow /Scripts/
Disallow /symantec/
Disallow /xml/
Disallow /MsgSessaoCancelada.asp
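
The groups above amount to a blanket `Disallow: /` for each named agent, plus a path-restricted catch-all group. As a minimal sketch, the catch-all rule can be checked with Python's standard-library `urllib.robotparser`; the rule text is transcribed from this scan, and `SomeCrawler` is a hypothetical agent name (note the live file may differ, since the most recent scan failed):

```python
from urllib.robotparser import RobotFileParser

# Catch-all group transcribed from the last successful scan of
# https://site.vagas.com.br/robots.txt
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Any agent matched by the catch-all group is blocked from every path.
print(rp.can_fetch("SomeCrawler", "https://site.vagas.com.br/vagas/123"))  # False
```

Because the `*` group disallows the root, `can_fetch` returns `False` for every URL on the site for agents without a more specific group.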

Comments

  • The sections commented in English were based on Wikipedia's robots.txt
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • User-agent: DOC
  • Disallow: /
  • User-agent: Zao
  • Disallow: /
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • User-agent: sitecheck.internetseer.com
  • Disallow: /
  • User-agent: Zealbot
  • Disallow: /
  • User-agent: MSIECrawler
  • Disallow: /
  • User-agent: SiteSnagger
  • Disallow: /
  • User-agent: WebStripper
  • Disallow: /
  • User-agent: WebCopier
  • Disallow: /
  • User-agent: Fetch
  • Disallow: /
  • The agent name below contained a blank space
  • User-agent: Offline Explorer
  • User-agent: Offline
  • Disallow: /
  • User-agent: Teleport
  • Disallow: /
  • User-agent: WebZIP
  • Disallow: /
  • User-agent: linko
  • Disallow: /
  • User-agent: HTTrack
  • Disallow: /
  • User-agent: Microsoft.URL.Control
  • Disallow: /
  • User-agent: Xenu
  • Disallow: /
  • User-agent: larbin
  • Disallow: /
  • User-agent: libwww
  • Disallow: /
  • User-agent: ZyBORG
  • Disallow: /
  • The agent name below contained a blank space
  • User-agent: Download Ninja
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Unwanted robots from Google and Microsoft.
  • All other robots - restricted access.
  • xml sitemap address
  • Sitemap: http://www.vagas.com.br/sitemap.aspx
  • Sitemap: http://vagas.com.br/profissoes/sitemap_index.xml