elnoticieroenlinea.com
robots.txt

Robots Exclusion Standard data for elnoticieroenlinea.com

Resource Scan

Scan Details

Site Domain elnoticieroenlinea.com
Base Domain elnoticieroenlinea.com
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-09-17T21:20:16+00:00
Next Scan 2024-10-01T21:20:16+00:00

Last Successful Scan

Scanned 2024-08-10T21:19:26+00:00
URL https://elnoticieroenlinea.com/robots.txt
Domain IPs 198.175.150.30
Response IP 198.175.150.30
Found Yes
Hash 53435ddda9831f66590c278b3ce111a2fa975c99faea591ec57434a553c7bf00
SimHash b8963f02b924

Groups

*

Rule Path
Disallow /node/
Disallow /cdb/
Disallow /wp-content/
Disallow /sites/
Disallow /Topicos/
Disallow /7198/
Disallow /noticias/
Disallow /bbtstats/
Disallow /bbtfile/
Disallow /feed/
Disallow /rss7/
Disallow /rss10/
Disallow /MediaCenter/
Disallow /portal/
Disallow /infografia/*
Disallow /5644/*
Disallow /wp-admin
Allow /wp-admin/admin-ajax.php
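
The `*` group pairs `Disallow /wp-admin` with `Allow /wp-admin/admin-ajax.php`, so the result for a given path depends on how a crawler resolves overlapping rules. The sketch below assumes the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and `Allow` wins a tie). The names `STAR_RULES` and `is_allowed` are illustrative and do not come from the scan, and only a subset of the rules is included. Crawlers that apply rules in file order may resolve the same path differently.

```python
import re

# Subset of the "*" group listed above, as (directive, path pattern) pairs.
STAR_RULES = [
    ("disallow", "/node/"),
    ("disallow", "/wp-content/"),
    ("disallow", "/feed/"),
    ("disallow", "/infografia/*"),
    ("disallow", "/wp-admin"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be fetched under longest-match precedence."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, pattern in rules:
        # re.match anchors at the start of the path, i.e. prefix matching.
        if _pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            # The longest pattern wins; on equal length, allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed
```

Under these assumptions, `/wp-admin/admin-ajax.php` is fetchable because the `Allow` pattern is longer than `/wp-admin`, while every other path under `/wp-admin` stays blocked.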

genio

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

scooperbot

Rule Path
Disallow /

seekportbot

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

flamingo_searchengine

Rule Path
Disallow /

facebot

Rule Path
Disallow /

luminatebot

Rule Path
Disallow /

vagabondo

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

r6_commentreader

Rule Path
Disallow /

yeti

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

showyoubot

Rule Path
Disallow /

gozaikbot

Rule Path
Disallow /

python-requests

Rule Path
Disallow /

queryseekerspider

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

yandeximages

Rule Path
Disallow /

apache-httpclient

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

buck

Rule Path
Disallow /

wikido

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

sogou

Rule Path
Disallow /

zend_http_client

Rule Path
Disallow /

robots

Rule Path
Disallow /

arquivo-web-crawler

Rule Path
Disallow /

bidswitchbot

Rule Path
Disallow /

g-i-g-a-b-o-t

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

garlikcrawler

Rule Path
Disallow /

caam

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

clickagy intelligence bot

Rule Path
Disallow /

jersey

Rule Path
Disallow /

libwww-perl

Rule Path
Disallow /

ltx71

Rule Path
Disallow /

omgili

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

python-urllib

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

siteauditbot

Rule Path
Disallow /

semrushbot-ba

Rule Path
Disallow /

semrushbot-si

Rule Path
Disallow /

semrushbot-swa

Rule Path
Disallow /

semrushbot-ct

Rule Path
Disallow /

semrushbot-bm

Rule Path
Disallow /

splitsignalbot

Rule Path
Disallow /

semrushbot-coub

Rule Path
Disallow /
Disallow /lp
Disallow /de-de/lp
Disallow /en-au/lp
Disallow /en-ca/lp
Disallow /en-gb/lp
Disallow /en-in/lp
Disallow /es-es/lp
Disallow /es-la/lp
Disallow /fr-fr/lp
Disallow /it-it/lp
Disallow /ja-jp/lp
Disallow /ko-kr/lp
Disallow /pt-br/lp
Disallow /zh-cn/lp
Disallow /zh-tw/lp
Disallow /feedback
Disallow /de-de/feedback
Disallow /en-au/feedback
Disallow /en-ca/feedback
Disallow /en-gb/feedback
Disallow /en-in/feedback
Disallow /es-es/feedback
Disallow /es-la/feedback
Disallow /fr-fr/feedback
Disallow /it-it/feedback
Disallow /ja-jp/feedback
Disallow /ko-kr/feedback
Disallow /pt-br/feedback
Disallow /zh-cn/feedback
Disallow /zh-tw/feedback
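
Each group above applies only to the crawler whose user-agent token it names, and `*` is the fallback for everyone else. A minimal sketch of that selection step follows, assuming case-insensitive exact matching on the product token. `GROUPS` and `select_group` are hypothetical names, and only a few of the listed groups are reproduced. Real crawlers may use looser matching, such as prefix or substring matching.

```python
# A few of the groups listed above, keyed by lowercase user-agent token,
# each mapped to its Disallow paths (abbreviated for the "*" group).
GROUPS = {
    "*": ["/node/", "/wp-admin"],
    "ahrefsbot": ["/"],
    "ccbot": ["/"],
    "semrushbot-ba": ["/"],
}


def select_group(product_token, groups):
    """Pick the group a crawler should obey.

    An exact, case-insensitive token match takes priority; otherwise the
    "*" group applies. Returns None if neither exists.
    """
    token = product_token.lower()
    if token in groups:
        return token
    return "*" if "*" in groups else None
```

With this matching rule, `select_group("AhrefsBot", GROUPS)` picks the `ahrefsbot` group, which blocks the whole site. A token with no group of its own, such as `Googlebot`, falls back to `*`.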

Other Records

Field Value
sitemap https://www.elnoticieroenlinea.com/sitemap/sitemap-articles-index.xml
sitemap https://www.elnoticieroenlinea.com/sitemap/sitemap-google-news-index.xml
sitemap https://www.elnoticieroenlinea.com/sitemap/sitemap-tags-index.xml
sitemap https://www.elnoticieroenlinea.com/sitemap/sitemap-images-index.xml
sitemap https://www.elnoticieroenlinea.com/sitemap/sitemap-videos-index.xml

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/wc/robots.html
  • For syntax checking, see:
  • http://www.sxw.org.uk/computing/robots/check.html
  • lp
  • feedback