paginasamarillas.com.do
robots.txt

Robots Exclusion Standard data for paginasamarillas.com.do

Resource Scan

Scan Details

Site Domain paginasamarillas.com.do
Base Domain paginasamarillas.com.do
Scan Status Ok
Last Scan 2024-11-14T21:39:43+00:00
Next Scan 2024-11-21T21:39:43+00:00

Last Scan

Scanned 2024-11-14T21:39:43+00:00
URL https://paginasamarillas.com.do/robots.txt
Domain IPs 20.49.97.15
Response IP 20.49.97.15
Found Yes
Hash 80aabe0a736a5bf10205affb2199de457dd620ba7664afdca610622f614cb907
SimHash 6002d5b086d3

Groups

adsbot-google

Rule Path
Disallow (empty path; all paths allowed)

googlebot-image

Rule Path
Disallow (empty path; all paths allowed)

googlebot

Rule Path
Disallow (empty path; all paths allowed)

applebot
bingbot
slurp

Rule Path
Disallow /assets
Disallow /api
Disallow /*/suggest$
Disallow /*/404$
Disallow /*/500$
Disallow /
Allow /es/
Allow /en/
Allow /$
Allow /sitemaps/
Allow /sitemaps-compressed/

*

Rule Path
Disallow /assets
Disallow /api
Disallow /*/suggest$
Disallow /*/404$
Disallow /*/500$
Disallow /
Allow /es/
Allow /en/
Allow /$
Allow /sitemaps/
Allow /sitemaps-compressed/
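
The rule list above mixes a blanket "Disallow /" with narrower Allow entries and uses the "*" and "$" pattern characters, so the effective result for a given path is not obvious from the listing alone. The following is a minimal sketch of how a crawler following the longest-match convention (as described in RFC 9309) might evaluate these rules. The names RULES, pattern_to_regex, and is_allowed are hypothetical and introduced only for illustration; this is not the site's or any crawler's actual code, and individual crawlers may interpret the rules differently.

```python
import re

# Rules copied from the "*" group in this report, in the order listed.
RULES = [
    ("disallow", "/assets"),
    ("disallow", "/api"),
    ("disallow", "/*/suggest$"),
    ("disallow", "/*/404$"),
    ("disallow", "/*/500$"),
    ("disallow", "/"),
    ("allow", "/es/"),
    ("allow", "/en/"),
    ("allow", "/$"),
    ("allow", "/sitemaps/"),
    ("allow", "/sitemaps-compressed/"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters; every other character is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    # A trailing '$' in the pattern pins the match to the end of the path.
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Assumed semantics: the longest matching pattern wins, Allow wins ties,
    and a path that matches no rule is allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty path matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    for p in ["/", "/es/restaurantes", "/es/suggest", "/api/v1", "/about"]:
        print(p, "allowed" if is_allowed(p) else "disallowed")
```

Under these assumptions the home page, the /es/ and /en/ sections, and the sitemap directories are crawlable, while /es/suggest stays blocked because "/*/suggest$" is a longer match than "/es/".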

petalbot
sentibot
magpie-crawler
bytespider
etaospider
findcanbot
baiduspider
yeti

Rule Path
Disallow /

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 8

ahrefssiteaudit bot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 8

Other Records

Field Value
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_business_register.xml
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_government.xml
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_lotteries.xml
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_business.xml
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_general.xml
sitemap https://paginasamarillas.com.do/sitemaps/sitemap_movies.xml

Comments

  • robots.txt
  • Caribe Media

Warnings

  • 1 invalid line.