Robots Exclusion Standard data for polvotinteiro.pt

Resource Scan

Scan Details

Site Domain polvotinteiro.pt
Base Domain polvotinteiro.pt
Scan Status Ok
Last Scan 2024-11-05T09:01:01+00:00
Next Scan 2024-12-05T09:01:01+00:00

Last Scan

Scanned 2024-11-05T09:01:01+00:00
URL https://polvotinteiro.pt/robots.txt
Redirect https://www.polvotinteiro.pt/robots.txt
Redirect Domain www.polvotinteiro.pt
Redirect Base polvotinteiro.pt
Domain IPs 104.21.91.191, 172.67.178.93, 2606:4700:3034::6815:5bbf, 2606:4700:3034::ac43:b25d
Redirect IPs 104.21.91.191, 172.67.178.93, 2606:4700:3034::6815:5bbf, 2606:4700:3034::ac43:b25d
Response IP 172.67.178.93
Found Yes
Hash 431690f88c3bd3639ad34d11eb5f03014247f0d107a7f096b1354c16670cae24
SimHash 0afa2820c872
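
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched robots.txt body. That is an assumption: the report does not name the algorithm or say whether the body is normalized before hashing. A minimal sketch of computing such a digest, where body_digest is a hypothetical helper name and the sample body is made up:

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body.

    SHA-256 is assumed here only because the reported hash is 64 hex characters.
    """
    return hashlib.sha256(body).hexdigest()


# Illustrative body, not the real file served by the site.
sample = b"User-agent: *\nDisallow: /account/\n"
digest = body_digest(sample)
print(len(digest))  # 64
```

Comparing the digest of a fresh fetch with the stored one is a cheap way to tell whether the file changed between scans.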

Groups

*

Rule Path
Disallow /account/
Disallow /amazon/
Disallow /api/
Disallow /cache/
Disallow /chat/
Disallow /cgi-bin/
Disallow /ext/
Disallow /feeds/
Disallow /googlesitemap/
Disallow /includes/
Disallow /temp/
Disallow /404.php
Disallow /account_edit.php
Disallow /account_history_info.php
Disallow /account_history.php
Disallow /account_password.php
Disallow /account.php
Disallow /address_book_process.php
Disallow /address_book.php
Disallow /ask_a_question.php
Disallow /before_process.php
Disallow /borrar_carrito.php
Disallow /chat/server.php
Disallow /checkout_*.php
Disallow /checkout_confirmation.php
Disallow /checkout_payment_address.php
Disallow /checkout_payment.php
Disallow /checkout_process.php
Disallow /checkout_shipping_address.php
Disallow /checkout_shipping.php
Disallow /checkout_success.php
Disallow /cookie_usage.php
Disallow /create_account*.php
Disallow /create_account_success.php
Disallow /create_account_profesionales.php
Disallow /escribir*.php
Disallow /download.php
Disallow /error404.html
Disallow /getCart.php
Disallow /index.php
Disallow /information.php
Disallow /login.php
Disallow /logoff.php
Disallow /mantenimiento.php
Disallow /notify.php
Disallow /notify.php*
Disallow /password_forgotten.php
Disallow /password_*.php
Disallow /printorder.php
Disallow /product_reviews_write.php
Disallow /product_thumb.php
Disallow /recomendar-email.php
Disallow /recomendar_email.php*
Disallow /redirect.php
Disallow /search.php
Disallow /search.php*
Disallow /tellamamos.php
Disallow /favoritos.php
Disallow /shopping_cart.php
Disallow /feed-*.csv
Disallow /feed*.php
Disallow /*.pdf
Disallow /my_*.php
Disallow /newsletters_*.php
Disallow /*cron*.php
Disallow /noticias.php
Disallow /marketplace-c-1_46770_47786.html
Disallow /aliexpress-c-50434.html
Disallow /brother-c-1_46770_47786_47833.html
Disallow /pack-dk-c-1_46770_47786_47833_48837.html
Disallow /ebay-c-1_46770_47786_48959.html
Disallow /brother-c-1_46770_47786_48959_48960.html
Disallow /hp-c-1_46770_47786_47832.html
Disallow /c-e-285-a-c-1_46770_47786_47832_47878.html
Disallow /papeleria-c-1_46770_47786_47895.html
Disallow /*?*
Disallow /*filter%5B%5D%3D*
Disallow /*filter*
Disallow /*manufacturer%5B%5D%3D*
Disallow /*orden%3D*
Disallow /*color%3D*
Allow /*?page=%5Cd%2B
Allow /javascript.js?*
Allow /stylesheet.css?*
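
Several rules in this group use the * wildcard and mix Allow with Disallow. The sketch below shows one way to evaluate such rules, assuming the matching behaviour described in RFC 9309: * matches any run of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The function names and the /toner-c-1.html path are hypothetical; Python's standard urllib.robotparser is not used because it does not interpret wildcards.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (kind, pattern) pairs, kind being 'allow' or 'disallow'.

    The longest matching pattern decides; on equal length, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            n = len(pattern)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, allowed = n, (kind == "allow")
    return allowed


# A few rules copied from the group above.
rules = [
    ("disallow", "/checkout_*.php"),
    ("disallow", "/*?*"),
    ("allow", "/stylesheet.css?*"),
]
print(is_allowed(rules, "/checkout_payment.php"))    # False
print(is_allowed(rules, "/stylesheet.css?v=3"))      # True
print(is_allowed(rules, "/toner-c-1.html?orden=1"))  # False
print(is_allowed(rules, "/toner-c-1.html"))          # True
```

Note that the rule Allow /*?page=%5Cd%2B decodes to /*?page=\d+, which resembles a regular expression. A matcher like this one treats that text literally, so it would probably not exempt a URL such as /x.html?page=2 from Disallow /*?*.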

msiecrawler

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

libwww

Rule Path
Disallow /

sistrix crawler

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

jobs.de-robot

Rule Path
Disallow /

unisterbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

searchmetricsbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

seodiver

Rule Path
Disallow /

spbot

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

meanpathbot

Rule Path
Disallow /

backlinkcrawler

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

obot

Rule Path
Disallow /

fr-crawler

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

megaindex.com

Rule Path
Disallow /

cloudservermarketspider

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

careerbot

Rule Path
Disallow /

lipperhey-kaus-australis

Rule Path
Disallow /

seoscanners.net

Rule Path
Disallow /

metajobbot

Rule Path
Disallow /

spiderbot

Rule Path
Disallow /

linkstats

Rule Path
Disallow /

jobboersebot

Rule Path
Disallow /

iccrawler

Rule Path
Disallow /

plista

Rule Path
Disallow /

domain re-animator bot

Rule Path
Disallow /

lipperhey-kaus-australis

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

coccoc

Rule Path
Disallow /

um-ic

Rule Path
Disallow /

mindupbot

Rule Path
Disallow /

sg-orbiter

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

qwantify

Rule Path
Disallow /

kraken

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

safednsbot

Rule Path
Disallow /

haosouspider

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

openhosebot

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

thumbsniper

Rule Path
Disallow /

r6_commentreader

Rule Path
Disallow /

implisensebot

Rule Path
Disallow /

cliqzbot

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

adscanner

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

python/3.5 aiohttp

Rule Path
Disallow /

toweya.com

Rule Path
Disallow /

netestate

Rule Path
Disallow /

bubing

Rule Path
Disallow /

linguee

Rule Path
Disallow /

sentibot

Rule Path
Disallow /

sentibot

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /

domaincrawler

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

indeedbot

Rule Path
Disallow /

garlikcrawler

Rule Path
Disallow /

gosign-security-crawler

Rule Path
Disallow /

siteliner

Rule Path
Disallow /

sabsimbot

Rule Path
Disallow /

ltx71

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.polvotinteiro.pt/sitemap_pt.xml

Comments

  • Directories
  • Files
  • $_GET parameters in URLs
  • List of bots that usually respect robots.txt but rarely
  • make good use of the site and abuse it quite a bit
  • Add to taste
  • Disallow: Sistrix
  • Disallow: Sistrix
  • Disallow: SEOkicks-Robot
  • Disallow: jobs.de-Robot
  • Bot of the Leipzig-based Unister Holding GmbH
  • http://www.opensiteexplorer.org/dotbot
  • http://www.searchmetrics.com
  • http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
  • http://www.domaintools.com/webmasters/surveybot.php
  • http://www.seodiver.com/bot
  • http://openlinkprofiler.org/bot
  • http://www.wotbox.com/bot/
  • http://www.meanpath.com/meanpathbot.html
  • http://www.backlinktest.com/crawler.html
  • http://www.brandwatch.com/magpie-crawler/
  • http://filterdb.iss.net/crawler/
  • http://webmeup-crawler.com
  • https://megaindex.com/crawler
  • http://www.cloudservermarket.com
  • http://www.trendiction.de/de/publisher/bot
  • http://www.exalead.com
  • http://www.career-x.de/bot.html
  • https://www.lipperhey.com/en/about/
  • https://www.lipperhey.com/en/about/
  • https://turnitin.com/robot/crawlerinfo.html
  • http://help.coccoc.com/
  • ubermetrics-technologies.com
  • datenbutler.de
  • http://searchgears.de/uber-uns/crawling-faq.html
  • http://commoncrawl.org/faq/
  • https://www.qwant.com/
  • http://linkfluence.net/
  • http://www.botje.com/plukkie.htm
  • https://www.safedns.com/searchbot
  • http://www.haosou.com/help/help_3_2.html
  • http://www.haosou.com/help/help_3_2.html
  • http://www.moz.com/dp/rogerbot
  • http://www.openhose.org/bot.html
  • http://www.screamingfrog.co.uk/seo-spider/
  • http://thumbsniper.com
  • http://www.radian6.com/crawler
  • http://cliqz.com/company/cliqzbot
  • https://www.aihitdata.com/about
  • http://www.trendiction.com/en/publisher/bot
  • http://seocompany.store
  • https://github.com/yasserg/crawler4j/
  • http://warebay.com/bot.html
  • http://www.website-datenbank.de/
  • http://law.di.unimi.it/BUbiNG.html
  • http://www.linguee.com/bot; bot@linguee.com
  • www.sentibot.eu
  • http://velen.io
  • https://moz.com/help/guides/moz-procedures/what-is-rogerbot
  • http://www.garlik.com
  • https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/
  • http://www.siteliner.com/bot
  • https://sabsim.com
  • http://ltx71.com/

Warnings

  • 2 invalid lines.
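
The report does not say which two lines were rejected or what rule the scanner applied. As a rough sketch of how such lines could be located in a local copy of the file, the check below flags any line that is not blank, not a comment, and not of the form "field: value" with a recognized field. The set of recognized fields and the name invalid_lines are assumptions, and the sample text is invented for illustration.

```python
# Assumed set of accepted field names; the scanner's own list is not documented here.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def invalid_lines(text: str):
    """Return (line_number, line) pairs that fail the 'field: value' check."""
    bad = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue  # blank or comment-only line
        field, sep, _value = line.partition(":")
        if not sep or field.strip().lower() not in KNOWN_FIELDS:
            bad.append((number, raw))
    return bad


# Invented sample: line 3 lacks the colon, line 5 uses an unknown field.
sample = "User-agent: *\n# Directorios\nDisallow /temp/\nDisallow: /api/\nFoo: bar\n"
print(invalid_lines(sample))  # [(3, 'Disallow /temp/'), (5, 'Foo: bar')]
```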