novacana.com
robots.txt

Robots Exclusion Standard data for novacana.com

Resource Scan

Scan Details

Site Domain novacana.com
Base Domain novacana.com
Scan Status Ok
Last Scan 2024-05-03T00:52:00+00:00
Next Scan 2024-06-02T00:52:00+00:00

Last Scan

Scanned 2024-05-03T00:52:00+00:00
URL https://novacana.com/robots.txt
Redirect https://www.novacana.com/robots.txt
Redirect Domain www.novacana.com
Redirect Base novacana.com
Domain IPs 104.21.80.52, 172.67.174.91, 2606:4700:3036::6815:5034, 2606:4700:3037::ac43:ae5b
Redirect IPs 104.21.80.52, 172.67.174.91, 2606:4700:3036::6815:5034, 2606:4700:3037::ac43:ae5b
Response IP 104.21.80.52
Found Yes
Hash e3601cb096919ce7e5377093ff007752f9467ac1dcecab84c42df30bf5aaeab1
SimHash f25a954a8a21

Groups

swiftbot

Rule Path
Disallow /dados/
Disallow /search/
Disallow /busca/
Disallow /mailto
Disallow /print
Disallow /acesso/
Disallow /data/wp-admin
Disallow /data/wp-includes
Disallow /data/wp-content/plugins
Disallow /data/wp-content/cache
Disallow /data/wp-content/themes
Disallow /data/wp-includes/js
Disallow /data/trackback
Disallow /data/category/*
Disallow /data/*trackback
Disallow /data/*?*
Disallow /data/*?
Disallow /data/*~*
Disallow /data/*~

Other Records

Field Value
crawl-delay 20
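
In raw robots.txt syntax this group would read roughly as below. This is a reconstruction from the parsed rules above; the exact capitalisation, ordering, and spacing in the scanned file may differ.

User-agent: swiftbot
Disallow: /dados/
Disallow: /search/
Disallow: /busca/
Disallow: /mailto
Disallow: /print
Disallow: /acesso/
Disallow: /data/wp-admin
# remaining /data/ Disallow rules continue as listed above
Crawl-delay: 20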

mj12bot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

sistrix crawler

Rule Path
Disallow /

uptimerobot/2.0

Rule Path
Disallow /

ezooms robot

Rule Path
Disallow /

perl lwp

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

netestate ne crawler (+http://www.website-datenbank.de/)

Rule Path
Disallow /

wiseguys robot

Rule Path
Disallow /

turnitin robot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

pimonster

Rule Path
Disallow /

pimonster

Rule Path
Disallow /

searchmetricsbot

Rule Path
Disallow /

eccp/1.0 (search@eniro.com)

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider
baiduspider-video
baiduspider-image

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

youdaobot

Rule Path
Disallow /

megaindex.ru/2.0

Rule Path
Disallow /

bdcbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

*

Rule Path
Allow /images/thumbnail/*
Disallow /images/
Disallow /portalnc/*
Disallow /administrator/
Disallow /cache/*
Disallow /bin/
Disallow /cli/
Disallow /components/*
Disallow /includes/*
Disallow /installation/
Disallow /language/
Disallow /libraries/
Disallow /logs/*
Disallow /media/*
Disallow /modules/*
Disallow /plugins/*
Disallow /templates/*
Disallow /tmp/*
Disallow /component/jajobboard/
Disallow /exclusivas/*
Disallow /dados/*
Disallow /busca/*
Disallow /mailto/*
Disallow /acesso/*
Disallow /users/*
Disallow /categories/
Disallow /assinar/planos.php?Plano*
Disallow /assinar/plano.php?Plano*
Disallow /data/wp-admin
Disallow /data/wp-includes
Disallow /data/wp-content/plugins
Disallow /data/wp-content/cache
Disallow /data/wp-content/themes
Disallow /data/wp-includes/js
Disallow /data/trackback
Disallow /data/category/*
Disallow /data/*trackback
Disallow /data/*?*
Disallow /data/*?
Disallow /data/*~*
Disallow /data/*~

Other Records

Field Value
crawl-delay 5
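
The Allow/Disallow pair at the top of this group depends on rule precedence: under the longest-match behaviour implemented by most major crawlers (an assumption here, since the original 1994 standard only defined Disallow), the more specific Allow pattern wins over the broader Disallow. A sketch of how that resolves for two example URLs:

User-agent: *
Allow: /images/thumbnail/*
Disallow: /images/
# Longest-match precedence (Google-style):
#   /images/thumbnail/foo.jpg  -> allowed    (the Allow pattern is the longer match)
#   /images/banner.png         -> disallowed (only Disallow: /images/ matches)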

Comments

  • If the Joomla site is installed within a folder, e.g. at www.example.com/joomla/, the robots.txt file MUST be moved to the site root, e.g. www.example.com/robots.txt, AND the Joomla folder name MUST be prefixed to each disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/ (see the example after this list).
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/orig.html
  • For syntax checking, see: http://tool.motoricerca.info/robots-checker.phtml
  • Block MJ12bot as it is just noise
  • Block Ahrefs
  • Block Sogou
  • Block SEOkicks
  • Block BlexBot
  • Block SISTRIX
  • Block Uptime robot
  • Block Ezooms Robot
  • Block Perl LWP
  • Block BlexBot
  • Block netEstate NE Crawler (+http://www.website-datenbank.de/)
  • Block WiseGuys Robot
  • Block Turnitin Robot
  • Block Heritrix
  • Block pricepi
  • Block Searchmetrics Bot
  • Block Eniro
  • Block YandexBot
  • Block Baidu
  • Block SoGou
  • Block Youdao
  • Block MegaIndex.ru
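
To illustrate the Joomla note above: assuming a hypothetical install at www.example.com/joomla/ (the folder name and paths are example values, not taken from this site), the adjusted rules in the root-level robots.txt would look like:

# Joomla installed at www.example.com/joomla/
User-agent: *
Disallow: /joomla/administrator/
Disallow: /joomla/cache/
Disallow: /joomla/includes/
Disallow: /joomla/installation/
Disallow: /joomla/tmp/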

Warnings

  • 3 invalid lines.