somaigia.gr
robots.txt

Robots Exclusion Standard data for somaigia.gr

Resource Scan

Scan Details

Site Domain somaigia.gr
Base Domain somaigia.gr
Scan Status Ok
Last Scan 2024-05-05T19:06:40+00:00
Next Scan 2024-06-04T19:06:40+00:00
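
The two timestamps above are exactly 30 days apart. A minimal sketch of that arithmetic, assuming the scanner schedules rescans at a fixed 30-day interval (the interval is inferred from the two values and is not stated in the report; `RESCAN_INTERVAL` and `next_scan` are illustrative names):

```python
from datetime import datetime, timedelta

# Assumption: a fixed 30-day rescan interval, inferred from the report.
RESCAN_INTERVAL = timedelta(days=30)


def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp one rescan interval after the last scan."""
    last_scan = datetime.fromisoformat(last_scan_iso)
    return (last_scan + RESCAN_INTERVAL).isoformat()


print(next_scan("2024-05-05T19:06:40+00:00"))
```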

Last Scan

Scanned 2024-05-05T19:06:40+00:00
URL https://somaigia.gr/robots.txt
Domain IPs 104.21.56.127, 172.67.151.45, 2606:4700:3035::6815:387f, 2606:4700:3036::ac43:972d
Response IP 172.67.151.45
Found Yes
Hash f4bb14d9e7e7d6ef75ead7f20ced2fb05175962ede9644d0c9c78719ade308d9
SimHash eedb08804496
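
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. A minimal sketch of how such a fingerprint could be used to detect changes between scans, assuming the digest is SHA-256 over the raw response body (the report does not state the algorithm or the exact input; `body_digest` and `has_changed` are hypothetical helpers):

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_digest: str) -> bool:
    """True if the body no longer matches the digest stored from an earlier scan."""
    return body_digest(body) != previous_digest


# Illustrative body only, not the site's actual file.
sample = b"User-agent: *\nDisallow: /\n"
stored = body_digest(sample)
print(has_changed(sample, stored))
```

An exact digest only says whether the file changed at all; the separate SimHash field presumably serves near-duplicate comparison, which this sketch does not attempt.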

Groups

linkchecker

Rule Path
Disallow /

psbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

scoutjet

Rule Path
Disallow /

wget

Rule Path
Disallow /

eurobot

Rule Path
Disallow /

gaisbot

Rule Path
Disallow /

www-mechanize

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

gonzo*

Rule Path
Disallow /

gonzo

Rule Path
Disallow /

sapphirewebcrawler

Rule Path
Disallow /

cabot

Rule Path
Disallow /

acontbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

catchbot

Rule Path
Disallow /

webrankspider

Rule Path
Disallow /

yacy

Rule Path
Disallow /

yacybot

Rule Path
Disallow /

mail.ru

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

surveybot_ignoreip

Rule Path
Disallow /

yanga worldsearch bot

Rule Path
Disallow /

oozbot

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

http://www.uni-koblenz.de/~flocke/robot-info.txt

Rule Path
Disallow /

naver

Rule Path
Disallow /

naverbot

Rule Path
Disallow /

yeti

Rule Path
Disallow /

iisbot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

mojeekbot

Rule Path
Disallow /

citenikbot

Rule Path
Disallow /

charlotte

Rule Path
Disallow /

exabot

Rule Path
Disallow /

vedensbot

Rule Path
Disallow /

lexxebot

Rule Path
Disallow /

voilabot

Rule Path
Disallow /

tagoobot

Rule Path
Disallow /

cityreview

Rule Path
Disallow /

euripbot

Rule Path
Disallow /

butterfly

Rule Path
Disallow /

isara-search

Rule Path
Disallow /

jyxobot

Rule Path
Disallow /

mlbot

Rule Path
Disallow /

libwww-perl

Rule Path
Disallow /

nutch

Rule Path
Disallow /

nutch-agent

Rule Path
Disallow /

panscient.com

Rule Path
Disallow /

botonparade

Rule Path
Disallow /

jobs.de-robot

Rule Path
Disallow /

clewwa-bot

Rule Path
Disallow /

search17

Rule Path
Disallow /

spbot

Rule Path
Disallow /

spinn3r

Rule Path
Disallow /

speedy

Rule Path
Disallow /

catchbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

search17

Rule Path
Disallow /

envolk

Rule Path
Disallow /

vagabondo

Rule Path
Disallow /

bilbo

Rule Path
Disallow /

tineye

Rule Path
Disallow /

bixolabs

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

infometrics-bot

Rule Path
Disallow /

exdomain

Rule Path
Disallow /

xenu

Rule Path
Disallow /

peew

Rule Path
Disallow /

bixolabs

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

magpie-crawler/1.1

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

iccrawler - icjobs

Rule Path
Disallow /

iccrawler
icjobs
icjobs/3.2.3

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

discobot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

flightdeckreportsbot

Rule Path
Disallow /

openwebspider

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

webintegration

Rule Path
Disallow /

webmeasurement-bot

Rule Path
Disallow /

nerdbynature.bot

Rule Path
Disallow /

unisterbot

Rule Path
Disallow /

suggybot

Rule Path
Disallow /

*

Rule Path
Disallow *NOINDEX*
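
The rule for the `*` group uses wildcards rather than a plain path prefix. A minimal sketch of how a wildcard-aware crawler might evaluate such a rule, assuming `*` matches any run of characters and a trailing `$` anchors the end of the path. Not every parser supports wildcards, and `rule_matches` is a hypothetical helper, not part of any scanner:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt rule path.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the path. Matching starts at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(rule_matches("*NOINDEX*", "/archive/NOINDEX/page.html"))
print(rule_matches("*NOINDEX*", "/index.html"))
print(rule_matches("/", "/any/page"))
```

Under this reading, `Disallow *NOINDEX*` blocks only paths containing the literal string NOINDEX, while `Disallow /` in the named groups blocks every path for those agents.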

Other Records

Field Value
crawl-delay 30
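
A minimal sketch of how a client could read these directives with Python's standard `urllib.robotparser`. The excerpt is a reconstruction for illustration, not the site's actual file: it assumes the crawl-delay is attached to the `*` group, whereas the scanner lists it under Other Records, and `examplebot` is a made-up agent name:

```python
from urllib.robotparser import RobotFileParser

# Assumption: reconstructed excerpt for illustration only.
EXCERPT = """\
User-agent: mj12bot
Disallow: /

User-agent: *
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

# A named, fully disallowed agent may not fetch anything.
print(parser.can_fetch("mj12bot", "https://somaigia.gr/"))

# Any other agent falls back to the '*' group and its delay in seconds.
print(parser.can_fetch("examplebot", "https://somaigia.gr/"))
print(parser.crawl_delay("examplebot"))
```

A polite crawler would sleep for the returned number of seconds between successive requests to the site.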

Comments

  • robots.txt
  • Created: Tue, 1 Jun 2014 10:38:26 GMT
  • Please note: There are a lot of pages on this site, and there are
  • some misbehaved spiders out there. If you're
  • irresponsible, your access to the site may be blocked.
  • User-Agent: bingbot
  • Disallow: /
  • xxqx proof 11/q4 http://www.scoutjet.com/ (maputo02)
  • xxqx proof http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
  • http://www.suchen.de/popups/faq.jsp
  • http://www.amfibi.com/cabot/
  • http://spider.acont.de/
  • 11q4 offen http://turnitin.com/robot/crawlerinfo.html (was TurnitinBot)
  • xxqx proof
  • http://www.setooz.com/oozbot.html
  • http://www.botje.com/plukkie.htm
  • 11q4 proof http://www.gigablast.com/spider.html
  • http://www.mojeek.com/bot.html
  • 11q4 proof http://www.exabot.com/go/robot
  • 403 http://robot.vedens.de
  • http://www.cityreview.org/crawler
  • xxqx proof http://www.80legs.com/spider.html (maputo02)
  • 403 specialists -------------------
  • http://www.search17.com/bot.php
  • 403 http://spinn3r.com/robot
  • http://www.entireweb.com/about/search_tech/speedy_spider/ Entireweb Robot
  • http://www.search17.com/bot.php
  • http://www.envolk.com/envolkspiderinfo.html
  • http://www.wise-guys.nl/webcrawler.php?item=crawlers
  • http://www.tineye.com/faq
  • http://bixolabs.com/crawler/general/
  • xxqx offen http://www.sitebot.org/robot/
  • 11q4 OFFEN http://ahrefs.com/robot/
  • 11q4 offen http://discoveryengine.com/discobot.html
  • xxqx proof http://www.dotnetdotcom.org/
  • 11q4 OFFEN Mozilla/5.0+(compatible;+Ezooms/1.0;+ezooms.bot@gmail.com)
  • 11q4 proof http://wortschatz.uni-leipzig.de/findlinks/
  • 11q4 offen http://www.flightdeckreports.com/pages/bot/ (maputo02
  • 11q4 offen http://www.openwebspider.org/
  • 11q4 offen http://www.botje.com/plukkie.htm
  • 11q5 offen http://fulltext.sblog.cz/
  • xxqx OFFEN http://help.soso.com/webspider.htm (or "Sosospider")
  • 11q4 offen http://webintegration.at/
  • User-agent: WI Job Roboter Spider Version 3
  • Disallow: /
  • erst so, dann so EMAIL
  • 11q4 offen http://rvs.informatik.uni-leipzig.de/bot.php
  • 11q4 offen http://www.nerdbynature.net/bot lusaka01
  • 11q4 offen Email
  • 12q1 offen suggybot+v0.01a, http://blog.suggy.com/was-ist-suggy/suggy-webcrawler/) luanda

Warnings

  • 4 invalid lines.
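
The report does not say which lines were rejected or why. A minimal sketch of one plausible check, assuming a line counts as invalid when it is neither blank nor a comment and lacks the `field: value` form (this criterion and the `invalid_lines` helper are assumptions, not the scanner's documented behavior):

```python
def invalid_lines(text: str) -> list[str]:
    """Return lines that are not blank, not comments, and not 'field: value'."""
    bad = []
    for raw in text.splitlines():
        # Drop any trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, sep, _value = line.partition(":")
        if not sep or not field.strip():
            bad.append(raw)
    return bad


# Illustrative input only, not the site's actual file.
sample = "User-agent: *\nDisallow /\n# a comment\nCrawl-delay: 30\n"
print(invalid_lines(sample))
```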