asgnet.co.uk
robots.txt

Robots Exclusion Standard data for asgnet.co.uk

Resource Scan

Scan Details

Site Domain asgnet.co.uk
Base Domain asgnet.co.uk
Scan Status Ok
Last Scan 2024-09-08T02:17:06+00:00
Next Scan 2024-10-08T02:17:06+00:00

Last Scan

Scanned 2024-09-08T02:17:06+00:00
URL https://asgnet.co.uk/robots.txt
Domain IPs 79.99.42.89
Response IP 79.99.42.89
Found Yes
Hash 2013592394caa10a2bcbf3418727a8d20590d2bd0ff3a44e3eb3feaa1d9921f3
SimHash 9716e1570df7
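
The Hash field is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the scan does not name the algorithm or say which bytes were hashed. As a minimal sketch under that assumption, a change detector could digest the raw response body and compare the result between scans. The function name robots_digest and the sample body are hypothetical and are not taken from the scan.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the scanner hashes the raw body with SHA-256.
    The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Hypothetical stand-in for a fetched file; the real body is not reproduced here.
sample_body = b"User-agent: gptbot\nDisallow: /\n"
digest = robots_digest(sample_body)

# A SHA-256 hex digest is always 64 characters, the same length as the Hash field.
print(len(digest))
```

Comparing the digest from one scan with the digest from the next shows whether the file changed at all. It does not show how much it changed, which is the kind of question a similarity hash such as the SimHash field is meant to answer.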

Groups

ahc

Rule Path
Disallow /

www.aihitdata.com

Rule Path
Disallow /

anthropic

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

bananabot

Rule Path
Disallow /

barkrowler

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

borneobot

Rule Path
Disallow /

censysinspect

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

clickagy

Rule Path
Disallow /

cliqzbot

Rule Path
Disallow /

crawler.007ac9.net

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

checkmarknetwork

Rule Path
Disallow /

chrome-lighthouse

Rule Path
Disallow /

dataprovider

Rule Path
Disallow /

domaincrawler

Rule Path
Disallow /

domainsproject.org

Rule Path
Disallow /

downloaderchrome

Rule Path
Disallow /

evc-batch

Rule Path
Disallow /

exabot

Rule Path
Disallow /

facebook

Rule Path
Disallow /

facebookexternalhit

Rule Path
Disallow /

facebot

Rule Path
Disallow /

fedora

Rule Path
Disallow /

www.grapeshot.co.uk

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

grover

Rule Path
Disallow /

hubspot

Rule Path
Disallow /

idg/uk

Rule Path
Disallow /

jersey

Rule Path
Disallow /

komodiabot

Rule Path
Disallow /

liebaofast

Rule Path
Disallow /

lightspeedsystemscrawler

Rule Path
Disallow /

ltx71

Rule Path
Disallow /

masscan

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

maxpointcrawler

Rule Path
Disallow /

megaindex

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

neevabot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

pagething

Rule Path
Disallow /

panscient

Rule Path
Disallow /

pleskbot

Rule Path
Disallow /

pocketparser_ua

Rule Path
Disallow /

safednsbot

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

seekport

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

serpstatbot

Rule Path
Disallow /

sogou

Rule Path
Disallow /

spaziodati

Rule Path
Disallow /

special_archiver

Rule Path
Disallow /

surdotlybot

Rule Path
Disallow /

twitterbot

Rule Path
Disallow /

uptimebot

Rule Path
Disallow /

webprosbot

Rule Path
Disallow /

wellknownbot

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

cipacrawler

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

ioncrawl

Rule Path
Disallow /

jobboersebot

Rule Path
Disallow /

linkdex.com

Rule Path
Disallow /

mtrobot

Rule Path
Disallow /

obot

Rule Path
Disallow /

pagething

Rule Path
Disallow /

qwantify

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

searchmetricsbot

Rule Path
Disallow /

sirdatabot

Rule Path
Disallow /

spbot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

netestate ne crawler

Rule Path
Disallow /

verity

Rule Path
Disallow /

voltron

Rule Path
Disallow /

wget/1.9

Rule Path
Disallow /

yandex

Rule Path
Disallow /

xovionpagecrawler

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

orgprobe

Rule Path
Disallow /

orgprobe/2.0.0

Rule Path
Disallow /

netcraftsurveyagent

Rule Path
Disallow /

http://www.nominet.org.uk/privacypolicy

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

siteexplorer

Rule Path
Disallow /

dispatch

Rule Path
Disallow /

garlikcrawler

Rule Path
Disallow /

extlinksbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /pas

criteobot/0.1

Rule Path
Disallow

duckduckbot

Rule Path
Disallow

Other Records

Field Value
crawl-delay 5

thetradedesk

Rule Path
Disallow

googlebot

Rule Path
Disallow /contact
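
The groups above can be checked with a standard parser. The sketch below rebuilds a small excerpt of the file from four of the listed groups (gptbot, ccbot, duckduckbot and googlebot) and evaluates it with Python's urllib.robotparser. The excerpt text is a reconstruction from this report, not the original file, and it assumes that the crawl-delay record belongs to the duckduckbot group, as the layout suggests. The test paths /pas/members, /about and /products are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (assumption: not the verbatim asgnet.co.uk file).
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /pas

User-agent: duckduckbot
Disallow:
Crawl-delay: 5

User-agent: googlebot
Disallow: /contact
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

BASE = "https://asgnet.co.uk"

# "Disallow: /" blocks every path for that agent.
gptbot_home = parser.can_fetch("GPTBot", BASE + "/")

# "Disallow: /pas" blocks only paths that start with /pas.
ccbot_pas = parser.can_fetch("CCBot", BASE + "/pas/members")
ccbot_about = parser.can_fetch("CCBot", BASE + "/about")

# An empty Disallow value allows everything for that agent.
duck_contact = parser.can_fetch("DuckDuckBot", BASE + "/contact")
duck_delay = parser.crawl_delay("DuckDuckBot")

# googlebot is blocked from /contact only.
google_contact = parser.can_fetch("Googlebot", BASE + "/contact")
google_products = parser.can_fetch("Googlebot", BASE + "/products")

print(gptbot_home, ccbot_pas, ccbot_about)
print(duck_contact, duck_delay)
print(google_contact, google_products)
```

The empty Disallow entries for criteobot/0.1, duckduckbot and thetradedesk therefore act as explicit allows, while ccbot and googlebot are only partially restricted. A parser reports what the file requests and nothing more. As the comments below note, several of the listed bots ignore robots.txt and are blocked at the application or firewall level instead.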

Comments

  • seems to obey robots.txt
  • does not honor
  • does not honour robots.txt
  • - hacking bot (OVH ?)
  • github - not wanted - Seems to obey
  • all sources found (not many) say bad
  • chrome lighthouse bad extension/bot - ignores disallow
  • ignores disallow
  • obeys robots.txt - opensiteexplorer.org
  • WAS a good bot - now exabot.com unreachable
  • on .jp server since 28-12-2018 and generating 400 errors so disallow
  • no facebook crawler - don't want it, need it or anything to do with it - Also ignores robots.txt
  • obeys robots.txt
  • MICROSOFT OPEN AI - PERM BAN ALL AI SCUM
  • hubspot - excess scrapping - now ignores robots.txt
  • jersey does not appear to obey robots.txt (amazonAWS)
  • bad bot blocked at system level as well
  • reported bad bot(s) on github
  • bad bot ? - if ignores this one - block at app level - 05/03/2018
  • BAD - does not even read robots.txt - Banned at application level
  • obeys robots.txt
  • Nutch ignores robots.txt - blocked
  • PITA - Banned at higher Level
  • unconfirmed Plesk bot - ban until confirmed or denied by Plesk
  • bad bot
  • Bad bot - No Robots.txt read
  • bad bot - only reads robots AFTER being blocked and ignores anyway
  • obeys robots.txt - Sometimes
  • bad bot block at firewall level
  • Bad bot ignores robots.txt
  • spaziodati.eu (see also IDG/UK bot) - ignores both disallows
  • bad bot
  • Web spam - trying to con real uptimerobots users - Ignores this setting
  • bots that obey or read robots.txt but not wanted or ignore settings below
  • openlinkprofiler.org
  • domainTools
  • orgprobe - suspect - reported obeys and also ignores robots.txt - lets find out shall we
  • netcraft survey agent - doesn't read robots.txt as far as we can tell
  • if passes this then block at firewall level as bad bot
  • also netcraft web server survey but no bot identifiable - ban all UA with netcraft
  • at app firewall
  • nominet no longer simply reads site status but tries hacking WP, Joomla etc so
  • block as per their rules (March 2018) - awaiting policy changes (April 2018)
  • plukkie (ykoon)
  • .de social networking bot - not wanted or needed
  • siteexplorer (back link checkeer) - disallow
  • seems to obey robots.txt
  • SSL problem on parent site (08-11-2018) now blocked
  • ALLOWS and Partials Start Here for Good Bots
  • proximic - reallow to PAS - Aug 2022
  • User-agent: proximic
  • Disallow:
  • ccbot - commercial Listed as Good and honours robot.txt from AWS - Disallow members pas

Warnings

  • 2 invalid lines.