asgnet.co.uk
robots.txt

Robots Exclusion Standard data for asgnet.co.uk

Resource Scan

Scan Details

Site Domain	asgnet.co.uk
Base Domain	asgnet.co.uk
Scan Status	Ok
Last Scan	2024-09-08T02:17:06+00:00
Next Scan	2024-10-08T02:17:06+00:00
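
The Last Scan and Next Scan timestamps are exactly 30 days apart. A minimal sketch of that arithmetic follows, assuming (the report does not say so) that the scanner schedules the next scan a fixed 30 days after the last one. The name `RESCAN_INTERVAL` is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical interval: inferred from the two timestamps in the report,
# not documented by the scanner itself.
RESCAN_INTERVAL = timedelta(days=30)

def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp one rescan interval after last_scan_iso."""
    last_scan = datetime.fromisoformat(last_scan_iso)
    return (last_scan + RESCAN_INTERVAL).isoformat()

print(next_scan("2024-09-08T02:17:06+00:00"))  # 2024-10-08T02:17:06+00:00
```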

Last Scan

Scanned	2024-09-08T02:17:06+00:00
URL	https://asgnet.co.uk/robots.txt
Domain IPs	79.99.42.89
Response IP	79.99.42.89
Found	Yes
Hash	2013592394caa10a2bcbf3418727a8d20590d2bd0ff3a44e3eb3feaa1d9921f3
SimHash	9716e1570df7
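
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The sketch below assumes the scanner hashes the raw response body with SHA-256; the report does not confirm this, and `body_digest` is a hypothetical helper name. SimHash is a separate similarity fingerprint and is not reproduced here.

```python
import hashlib

def body_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a response body (assumed hashing scheme)."""
    return hashlib.sha256(body).hexdigest()

# Comparing digests from two scans shows whether the file changed byte for byte.
previous = body_digest(b"User-agent: gptbot\nDisallow: /\n")
current = body_digest(b"User-agent: gptbot\nDisallow: /\n")
print(previous == current)  # True
print(len(current))         # 64
```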

Groups

ahc

Rule	Path
Disallow	/

www.aihitdata.com

Rule	Path
Disallow	/

anthropic

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

applebot

Rule	Path
Disallow	/

archive.org_bot

Rule	Path
Disallow	/

bananabot

Rule	Path
Disallow	/

barkrowler

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

borneobot

Rule	Path
Disallow	/

censysinspect

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

clickagy

Rule	Path
Disallow	/

cliqzbot

Rule	Path
Disallow	/

crawler.007ac9.net

Rule	Path
Disallow	/

crawler4j

Rule	Path
Disallow	/

checkmarknetwork

Rule	Path
Disallow	/

chrome-lighthouse

Rule	Path
Disallow	/

dataprovider

Rule	Path
Disallow	/

domaincrawler

Rule	Path
Disallow	/

domainsproject.org

Rule	Path
Disallow	/

downloaderchrome

Rule	Path
Disallow	/

evc-batch

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

facebook

Rule	Path
Disallow	/

facebookexternalhit

Rule	Path
Disallow	/

facebot

Rule	Path
Disallow	/

fedora

Rule	Path
Disallow	/

www.grapeshot.co.uk

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

grover

Rule	Path
Disallow	/

hubspot

Rule	Path
Disallow	/

idg/uk

Rule	Path
Disallow	/

jersey

Rule	Path
Disallow	/

komodiabot

Rule	Path
Disallow	/

liebaofast

Rule	Path
Disallow	/

lightspeedsystemscrawler

Rule	Path
Disallow	/

ltx71

Rule	Path
Disallow	/

masscan

Rule	Path
Disallow	/

mauibot

Rule	Path
Disallow	/

maxpointcrawler

Rule	Path
Disallow	/

megaindex

Rule	Path
Disallow	/

megaindex.ru

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

neevabot

Rule	Path
Disallow	/

nutch

Rule	Path
Disallow	/

pagething

Rule	Path
Disallow	/

panscient

Rule	Path
Disallow

pleskbot

Rule	Path
Disallow

pocketparser_ua

Rule	Path
Disallow

safednsbot

Rule	Path
Disallow

scrapy

Rule	Path
Disallow

seekport

Rule	Path
Disallow

semrushbot

Rule	Path
Disallow

seokicks-robot

Rule	Path
Disallow

serpstatbot

Rule	Path
Disallow

sogou

Rule	Path
Disallow

spaziodati

Rule	Path
Disallow

special_archiver

Rule	Path
Disallow

surdotlybot

Rule	Path
Disallow

twitterbot

Rule	Path
Disallow

uptimebot

Rule	Path
Disallow

webprosbot

Rule	Path
Disallow

wellknownbot

Rule	Path
Disallow

wotbox

Rule	Path
Disallow

baiduspider

Rule	Path
Disallow

cipacrawler

Rule	Path
Disallow

dotbot

Rule	Path
Disallow

ioncrawl

Rule	Path
Disallow

jobboersebot

Rule	Path
Disallow

linkdex.com

Rule	Path
Disallow

mtrobot

Rule	Path
Disallow

obot

Rule	Path
Disallow

pagething

Rule	Path
Disallow

qwantify

Rule	Path
Disallow

rogerbot

Rule	Path
Disallow

searchmetricsbot

Rule	Path
Disallow

sirdatabot

Rule	Path
Disallow

spbot

Rule	Path
Disallow

surveybot

Rule	Path
Disallow

netestate ne crawler

Rule	Path
Disallow

verity

Rule	Path
Disallow

voltron

Rule	Path
Disallow

wget/1.9

Rule	Path
Disallow

yandex

Rule	Path
Disallow

xovionpagecrawler

Rule	Path
Disallow

yandexbot

Rule	Path
Disallow

zoominfobot

Rule	Path
Disallow

orgprobe

Rule	Path
Disallow

orgprobe/2.0.0

Rule	Path
Disallow

netcraftsurveyagent

Rule	Path
Disallow

http://www.nominet.org.uk/privacypolicy

Rule	Path
Disallow

plukkie

Rule	Path
Disallow

trendictionbot

Rule	Path
Disallow

siteexplorer

Rule	Path
Disallow

dispatch

Rule	Path
Disallow

garlikcrawler

Rule	Path
Disallow

extlinksbot

Rule	Path
Disallow

ccbot

Rule	Path
Disallow	/pas

criteobot/0.1

Rule	Path
Disallow

duckduckbot

Rule	Path
Disallow

Other Records

Field	Value
crawl-delay

thetradedesk

Rule	Path
Disallow

googlebot

Rule	Path
Disallow	/contact

Comments

seems to obey robots.txt
does not honor
does not honour robots.txt
- hacking bot (OVH ?)
github - not wanted - Seems to obey
all sources found (not many) say bad
chrome lighthouse bad extension/bot - ignores disallow
ignores disallow
obeys robots.txt - opensiteexplorer.org
WAS a good bot - now exabot.com unreachable
on .jp server since 28-12-2018 and generating 400 errors so disallow
no facebook crawler - don't want it, need it or anything to do with it - Also ignores robots.txt
obeys robots.txt
MICROSOFT OPEN AI - PERM BAN ALL AI SCUM
hubspot - excess scraping - now ignores robots.txt
jersey does not appear to obey robots.txt (amazonAWS)
bad bot blocked at system level as well
reported bad bot(s) on github
bad bot ? - if ignores this one - block at app level - 05/03/2018
BAD - does not even read robots.txt - Banned at application level
obeys robots.txt
Nutch ignores robots.txt - blocked
PITA - Banned at higher Level
unconfirmed Plesk bot - ban until confirmed or denied by Plesk
bad bot
Bad bot - No Robots.txt read
bad bot - only reads robots AFTER being blocked and ignores anyway
obeys robots.txt - Sometimes
bad bot block at firewall level
Bad bot ignores robots.txt
spaziodati.eu (see also IDG/UK bot) - ignores both disallows
bad bot
Web spam - trying to con real uptimerobots users - Ignores this setting
bots that obey or read robots.txt but not wanted or ignore settings below
openlinkprofiler.org
domainTools
orgprobe - suspect - reported obeys and also ignores robots.txt - lets find out shall we
netcraft survey agent - doesn't read robots.txt as far as we can tell
if passes this then block at firewall level as bad bot
also netcraft web server survey but no bot identifiable - ban all UA with netcraft
at app firewall
nominet no longer simply reads site status but tries hacking WP, Joomla etc so
block as per their rules (March 2018) - awaiting policy changes (April 2018)
plukkie (ykoon)
.de social networking bot - not wanted or needed
siteexplorer (back link checker) - disallow
seems to obey robots.txt
SSL problem on parent site (08-11-2018) now blocked
ALLOWS and Partials Start Here for Good Bots
proximic - reallow to PAS - Aug 2022
User-agent: proximic
Disallow:
ccbot - commercial Listed as Good and honours robot.txt from AWS - Disallow members pas
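
To make the effect of these groups concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is a small reconstruction from this report (the `gptbot`, `ccbot` and `googlebot` groups, plus the `proximic` lines quoted in the comments), not the full file, and the URLs under test are illustrative. The helper name `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt, assumed from the groups listed in this report.
# It is not the complete robots.txt served by the site.
EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /pas

User-agent: googlebot
Disallow: /contact

User-agent: proximic
Disallow:
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """True if `agent` may fetch `path` under the excerpt above."""
    return parser.can_fetch(agent, "https://asgnet.co.uk" + path)

print(allowed("GPTBot", "/"))            # False: whole site disallowed
print(allowed("CCBot", "/pas"))          # False: only the /pas prefix is blocked
print(allowed("CCBot", "/"))             # True
print(allowed("Googlebot", "/contact"))  # False
print(allowed("proximic", "/pas"))       # True: an empty Disallow allows everything
```

Note that these results describe what a compliant crawler would do. As the comments above record, several of the listed bots are reported to ignore robots.txt, which is why some are also blocked at the application or firewall level.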

Warnings

2 invalid lines.
