118businessdirectory.co.uk
robots.txt

Robots Exclusion Standard data for 118businessdirectory.co.uk

Resource Scan

Scan Details

Site Domain	118businessdirectory.co.uk
Base Domain	118businessdirectory.co.uk
Scan Status	Ok
Last Scan	2024-09-26T19:08:59+00:00
Next Scan	2024-10-03T19:08:59+00:00
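
The two timestamps above are exactly seven days apart, which suggests a weekly rescan interval, although the report does not state one. A minimal Python sketch of that arithmetic, using only the values from the table (the variable names are illustrative):

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601 with a UTC offset).
last_scan = datetime.fromisoformat("2024-09-26T19:08:59+00:00")
next_scan = datetime.fromisoformat("2024-10-03T19:08:59+00:00")

# Gap between the completed scan and the scheduled one.
interval = next_scan - last_scan
is_weekly = interval == timedelta(days=7)

print(interval)   # 7 days, 0:00:00
print(is_weekly)  # True
```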

Last Scan

Scanned	2024-09-26T19:08:59+00:00
URL	https://118businessdirectory.co.uk/robots.txt
Domain IPs	104.21.95.31, 172.67.142.162, 2606:4700:3030::6815:5f1f, 2606:4700:3035::ac43:8ea2
Response IP	104.21.95.31
Found	Yes
Hash	acd2934460c57ddd28acff1c985b75b029992f8f5031658437eff39dba218eee
SimHash	dbc85542f909
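
The report does not say how the Hash value is computed. Its length (64 hexadecimal digits, i.e. 256 bits) is consistent with SHA-256, but that is an assumption, and the sketch below cannot reproduce the recorded value without the exact bytes the scanner fetched. It only shows how a content digest of this kind can be used to detect whether a robots.txt body changed between scans. The function names and sample bodies are hypothetical.

```python
import hashlib

def content_digest(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_digest: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return content_digest(body) != previous_digest

# Illustrative bodies, not the site's actual file.
old_body = b"User-agent: *\nDisallow: /decor-life/\n"
new_body = old_body + b"Sitemap: https://118businessdirectory.co.uk/sitemap.xml\n"

stored = content_digest(old_body)
print(len(stored))                    # 64, the same length as the Hash field
print(has_changed(stored, old_body))  # False
print(has_changed(stored, new_body))  # True
```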

Groups

*

Rule	Path
Disallow	/decor-life/

ninjabot

Rule	Path
Allow	/

mediapartners-google*

Rule	Path
Allow	/

adsbot-google

Rule	Path
Allow	/

googlebot-mobile

Rule	Path
Allow	/

googlebot

Rule	Path
Allow	/

ahrefsbot

Rule	Path
Allow	/

semrushbot

Rule	Path
Allow	/

mj12bot

Rule	Path
Disallow	/

semrushbot-sa

Rule	Path
Allow	/

semrushbot-ba

Rule	Path
Allow	/

semrushbot-si

Rule	Path
Allow	/

semrushbot-swa

Rule	Path
Allow	/

semrushbot-ct

Rule	Path
Allow	/

dotbot

Rule	Path
Disallow	/

alexibot

Rule	Path
Disallow	/

surveybot

Rule	Path
Allow	/

xenu’s

Rule	Path
Disallow	/

xenu’s link sleuth 1.1c

Rule	Path
Disallow	/

rogerbot

Rule	Path
Disallow	/

nextgensearchbot

Rule	Path
Disallow	/

ia_archiver

Rule	Path
Disallow	/

archive.org_bot

Rule	Path
Disallow	/

archive.org bot

Rule	Path
Disallow	/

linkwalker

Rule	Path
Disallow	/

gigablast spider

Rule	Path
Disallow	/

ia_archiver-web.archive.org

Rule	Path
Disallow	/

picscout

Rule	Path
Disallow	/

blexbot crawler

Rule	Path
Disallow	/

tineye

Rule	Path
Disallow	/

seokicks-robot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

sistrix crawler

Rule	Path
Disallow	/

uptimerobot/2.0

Rule	Path
Disallow	/

ezooms robot

Rule	Path
Disallow	/

netestate ne crawler (+http://www.website-datenbank.de/)

Rule	Path
Disallow	/

wiseguys robot

Rule	Path
Disallow	/

turnitin robot

Rule	Path
Disallow	/

heritrix

Rule	Path
Disallow	/

pimonster

Rule	Path
Disallow	/

pimonster

Rule	Path
Disallow	/

pi-monster

Rule	Path
Disallow	/

eccp/1.0 (search@eniro.com)

Rule	Path
Disallow	/

psbot

Rule	Path
Disallow	/

youdaobot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

naverbot

Rule	Path
Disallow	/

yeti

Rule	Path
Disallow	/

zbot

Rule	Path
Disallow

vagabondo

Rule	Path
Disallow

linkwalker

Rule	Path
Disallow

simplepie

Rule	Path
Disallow

wget

Rule	Path
Disallow

pixray-seeker

Rule	Path
Disallow

boardreader

Rule	Path
Disallow

quantify

Rule	Path
Disallow

plukkie

Rule	Path
Disallow

cuam

Rule	Path
Disallow

megaindex.ru

Rule	Path
Disallow

megaindex.com

Rule	Path
Disallow

megaindex.ru/2.0

Rule	Path
Disallow

megaindex.ru

Rule	Path
Disallow
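
To illustrate how groups like these are applied, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text is reconstructed from three of the groups listed above (*, googlebot, mj12bot) rather than copied from the live file, and "examplebot" is a hypothetical crawler name. This parser matches agent names by substring and applies the first matching rule within a group, so other implementations may decide differently.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from three groups in the scan; not the complete live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /decor-life/

User-agent: googlebot
Allow: /

User-agent: mj12bot
Disallow: /
"""

BASE = "https://118businessdirectory.co.uk"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on the scanned site."""
    return parser.can_fetch(agent, BASE + path)

print(allowed("googlebot", "/decor-life/"))   # True: its own group allows everything
print(allowed("mj12bot", "/"))                # False: its own group disallows everything
print(allowed("examplebot", "/decor-life/"))  # False: falls back to the * group
print(allowed("examplebot", "/"))             # True: no rule in the * group matches
```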

Other Records

Field	Value
sitemap	https://118businessdirectory.co.uk/sitemap.xml
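
A crawler can read a sitemap record with the standard-library parser as well. A small sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() is available) and a one-line robots.txt built from the sitemap value recorded here:

```python
from urllib.robotparser import RobotFileParser

# One-line robots.txt built from the sitemap record in the scan.
lines = ["Sitemap: https://118businessdirectory.co.uk/sitemap.xml"]

parser = RobotFileParser()
parser.parse(lines)

# site_maps() returns a list of sitemap URLs, or None when there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['https://118businessdirectory.co.uk/sitemap.xml']
```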

Comments

Block NextGenSearchBot
Block ia-archiver from crawling site
Block archive.org_bot from crawling site
Block Archive.org Bot from crawling site
Block LinkWalker from crawling site
Block GigaBlast Spider from crawling site
Block ia_archiver-web.archive.org_bot from crawling site
Block PicScout Crawler from crawling site
Block BLEXBot Crawler from crawling site
Block TinEye from crawling site
Block SEOkicks
Block BlexBot
Block SISTRIX
Block Uptime robot
Block Ezooms Robot
Block netEstate NE Crawler (+http://www.website-datenbank.de/)
Block WiseGuys Robot
Block Turnitin Robot
Block Heritrix
Block pricepi
Block Eniro
Block Psbot
Block Youdao
BLEXBot
Block NaverBot
Block ZBot
Block Vagabondo
Block LinkWalker
Block SimplePie
Block Wget
Block Pixray-Seeker
Block BoardReader
Block Quantify
Block Plukkie
Block Cuam
https://megaindex.com/crawler

Warnings

2 invalid lines.
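
The scanner does not list which two lines it rejected or what its validation rules are. As a rough illustration of how such a count can arise, the sketch below flags any non-blank, non-comment line that lacks a recognised "field: value" shape. The field list, the function name, and the sample text (including the /private/ path) are assumptions made for the example, not the scanner's logic or the site's file.

```python
# Field names accepted by this sketch; a real scanner may accept more or fewer.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def find_invalid_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, raw line) pairs that are not `field: value` records."""
    invalid = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue                          # blank or comment-only line
        field, sep, _value = line.partition(":")
        if not sep or field.strip().lower() not in KNOWN_FIELDS:
            invalid.append((number, raw))
    return invalid

# Hypothetical sample with two deliberately malformed lines.
SAMPLE = """\
User-agent: *
Disallow: /decor-life/
Block this bot
Dissallow: /private/
# a comment line is fine
Sitemap: https://118businessdirectory.co.uk/sitemap.xml
"""

problems = find_invalid_lines(SAMPLE)
print(len(problems))  # 2
for number, raw in problems:
    print(number, raw)
```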
