belgium-iphone.lesoir.be
robots.txt

Robots Exclusion Standard data for belgium-iphone.lesoir.be

Resource Scan

Scan Details

Site Domain	belgium-iphone.lesoir.be
Base Domain	lesoir.be
Scan Status	Ok
Last Scan	2024-05-21T08:32:26+00:00
Next Scan	2024-06-20T08:32:26+00:00
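
The two timestamps above are 30 days apart. A minimal sketch of that arithmetic, using the values exactly as listed; the variable names are illustrative, and the report itself does not say how the next scan is scheduled:

```python
from datetime import datetime

# Timestamps copied from the Scan Details table (ISO 8601 with UTC offset).
last_scan = datetime.fromisoformat("2024-05-21T08:32:26+00:00")
next_scan = datetime.fromisoformat("2024-06-20T08:32:26+00:00")

# Subtracting two aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)
```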

Last Scan

Scanned	2024-05-21T08:32:26+00:00
URL	https://belgium-iphone.lesoir.be/robots.txt
Domain IPs	23.48.107.17, 23.48.107.24, 2600:1413:b000:1a::17d7:74a, 2600:1413:b000:1a::17d7:75e
Response IP	23.48.107.24
Found	Yes
Hash	529d8bcacdee6edad4c834dc2dff7972a772703ae626269860dbb18c05d3beb9
SimHash	a03a5b814465
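
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. Under that assumption, a digest of the same shape could be computed roughly as follows. The function name and the sample body are hypothetical, so the printed digest will not match the one in the table:

```python
import hashlib

def body_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Placeholder content, NOT the real file served by the site.
sample = b"User-agent: *\nDisallow:\n"
digest = body_digest(sample)
print(len(digest))
```

Comparing digests between scans is a cheap way to detect that a file changed byte for byte; a SimHash is typically used instead to detect near-duplicates.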

Groups

mediapartners-google
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
adsbot-google
googlebot_nauxeo
bingbot
twitterbot
applebot
bingbot
facebot
siteauditbot
screaming frog seo spider
grapeshot
ias_crawler
publication-access-for-facebook
proximic
facebookexternalhit
flipboard
flipboardproxy
weborama-fetcher
taboolabot
upday

Rule	Path
Disallow	/wp-admin/
Allow	/wp-admin/admin-ajax.php
Allow	/.well-known/
Disallow	/cache/
Disallow	/data/
Disallow	/includes/
Disallow	/templates/
Disallow	/recherche/
Disallow	/produits/guide/
Disallow	/produits/*/comparateur/
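
The table above mixes Allow and Disallow rules and uses a `*` wildcard, so rule order alone does not decide the outcome. The following is a minimal sketch of longest-match evaluation as described in RFC 9309 (the most specific matching pattern wins, Allow wins ties, `*` matches any run of characters, a trailing `$` anchors the end). The function names and the sample paths in the usage lines are hypothetical, and real crawlers may differ in details such as percent-encoding:

```python
import re

# Rules copied from the table above, in file order.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/.well-known/"),
    ("disallow", "/cache/"),
    ("disallow", "/data/"),
    ("disallow", "/includes/"),
    ("disallow", "/templates/"),
    ("disallow", "/recherche/"),
    ("disallow", "/produits/guide/"),
    ("disallow", "/produits/*/comparateur/"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, best_kind = -1, "allow"
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, best_kind = length, kind
    return best_kind == "allow"

# Hypothetical paths, chosen only to exercise the rules.
print(is_allowed("/wp-admin/admin-ajax.php"))
print(is_allowed("/produits/iphone-15/comparateur/"))
```

Note that Python's built-in `urllib.robotparser` applies rules in file order and does not expand `*`, so it would give a different answer for `/wp-admin/admin-ajax.php` with this file.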

exabot
slurp

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	20
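
Crawl-delay is not part of RFC 9309, but some crawlers honour it, conventionally as a number of seconds between requests. A small sketch of reading the value with Python's standard `urllib.robotparser`; the fragment is a reconstruction of this group, and writing it as two consecutive User-agent lines is an assumption about the original file layout:

```python
from urllib.robotparser import RobotFileParser

# Reconstructed fragment for the exabot/slurp group (layout assumed).
ROBOTS_FRAGMENT = """\
User-agent: exabot
User-agent: slurp
Crawl-delay: 20
"""

parser = RobotFileParser()
parser.parse(ROBOTS_FRAGMENT.splitlines())

# crawl_delay() returns the integer value, or None if no group applies.
print(parser.crawl_delay("slurp"))
print(parser.crawl_delay("googlebot"))
```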

adequat

Rule	Path
Disallow	/

adequat-systems

Rule	Path
Disallow	/

amisoftware

Rule	Path
Disallow	/

argus

Rule	Path
Disallow	/

ask n read

Rule	Path
Disallow	/

asknread.com

Rule	Path
Disallow	/

augure

Rule	Path
Disallow	/

auramundi

Rule	Path
Disallow	/

bloodhound

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

cision

Rule	Path
Disallow	/

coexel

Rule	Path
Disallow	/

converacrawler

Rule	Path
Disallow	/

corporama

Rule	Path
Disallow	/

cydralspider

Rule	Path
Disallow	/

digimind

Rule	Path
Disallow	/

download ninja

Rule	Path
Disallow	/

downloadexpress

Rule	Path
Disallow	/

edd

Rule	Path
Disallow	/

ellisphere

Rule	Path
Disallow	/

eureka

Rule	Path
Disallow	/

europresse

Rule	Path
Disallow	/

explore

Rule	Path
Disallow	/

factiva

Rule	Path
Disallow	/

fasterfox

Rule	Path
Disallow	/

fetch

Rule	Path
Disallow	/

gammaspider

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/
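
Groups such as ccbot, chatgpt-user and gptbot each carry a single `Disallow /` rule, which blocks the whole site for a crawler whose user agent matches. A rough sketch with `urllib.robotparser`, using a reconstructed fragment of those three groups; the user-agent strings `GPTBot/1.0` and `examplebot` are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Reconstructed fragment covering three of the fully blocked groups.
FRAGMENT = """\
User-agent: ccbot
Disallow: /

User-agent: chatgpt-user
Disallow: /

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(FRAGMENT.splitlines())

home = "https://belgium-iphone.lesoir.be/"
# Matching is case-insensitive on the product token before any '/'.
print(parser.can_fetch("GPTBot/1.0", home))
# An agent with no matching group in this fragment is not restricted by it.
print(parser.can_fetch("examplebot", home))
```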

grub-client

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

ia_archiver

Rule	Path
Disallow	/

ia_archiver-web.archive.org

Rule	Path
Disallow	/

indexer

Rule	Path
Disallow	/

infoseek

Rule	Path
Disallow	/

jetbot

Rule	Path
Disallow	/

k2spider

Rule	Path
Disallow	/

kantar

Rule	Path
Disallow	/

kbcrawl

Rule	Path
Disallow	/

knowings

Rule	Path
Disallow	/

larbin

Rule	Path
Disallow	/

leadbox

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

linkfluence

Rule	Path
Disallow	/

linko

Rule	Path
Disallow	/

manageo

Rule	Path
Disallow	/

mediacompil

Rule	Path
Disallow	/

meltwater

Rule	Path
Disallow	/

mention

Rule	Path
Disallow	/

moreover

Rule	Path
Disallow	/

msiecrawler

Rule	Path
Disallow	/

mytwip

Rule	Path
Disallow	/

newscan-online

Rule	Path
Disallow	/

newsnow

Rule	Path
Disallow	/

newzbin

Rule	Path
Disallow	/

npbot

Rule	Path
Disallow	/

objectssearch

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

opinion-tracker

Rule	Path
Disallow	/

pimptrain

Rule	Path
Disallow	/

proxem

Rule	Path
Disallow	/

quepasacreep

Rule	Path
Disallow	/

qwam content intelligence

Rule	Path
Disallow	/

raven

Rule	Path
Disallow	/

readability.com

Rule	Path
Disallow	/

scoop.it

Rule	Path
Disallow	/

score3

Rule	Path
Disallow	/

sindup

Rule	Path
Disallow	/

sitecheck.internetseer.com

Rule	Path
Disallow	/

sitesnagger

Rule	Path
Disallow	/

spotter

Rule	Path
Disallow	/

synthesio

Rule	Path
Disallow	/

talkwater

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

trendeo

Rule	Path
Disallow	/

trendybuzz

Rule	Path
Disallow	/

tunitinbot

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

up2news

Rule	Path
Disallow	/

vecteurplus

Rule	Path
Disallow	/

verif

Rule	Path
Disallow	/

verticalsearch

Rule	Path
Disallow	/

vsw

Rule	Path
Disallow	/

wapspider

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

webreaper

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

webzinger

Rule	Path
Disallow	/

webzip

Rule	Path
Disallow	/

wget

Rule	Path
Disallow	/

winello

Rule	Path
Disallow	/

youmag

Rule	Path
Disallow	/

zealbot

Rule	Path
Disallow	/

zite

Rule	Path
Disallow	/

zyborg

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://belgium-iphone.lesoir.be/sitemaps/sitemapnews-0.xml
sitemap	https://belgium-iphone.lesoir.be/sitemaps/sitemapindex.xml
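
Sitemap records apply to the whole file rather than to any one group. As an illustration, the two URLs above can be read back with `RobotFileParser.site_maps()` (available in Python 3.8 and later); the fragment below contains only those two lines and is not the complete file:

```python
from urllib.robotparser import RobotFileParser

# Only the two sitemap records listed above, written as robots.txt lines.
SITEMAP_LINES = [
    "Sitemap: https://belgium-iphone.lesoir.be/sitemaps/sitemapnews-0.xml",
    "Sitemap: https://belgium-iphone.lesoir.be/sitemaps/sitemapindex.xml",
]

parser = RobotFileParser()
parser.parse(SITEMAP_LINES)

# site_maps() returns the URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```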

Comments

robots.txt
This file is to prevent the crawling and indexing of certain parts
of your site by web crawlers and spiders run by sites like Yahoo!
and Google. By telling these "robots" where not to go on your site,
you save bandwidth and server resources.
This file will be ignored unless it is at the root of your host:
Used: http://example.com/robots.txt
Ignored: http://example.com/site/robots.txt
For more information about the robots.txt standard, see:
http://www.robotstxt.org/wc/robots.html
Allowed search engines directives
Sitemaps
Crawling limitation fixed for low priority bots
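
The comment about the file being honoured only at the root of the host can be sketched as a small helper that derives the robots.txt location from any page URL. The function name is hypothetical; the input URL is one of the sitemap addresses listed in this report:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://belgium-iphone.lesoir.be/sitemaps/sitemapindex.xml"))
```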

Warnings

4 invalid lines.
