avenirdeseglisesdebruxelles.be
robots.txt

Robots Exclusion Standard data for avenirdeseglisesdebruxelles.be

Resource Scan

Scan Details

Site Domain	avenirdeseglisesdebruxelles.be
Base Domain	avenirdeseglisesdebruxelles.be
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't connect to server.
Last Scan	2024-09-15T02:34:24+00:00
Next Scan	2024-12-14T02:34:24+00:00

Last Successful Scan

Scanned	2022-04-27T11:55:23+00:00
URL	https://avenirdeseglisesdebruxelles.be/robots.txt
Redirect	https://www.avenirdeseglisesdebruxelles.be/robots.txt
Redirect Domain	www.avenirdeseglisesdebruxelles.be
Redirect Base	avenirdeseglisesdebruxelles.be
Response IP	172.67.169.151
Found	Yes
Hash	1eec7667fb1b839a40aecb0729e1e357d45d7077ed39c6cc4c3109d324aa7e93
SimHash	5567214b65ba

Groups

semrushbot

Rule	Path
Disallow	/

semrushbot-sa

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

rogerbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

ia_archiver

Rule	Path
Disallow	/

velenpublicwebcrawler

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

sogou spider

Rule	Path
Disallow	/

youdaobot

Rule	Path
Disallow	/

adsbot-google

Rule	Path
Disallow	/js/

alphaseobot

Rule	Path
Disallow	/

siteexplorer

Rule	Path
Disallow	/

sitesucker

Rule	Path
Disallow	/

openindexspider

Rule	Path
Disallow	/

booglebot

Rule	Path
Disallow	/

backlinkcrawler

Rule	Path
Disallow	/

zoominfobot

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

netestate ne crawler (+http://www.website-datenbank.de/)

Rule	Path
Disallow	/

zoominfobot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

hubspot crawler

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

mail.ru_bot

Rule	Path
Disallow	/

mail.ru

Rule	Path
Disallow	/

serpstatbot

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

megaindex.ru

Rule	Path
Disallow	/

megaindex.com

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

bingbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	5

*

Rule	Path
Allow	/
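
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a hypothetical reconstruction of a few of the listed groups, not the archived file itself, and attaching the `crawl-delay` record to the `bingbot` group is an assumption based on where it appears in this report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a subset of the groups listed in this report.
# The real file is not reproduced here; group order and the placement of
# Crawl-delay under bingbot are assumptions.
ROBOTS_TXT = """\
User-agent: semrushbot
Disallow: /

User-agent: adsbot-google
Disallow: /js/

User-agent: bingbot
Crawl-delay: 5

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def summarize(agent, path):
    """Return a one-line verdict for a given user agent and path."""
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    return f"{agent} {path}: {verdict}"


if __name__ == "__main__":
    for agent, path in [
        ("SemrushBot", "/"),
        ("AdsBot-Google", "/js/app.js"),
        ("AdsBot-Google", "/index.html"),
        ("bingbot", "/"),
        ("SomeOtherBot", "/"),
    ]:
        print(summarize(agent, path))
    print("bingbot crawl delay:", parser.crawl_delay("bingbot"))
```

Note that `urllib.robotparser` matches user agents by case-insensitive substring, so its verdicts may differ from how an individual crawler interprets the same file.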

Comments

-----------------------------------------------------------
robots.txt, last refresh 2021/11/20
-----------------------------------------------------------
not all bots below may obey robots.txt in general
or specific rules, respectively
cat /home/wwwlogs/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head -20
-----------------------------------------------------------
semrush bot
ahrefs bot
moz bot
Wayback Machine
https://velen.io/
Baiduspider
Block SoGou
Block Youdao
AdsBot
http://alphaseobot.com/bot.html
http://siteexplorer.info/about.html
http://www.sitesucker.us/mac/limitations.html
https://www.openindex.io/saas/about-our-spider/
http://www.backlinktest.com/crawler.html
http://napoveda.seznam.cz/
http://www.website-datenbank.de
Block netEstate NE Crawler (+http://www.website-datenbank.de/)
Block BlexBot
https://megaindex.com/crawler
------------
not exclude
------------
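
The `awk` pipeline quoted in the comments tallies the most frequent user agents in an access log by splitting each line on double quotes and taking the sixth field. A rough Python equivalent is sketched below, assuming the log uses the common "combined" format. The sample lines and the `top_user_agents` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter


def top_user_agents(lines, n=20):
    """Count user agents, taking the 6th double-quote-delimited field."""
    counts = Counter()
    for line in lines:
        fields = line.split('"')
        # Unlike the awk version, lines with too few fields are skipped
        # instead of being counted as an empty user agent.
        if len(fields) >= 6:
            counts[fields[5]] += 1
    return counts.most_common(n)


# Hypothetical sample lines in combined log format (not taken from the site).
SAMPLE = [
    '1.2.3.4 - - [20/Nov/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "SemrushBot/7~bl"',
    '1.2.3.5 - - [20/Nov/2021:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "AhrefsBot/7.0"',
    '1.2.3.4 - - [20/Nov/2021:10:00:02 +0000] "GET /b HTTP/1.1" 404 0 "-" "SemrushBot/7~bl"',
]

if __name__ == "__main__":
    for agent, count in top_user_agents(SAMPLE):
        print(count, agent)
```

To run it against a real log, pass an open file object as `lines`.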
