20minuts.com
robots.txt

Robots Exclusion Standard data for 20minuts.com

Resource Scan

Scan Details

Site Domain	20minuts.com
Base Domain	20minuts.com
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't connect to server.
Last Scan	2024-06-09T02:05:03+00:00
Next Scan	2024-09-07T02:05:03+00:00
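
The two timestamps above are exactly 90 days apart, which suggests a fixed rescan interval. The report does not state this, so the 90-day figure is an inference from these two values only. A minimal Python check of the arithmetic:

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details above.
last_scan = datetime.fromisoformat("2024-06-09T02:05:03+00:00")
next_scan = datetime.fromisoformat("2024-09-07T02:05:03+00:00")

# Gap between the failed scan and the next scheduled one.
interval = next_scan - last_scan
print(interval.days)
```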

Last Successful Scan

Scanned	2022-09-28T03:08:03+00:00
URL	https://20minuts.com/robots.txt
Redirect	https://www.20minuts.com/robots.txt
Redirect Domain	www.20minuts.com
Redirect Base	20minuts.com
Response IP	172.67.158.100, 104.21.58.86
Found	Yes
Hash	493774da77569c77b9c090ebc00d90fdd55242109888672d0a0c744377fd03d5
SimHash	224b50d24e05
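
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. A sketch under that assumption follows; `fingerprint` is a hypothetical helper, and the sample body is made up for illustration and is not the real file.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Illustrative content only; the real file body is not reproduced in this report.
sample = b"User-agent: *\nDisallow: /search\n"
digest = fingerprint(sample)

# Comparing digests from two scans shows whether the file changed between them.
unchanged = digest == fingerprint(sample)
print(len(digest), unchanged)
```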

Groups

*

Rule	Path
Disallow	/article/*/commentaires*
Disallow	/resultats-examen/recherche/
Disallow	/resultats-examen/candidat/
Disallow	/embed/elections/resultats/
Disallow	/v-ajax
Disallow	/v-esi
Disallow	/search
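
The first rule in the * group uses `*` wildcards, so it covers the comments page of any article. The sketch below shows one way such patterns can be matched, assuming the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path). `rule_to_regex` and `is_disallowed` are hypothetical helpers, not part of any scanner, and the tested paths are invented examples.

```python
import re

def rule_to_regex(pattern):
    """Compile a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for the wildcards.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, patterns):
    # Patterns match from the start of the path, hence re.match.
    return any(rule_to_regex(p).match(path) for p in patterns)

# Disallow paths of the * group, as listed above.
DISALLOW = [
    "/article/*/commentaires*",
    "/resultats-examen/recherche/",
    "/resultats-examen/candidat/",
    "/embed/elections/resultats/",
    "/v-ajax",
    "/v-esi",
    "/search",
]

print(is_disallowed("/article/12345/commentaires", DISALLOW))
print(is_disallowed("/article/12345/", DISALLOW))
```

Because the group contains only Disallow rules, no Allow/Disallow precedence logic is needed here.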

grapeshot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	2
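
The grapeshot group declares a crawl delay but no path rules, so every path stays allowed for that agent. A minimal sketch with Python's standard `urllib.robotparser` illustrates this; the robots.txt fragment is rebuilt by hand from this report and is not the original file, and `examplebot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Hand-written fragment mirroring two groups from this report.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

User-agent: grapeshot
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay declared in the grapeshot group.
delay = parser.crawl_delay("grapeshot")
# The grapeshot group has no rules, so /search is allowed for it.
grapeshot_ok = parser.can_fetch("grapeshot", "/search")
# An agent without its own group falls back to the * group.
other_ok = parser.can_fetch("examplebot", "/search")
print(delay, grapeshot_ok, other_ok)
```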

firstrain

Rule	Path
Disallow	/

rogerbot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

searchmetricsbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

nutch

Rule	Path
Disallow	/

trendictionbot

Rule	Path
Disallow	/

xovibot

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

cliqzbot

Rule	Path
Disallow	/

gnowitnewsbot

Rule	Path
Disallow	/

maxpointcrawler

Rule	Path
Disallow	/

meltawer

Rule	Path
Disallow	/

digimind

Rule	Path
Disallow	/

knowings

Rule	Path
Disallow	/

sindup

Rule	Path
Disallow	/

cision

Rule	Path
Disallow	/

talkwater

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

converacrawler

Rule	Path
Disallow	/

quepasacreep

Rule	Path
Disallow	/

jetbot

Rule	Path
Disallow	/

newsnow

Rule	Path
Disallow	/

kbcrawl

Rule	Path
Disallow	/

amisoftware

Rule	Path
Disallow	/

newzbin

Rule	Path
Disallow	/

ask n read

Rule	Path
Disallow	/

qwam content intelligence

Rule	Path
Disallow	/

zite

Rule	Path
Disallow	/

youmag

Rule	Path
Disallow	/

synthesio

Rule	Path
Disallow	/

trendybuzz

Rule	Path
Disallow	/

scoop.it

Rule	Path
Disallow	/

linkfluence

Rule	Path
Disallow	/

augure

Rule	Path
Disallow	/

corporama

Rule	Path
Disallow	/

readability.com

Rule	Path
Disallow	/

grub-client

Rule	Path
Disallow	/

ia_archiver

Rule	Path
Disallow	/

ia_archiver-web.archive.org

Rule	Path
Disallow	/

k2spider

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

wget

Rule	Path
Disallow	/

adequat

Rule	Path
Disallow	/

adequat-systems

Rule	Path
Disallow	/

auramundi

Rule	Path
Disallow	/

coexel

Rule	Path
Disallow	/

leadbox

Rule	Path
Disallow	/

mention

Rule	Path
Disallow	/

moreover

Rule	Path
Disallow	/

mytwip

Rule	Path
Disallow	/

newsnow

Rule	Path
Disallow	/

newzbin

Rule	Path
Disallow	/

opinion-tracker

Rule	Path
Disallow	/

proxem

Rule	Path
Disallow	/

score3

Rule	Path
Disallow	/

trendeo

Rule	Path
Disallow	/

vecteurplus

Rule	Path
Disallow	/

verticalsearch

Rule	Path
Disallow	/

vsw

Rule	Path
Disallow	/

winello

Rule	Path
Disallow	/

fetch

Rule	Path
Disallow	/

infoseek

Rule	Path
Disallow	/

msiecrawler

Rule	Path
Disallow	/

offline explorer

Rule	Path
Disallow	/

sitecheck.internetseer.com

Rule	Path
Disallow	/

sitesnagger

Rule	Path
Disallow	/

teleport

Rule	Path
Disallow	/

teleportpro

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

webstripper

Rule	Path
Disallow	/

zealbot

Rule	Path
Disallow	/

asknread.com

Rule	Path
Disallow	/

ellisphere

Rule	Path
Disallow	/

spotter

Rule	Path
Disallow	/

riddler

Rule	Path
Disallow	/
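
A group whose only rule is Disallow / blocks the named agent from the entire site, while an agent with no matching group is not restricted by it. The sketch below checks this with Python's standard `urllib.robotparser` on a hand-built fragment covering a few of the agents listed above. The fragment deliberately omits the * group, and `examplebot` and `/any/page` are made-up values.

```python
from urllib.robotparser import RobotFileParser

# A few of the blocked agents from this report; the fragment is rebuilt by hand.
BLOCKED = ["firstrain", "ahrefsbot", "mj12bot", "ia_archiver", "wget"]

lines = []
for agent in BLOCKED:
    lines += [f"User-agent: {agent}", "Disallow: /", ""]

parser = RobotFileParser()
parser.parse(lines)

# Every path is refused for a listed agent.
blocked = {agent: parser.can_fetch(agent, "/any/page") for agent in BLOCKED}
# An unlisted agent matches no group in this fragment, so access is granted.
unlisted = parser.can_fetch("examplebot", "/any/page")
print(blocked, unlisted)
```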

Other Records

Field	Value
sitemap	https://www.20minutes.fr/sitemap-arbo.xml
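
The sitemap record points at www.20minutes.fr, not at the scanned domain 20minuts.com. A short sketch of reading that record with Python's standard `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later); the single-line input is rebuilt from this report and is not the original file.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["Sitemap: https://www.20minutes.fr/sitemap-arbo.xml"])

# site_maps() returns a list of URLs, or None when no Sitemap line is present.
sitemaps = parser.site_maps()

# Host of the sitemap, for comparison with the scanned domain.
sitemap_host = urlparse(sitemaps[0]).hostname
print(sitemaps, sitemap_host)
```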

Warnings

4 invalid lines.
