aisnenouvelle.fr
robots.txt

Robots Exclusion Standard data for aisnenouvelle.fr

Resource Scan

Scan Details

Site Domain aisnenouvelle.fr
Base Domain aisnenouvelle.fr
Scan Status Ok
Last Scan 2025-01-17T18:02:12+00:00
Next Scan 2025-01-24T18:02:12+00:00

Last Scan

Scanned 2025-01-17T18:02:12+00:00
URL https://aisnenouvelle.fr/robots.txt
Redirect https://www.aisnenouvelle.fr/robots.txt
Redirect Domain www.aisnenouvelle.fr
Redirect Base aisnenouvelle.fr
Domain IPs 109.7.16.62, 90.83.65.62
Redirect IPs 23.220.203.122, 23.220.203.163, 2600:1413:b000:6::17d5:2bc6, 2600:1413:b000:6::17d5:2be0
Response IP 96.17.96.29
Found Yes
Hash 55b16f9f82a57d3cf5b7cdaa9bc7b573b3d9804d659c7baf7daf927fff35917e
SimHash 243f51f1c677

Groups

mediapartners-google
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
adsbot-google
googlebot_nauxeo
twitterbot
applebot
bingbot
echoboxbot
publication-access-for-facebook
grapeshot
proximic
weborama-fetcher
upday
facebookexternalhit
flipboard
flipboardproxy
siteauditbot

Rule Path
Disallow /simplesaml
Disallow /simplsamlphp_auth/
Disallow /wallyextra/contenttypesajax
Disallow /esi/
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /themes/
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /admin
Disallow /admin/
Disallow /comment/reply/
Disallow /logout
Disallow /search/
Disallow /user/register
Disallow /user/password
Disallow /user/login
Disallow /cgi-bin/
Disallow /bears
Disallow /archives/recherche*
Disallow /agenda-evenements/*
Disallow */modules/*
Disallow */advertising_script.js
Disallow /node*
Disallow */www.ultimedia.com/js/common/smart.js
Disallow */video/MMV
Disallow /package-type/
Disallow /*?amp
Disallow /*?referer=%2Farchives%2F
Disallow /atom/
Allow /misc/*.js
Allow /misc/*.css
Allow /modules/*.js
Allow /modules/*.css
Allow /profiles/*.js
Allow /profiles/*.css
Allow /themes/*.js
Allow /themes/*.css
Allow /.well-known/
Allow /apple-app-site-association
Allow */modules/*.js
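
The group above mixes Disallow and Allow rules that overlap (for example `Disallow /misc/` and `Allow /misc/*.js`). The following is a minimal sketch of how a crawler might resolve such overlaps, assuming it follows the longest-match convention of RFC 9309, where the most specific matching pattern wins and Allow wins a tie. The report does not say how any particular crawler evaluates these rules; `pattern_to_regex`, `is_allowed`, and the `RULES` subset are hypothetical names chosen for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be fetched under `rules`.

    `rules` is a list of (kind, pattern) tuples, kind being "allow" or
    "disallow". The longest matching pattern wins; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A small subset of the rules listed above (illustrative, not the full group).
RULES = [
    ("disallow", "/misc/"),
    ("disallow", "/node*"),
    ("disallow", "/*?amp"),
    ("allow", "/misc/*.js"),
    ("allow", "/.well-known/"),
]

for path in ("/misc/drupal.js", "/misc/favicon.ico", "/node/123", "/sport/"):
    print(path, is_allowed(RULES, path))
```

Under these assumptions, `/misc/drupal.js` is fetchable because `/misc/*.js` is longer than `/misc/`, while other files under `/misc/` stay blocked.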

adequat
adequat-systems
amisoftware
awariorssbot
awariosmartbot
argus
ask n read
asknread.com
augure
auramundi
bloodhound
cision
coexel
converacrawler
corporama
cydralspider
digimind
download ninja
downloadexpress
edd
ellisphere
eureka
europresse
explore
factiva
fasterfox
fetch
gammaspider
grub-client
httrack
ia_archiver
ia_archiver-web.archive.org
indexer
infoseek
jetbot
k2spider
kantar
kbcrawl
knowings
larbin
leadbox
libwww
linkfluence
linko
manageo
mediacompil
meltwater
mention
moreover
msiecrawler
mytwip
newscan-online
newsnow
newzbin
npbot
objectssearch
offline explorer
opinion-tracker
pimptrain
proxem
quepasacreep
qwam content intelligence
raven
readability.com
scoop.it
score3
sindup
sitecheck.internetseer.com
sitesnagger
spotter
synthesio
talkwater
teleport
teleportpro
trendeo
trendybuzz
tunitinbot
turnitinbot
up2news
vecteurplus
verif
verticalsearch
vsw
wapspider
webcopier
webreaper
webstripper
webzinger
webzip
wget
winello
youmag
zealbot
zite
zyborg

Rule Path
Disallow /

ai2bot
amazonbot
applebot-extended
anthropic-ai
bytespider
ccbot
chatgpt-user
claudebot
claude-web
cohere-ai
diffbot
duckassistbot
facebookbot
google-extended
gptbot
meta-externalagent
meta-externalfetcher
oai-searchbot
perplexitybot

Rule Path
Disallow /
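
As a rough check of what a group-wide `Disallow /` means in practice, the sketch below feeds a reconstructed fragment to Python's standard `urllib.robotparser`. The fragment is an assumption: the report lists parsed groups, not the raw file, so the `User-agent:` lines are rebuilt from three of the agent names above, and `may_fetch` is a hypothetical helper. Note that `urllib.robotparser` does not implement wildcard patterns, so it is only suitable here because the rule is a plain prefix.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed fragment (assumed layout) covering three of the listed agents.
AI_GROUP = """\
User-agent: gptbot
User-agent: claudebot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(AI_GROUP.splitlines())


def may_fetch(agent: str, url: str = "https://www.aisnenouvelle.fr/") -> bool:
    """Ask the parsed fragment whether `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "ClaudeBot", "examplebot"):
    print(agent, may_fetch(agent))
```

Agent names are matched case-insensitively, so `GPTBot` falls under the `gptbot` entry. An agent that appears in no group of this fragment, such as the made-up `examplebot`, is not restricted by it.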

Other Records

Field Value
sitemap https://www.aisnenouvelle.fr/sitemap.xml
sitemap https://www.aisnenouvelle.fr/sites/default/files/sitemaps/abonne_aisnenouvelle_fr/sitemapnews-0.xml

Comments

  • Agent Specific Allowed Sections
  • Sitemaps
  • General Disallowed Paths
  • General Allowed Paths
  • Not allowed bots
  • AI Data Scrapers

Warnings

  • 2 invalid lines.