marvelousnews.com
robots.txt

Robots Exclusion Standard data for marvelousnews.com

Resource Scan

Scan Details

Site Domain marvelousnews.com
Base Domain marvelousnews.com
Scan Status Ok
Last Scan 2024-11-11T12:11:33+00:00
Next Scan 2024-11-18T12:11:33+00:00

Last Scan

Scanned 2024-11-11T12:11:33+00:00
URL https://marvelousnews.com/robots.txt
Domain IPs 66.135.63.168
Response IP 66.135.63.168
Found Yes
Hash b5638c215eb1073406c3b69cf491cb757812aef3c2640f55b36eaf37cf912203
SimHash 7b46d6310ab7

Groups

*

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

a6-indexer
aboundex
acapbot
acoonbot
adbeat_bot
addsearchbot
addthis
adidxbot
admantx
adscanner
adstxtcrawler
advbot
ahc
ahrefsbot
ahrefs
aihitbot
aisearchbot
alphabot
anderspinkbot
antibot
anyevent
apercite
appengine-google
appinsights
applebot
arabot
archivebot
aspiegelbot
atom awariorssbot
awariorssbot
awariosmartbot
awesomecrawler
axios
b2b backlinkcrawler
baidu-yunguance
baiduspider
baiduspider
bark[rr]owler
bazqux
bdcbot
behloolbot
betabot
bidswitchbot
biglotron
binlar
bitbot bitlybot
blackboard
blexbot
blog blogmurabot
blogtrafficd.d+ blp_bbot
bnf.fr_bot
bomborabot
bot-pge.chlooe.com
bot.araturka.com
botify
bot
bot
bot
boxcarbot
brainobot
brandonbot
brandverity
btwebclient
bubing
bublupbot
buck
buzzbot
bytespider
bytespider
caliperbot
capsulechecker
careerbot
cc ccbot
changedetection
checkmarknetwork
check_http
chrome-lighthouse
cincraw
citeseerxbot
claudebot
clickagy
cliqzbot
cloudflare-alwaysonline
cloudfront
coccoc
collection@infegy.com
companybook-crawler
content contextad contxbot
convera
crawler crawler4j
crawler
criteobot
criteobot/0.1
crunchbot
crystalsemanticsbot
curebot
cutbot
cxensebot
cyberpatrol
dareboost
datafeedwatch
dataforseobot
datagnionbot
datanyze
dataprovider.com
daum
dcrawl
deadlinkchecker
deeper
deusu
diffbot
digg digincore discobot
discordbot
disqus
dnyzbot
domain domaincrawler
domains domainstatsbot
dotbot
dotbot
dragonbot
drupact
dubbotbot
ec2linkfinder
edisterbot
electricmonk
elisabot
embedly
engine
enterprise epicbot
eright
europarchive.org
everyonesocialbot
exabot
experibot
extlinksbot
eyeotabot
ezid
ezooms
facebookexternalhit
facebook
facebookbot
facebot
fast fast-webcrawler
fedoraplanet
feed feed-fetcher
feedbot
feedfetcher-google
feedly
feedspot
feedvalidator
femtosearchbot
fetch
fever
filterdb.iss.netcrawler
finditanswersbot
findlink
findthatfile
findxbot
flamingo_searchengine
flipboardproxy
fluffy
fr-crawler
free freewebmonitoring freshrss
friendica
frog fuelbot
fyrebot
g00g1e.net
g2 g2reader-bot
garlikcrawler
genieo
gigablast
gigabot
gingercrawler
gluten gnam gnam gnowitnewsbot
go-http-client
google-xrawler
gowikibot
gptbot
grapeshotcrawler
grapeshot
grobbot
grouphigh
grub.org
gslfbot
gwene
hatena
headlesschrome
heritrix
httpunit
httpurlconnection
http_get
httrack
hubspot
ias ia_archiver
icbot
icc-crawler
ichiro
imrbot
indeedbot
infoobot
integromedb
intelium_bot
interfaxscanbot
ip-web-crawler.com
ips-agent
iskanie
istellabot
it2media-domain-crawler
james jamie's jetslide
jetty
jobboersebot
jooblebot
jpg-newsbot
jugendschutzprogramm-crawler
jyxobot
k7mlwcbot
kemvibot
kosmiobot
landau-media-spider
laserlikebot
lb-spider
leikibot
libwww-perl
linguee link link linkapediabot
linkarchiver
linkdex
linkedinbot
linkisbot
lipperhey
livelap[bb]ot
lssbot
lssrocketcrawler
ltx71
luminator-robots
magpie-crawler
mail.ru_bot
mappydata
mastodon
mauibot
mbcrawler
mediapartners-google
mediapartners
mediatoolkitbot
megaindex
meltwaternews
memorybot
metadata metajobbot
metauri
microdata mindupbot
miniflux
mixnodecache
mj12bot
mlbot
moatbot
mojeekbot
moodlebot
moreover
msrbot
muckrack
multiviewbot
naver ne nerdbynature.bot
nerdybot
netcraftsurveyagent
netestate neticle netresearchserver
netvibes
newsharecounts
newspaper
nextcloud
niki-bot
nimbostratus-bot
ning
ninja nixstatsbot
nmap ntentbot
nutch
nuzzel
ocarinabot
officestorebot
okhttp
omgili
online-webceo-bot
openhosebot
openindexspider
orangebot
outbrain
outclicksbot
page2rss
pagepeeker
pandalytics
panscient
paperlibot
pcore-http
petalbot
phantomjs
phpcrawl
pingdom
pinterest.com.bot
pinterestbot
piplbot
pocketparser
postrank
pr-cy.ru
preview
preview
primalbot
privacyawarebot
project
proximic
psbot
pulsepoint
purebot
python-requests
python-urllib
qwantify
rankactivelinkbot
re-animator redditbot
refindbot
regionstuttgartbot
retrevopageanalyzer
ridderbot
rivva
robot
rogerbot
rssbot
rssbot
rssingbot
rss
rytebot
safednsbot
safesearch sbl-bot
scaper
scoutjet
scrapy
screaming scribdbot
scripting searchatlas
seekbot
seekport seewithkids
semanticbot
semanticscholarbot
semrushbot-coub
semrushbot-ct
semrushbot-si
semrushbot-swa
semrushbot
semrushbot
semrush
sentibot
seo seobilitybot
seokicks
seoscanners
serendeputybot
serpstatbot
services
seznambot
simplecrawler
simplepie
simplescraper
sistrix siteauditbot
sitebot
sitechecker
siteexplorer.info
siteimprove.com
skypeuripreview
slack-imgproxy
slackbot
sleuth
slurp
smtbot
snacktory
socialrankiobot
sogou
sonic
spbot
speedy
spider
spider
splitsignalbot
startmebot
storygizebot
streamline3bot
summify
superfeedr
surdotlybot
surveybot
swimgbot
sysomos
taboolabot
tagoobot
tangibleebot
telegrambot
teoma
theoldreader.com
thinklab
tigerbot
tineye
tiny tiny toplistbot
toutiaospider
traackr.com
tracemyfile
trendictionbot
trendsmapresolver
trove
ttd-content
turnitinbot
tweetedtimes
tweetmemebot
twengabot
twingly
twitterbot
twurly
um-ln
upflow
uptimebot.org
uptimerobot
urlappendbot
user-agent: usinenouvellecrawler
ut-dorkbot
validator.nu
vebidoobot
velenpublicwebcrawler
veoozbot
vigil
vkrobot
vkshare
voilabot
voluumdsp-content-bot
w3c-checklink
w3c-mobileok
w3c_css_validator
w3c_i18n-checker
w3c_unicorn
w3c_validator
wbsearchbot
web web web-archive-net.com.bot
webcompanycrawler
webdatastats
webmon
wesee:search
whatsapp
wocbot
woobot
weborama-fetcher
wordupinfosearch
woriobot
wotbox
www.uptime.com
xenu xovibot
y!j
yacybot
yahoo yak
yandexaccessibilitybot
yandexbot
yandeximageresizer
yandeximages
yandexmetrika
yandexmobilebot
yandexturbo
yandex
yandexvideoparser
yanga
yeti
yisouspider
yoozbot
zabbix
zenback zgrab
zoombot
zoominfobot
zumbot
zuperlistbot

Rule Path
Disallow /
Allow /adstxt/
Allow /ads.txt

Other Records

Field Value
crawl-delay 10000
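
The two groups above can be read as follows: a crawler picks the group whose user-agent token matches it (falling back to `*`), then applies the most specific matching path rule. The sketch below illustrates that evaluation under stated assumptions. It is not the scanner's or the site's implementation. The names `Group`, `select_group`, and `is_allowed` are hypothetical, only three of the listed agents are included for brevity, the `/news/` path is an invented example, and substring matching on the user-agent string is a simplification. The longest-match rule with Allow winning ties follows RFC 9309; crawl-delay is not part of that RFC and is commonly interpreted as seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Group:
    """One robots.txt group: agent tokens, path rules, optional crawl-delay."""
    agents: List[str]
    rules: List[Tuple[str, str]]  # (kind, path prefix), kind is "allow" or "disallow"
    crawl_delay: Optional[int] = None


def select_group(groups: List[Group], user_agent: str) -> Optional[Group]:
    """Return the first group naming this agent, else the '*' group if present."""
    ua = user_agent.lower()
    fallback = None
    for group in groups:
        for token in group.agents:
            if token == "*":
                fallback = group
            elif token in ua:  # simplification: substring match on the UA string
                return group
    return fallback


def is_allowed(group: Group, path: str) -> bool:
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_kind, best_len = "allow", -1
    for kind, prefix in group.rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


# Groups as reported in this scan (agent list shortened to three entries).
wildcard = Group(agents=["*"], rules=[("allow", "/")], crawl_delay=10)
listed = Group(
    agents=["ahrefsbot", "gptbot", "claudebot"],
    rules=[("disallow", "/"), ("allow", "/adstxt/"), ("allow", "/ads.txt")],
    crawl_delay=10000,
)
groups = [wildcard, listed]

bot_group = select_group(groups, "Mozilla/5.0 (compatible; GPTBot/1.0)")
other_group = select_group(groups, "ExampleBrowser/1.0")  # hypothetical agent
```

With these definitions, a listed bot is limited to `/ads.txt` and `/adstxt/` with a crawl-delay of 10000, while any agent not on the list falls into the `*` group and may fetch everything with a crawl-delay of 10.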

Warnings

  • 7 invalid lines.