worldcrunch.com
robots.txt
Robots Exclusion Standard data for worldcrunch.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | worldcrunch.com |
Base Domain | worldcrunch.com |
Scan Status | Ok |
Last Scan | 2024-11-12T20:18:45+00:00 |
Next Scan | 2024-11-19T20:18:45+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-11-12T20:18:45+00:00 |
URL | https://worldcrunch.com/robots.txt |
Domain IPs | 104.26.8.104, 104.26.9.104, 172.67.74.156, 2606:4700:20::681a:868, 2606:4700:20::681a:968, 2606:4700:20::ac43:4a9c |
Response IP | 104.26.9.104 |
Found | Yes |
Hash | 9c18538f7c8207ee44ef8e418d55179b8398a778c4b55c31723aa5d48e050993 |
SimHash | 761f5260cc87 |
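The recorded Hash value is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below only assumes SHA-256 over the raw response bytes. Under that assumption it shows how a locally fetched copy of robots.txt could be compared with the scan. The function names are illustrative and not part of any scanner API.

```python
import hashlib

# Hash value recorded in the scan details above.
RECORDED_HASH = "9c18538f7c8207ee44ef8e418d55179b8398a778c4b55c31723aa5d48e050993"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a fetched body with the recorded hash.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    A mismatch can also mean the file changed after the scan.
    """
    return sha256_hex(body) == recorded.lower()
```

A caller would pass the unmodified bytes of the HTTP response body, since any re-encoding or newline normalization changes the digest.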
Groups
*
Rule | Path |
---|---|
Disallow | /core/* |
Disallow | /r/* |
Disallow | /mnt/* |
Disallow | /res/* |
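The four rules of the `*` group use a trailing `*` wildcard. The following is a minimal sketch of how such patterns can be matched against a URL path, assuming the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names and the example paths in the usage note are hypothetical. Python's standard `urllib.robotparser` is not used here because it treats `*` inside a path as a literal character.

```python
import re

# Disallow paths of the "*" group, copied from the table above.
WILDCARD_GROUP_DISALLOW = ["/core/*", "/r/*", "/mnt/*", "/res/*"]


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape literal segments and join them with ".*" where "*" appeared.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(url_path: str, rules=WILDCARD_GROUP_DISALLOW) -> bool:
    """True if any Disallow pattern matches the beginning of url_path.

    The group has no Allow rules, so no precedence handling is needed.
    """
    return any(rule_to_regex(rule).match(url_path) for rule in rules)
```

With these rules, `is_disallowed("/res/logo.png")` is true, while `is_disallowed("/research/report")` is false because `/r/*` requires the slash after `r`.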
anthropic-ai
claude-web
claudebot
webedia
adequat
adequat-systems
adsbot-google
alexibot
alvinetspider
amazonbot
amisoftware
antenne hatena
anthropic-ai
apocalxexplorerbot
applebot-extended
argus
ask n read
asknread.com
asterias
augure
augure
auramundi
awakari
backdoorbot/1.0
bizinformation
black hole
bloodhound
bloomberg
blowfish/1.0
botalot
builtbottough
bullseye/1.0
bunnyslippers
bytespider
ccbot
cegbfeieh
chatgpt-user
cheesebot
cherrypicker
cherrypickerelite/1.0
cherrypickerse/1.0
cision
claude-web
claude-web
claudebot
coexel
cohere-ai
converacrawler
copyrightcheck
corporama
cosmos
crescent
crescnt internet toolpak http ole control v.1.0
cydralspider
diffbot
digimind
disco pump 3.1
dittospyder
dotbot
download ninja
downloadexpress
edd
ellisphere
emailcollector
emailsiphon
emailwolf
erocrawler
eureka
europresse
explore
extractorpro
facebookbot
factiva
fasterfox
fetch
flamingo_searchengine
foobot
friendlycrawler
gammaspider
google-extended
googleother
gptbot
grub-client
harvest/1.5
hloader
httplib
httrack
httrack 3.0
humanlinks
ia_archiver
ia_archiver-web.archive.org
igentia
imagesiftbot
img2dataset
indexer
infonavirobot
infoseek
jennybot
jetbot
jikespider
k2spider
kantar
kbcrawl
kenjin spider
knowings
larbin
leadbox
lexibot
libweb/clshttp
libwww
linkextractorpro
linkfluence
linko
linkscan/8.1a unix
linkwalker
lwp-trivial
lwp-trivial/1.34
manageo
mata hari
mediacompil
meltwater
mention
microsoft url control - 5.01.4511
microsoft url control - 6.00.8169
miixpc
miixpc/4.2
mister pix
mlbot
moget
moget/2.1
moreover
ms search 4.0 robot
ms search 5.0 robot
msiecrawler
mytwip
naverbot
netants
netattache
netmechanic
newscan-online
newsnow
newzbin
nicerspro
npbot
objectssearch
offline explorer
omgili
omgilibot
omigili
omigilibot
openfind
openindexspider
opinion-tracker
peer39_crawler
peer39_crawler/1.0
perplexitybot
pimptrain
propowerbot/2.14
prowebwalker
proxem
psbot
quepasacreep
queryn metasearch
qwam content intelligence
raven
readability.com
repomonkey
rma
scoop.it
score3
semrushbot
sightupbot
sindup
sitebot
sitecheck.internetseer.com
sitesnagger
sitesucker
sogou web spider
sosospider
spankbot
spanner
speedy
spotter
suggybot
superbot
superbot/2.6
suzuran
synthesio
szukacz/1.4
talkwalker
teleport
teleportpro
telesoft
the intraformant
thenomad
tighttwatbot
titan
tocrawl/urldispatcher
toscrawler
trendeo
trendybuzz
true_robot
true_robot/1.0
tunitinbot
turingos
turnitinbot
up2news
urlpouls
urly warning
vci
vecteurplus
verif
verticalsearch
vsw
wapspider
web image collector
webauto
webbandit
webbandit/3.50
webcopier
webcopy
webenhancer
webmasterworldforumbot
webmirror
webreaper
websauger
website extractor
website quester
webster pro
webstripper
webstripper/2.02
webzinger
webzip
wget
wikiofeedbot
winello
winhttrack
www-collector-e
xenu link sleuth/1.3.8
yacy
yandex
youbot
youmag
yrspider
zealbot
zeus
zite
zookabot
zyborg
Rule | Path |
---|---|
Disallow | / |
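The two groups above can be combined into one access check: a crawler whose product token appears in the listed group is barred from the whole site, and every other crawler falls back to the `*` group. The sketch below assumes case-insensitive matching of the product token and uses only a small sample of the listed agents. The token `examplebot` and the path `/culture/article` in the usage note are hypothetical.

```python
# A small sample of the agents listed in the group above (not the full list).
BLOCKED_AGENTS_SAMPLE = {"gptbot", "claudebot", "ccbot", "perplexitybot", "wget"}

# Prefix form of the "*" group rules; a trailing "*" is equivalent to a prefix match.
DEFAULT_DISALLOW_PREFIXES = ["/core/", "/r/", "/mnt/", "/res/"]


def disallow_prefixes_for(product_token: str) -> list:
    """Pick the rule set for a crawler, falling back to the "*" group."""
    if product_token.lower() in BLOCKED_AGENTS_SAMPLE:
        return ["/"]  # Disallow: / blocks every path
    return DEFAULT_DISALLOW_PREFIXES


def may_fetch(product_token: str, url_path: str) -> bool:
    """True if no Disallow prefix of the applicable group matches url_path."""
    prefixes = disallow_prefixes_for(product_token)
    return not any(url_path.startswith(prefix) for prefix in prefixes)
```

For example, `may_fetch("GPTBot", "/culture/article")` is false, while `may_fetch("examplebot", "/culture/article")` is true and `may_fetch("examplebot", "/core/x")` is false.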
Other Records
Field | Value |
---|---|
sitemap | https://worldcrunch.com/sitemap.xml |
sitemap | https://worldcrunch.com/sitemap_news.xml |
sitemap | https://worldcrunch.com/sitemap_video.xml |
sitemap | https://worldcrunch.com/sitemap_sections.xml |
sitemap | https://worldcrunch.com/sitemap_tags.xml |
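Sitemap records like those above stand outside any user-agent group, so they can be collected with a simple line scan. The sketch below assumes the usual `field: value` line syntax with `#` comments. `SAMPLE_ROBOTS_TXT` is an abbreviated, hypothetical excerpt for illustration, not the actual file.

```python
# Abbreviated, hypothetical excerpt; the real file is much longer.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /core/*

Sitemap: https://worldcrunch.com/sitemap.xml
sitemap: https://worldcrunch.com/sitemap_news.xml  # trailing comment
"""


def extract_sitemaps(robots_txt: str) -> list:
    """Return the values of all sitemap records, in file order."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop comments, then split on the first colon only so that
        # the "https://" in the value stays intact.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps
```

The field name is compared case-insensitively, so `Sitemap:` and `sitemap:` are both picked up. A URL that contains a `#` fragment would be truncated by the comment handling in this sketch.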
Warnings
- 2 invalid lines.