worldcrunch.com
robots.txt

Robots Exclusion Standard data for worldcrunch.com

Resource Scan

Scan Details

Site Domain	worldcrunch.com
Base Domain	worldcrunch.com
Scan Status	Ok
Last Scan	2024-11-12T20:18:45+00:00
Next Scan	2024-11-19T20:18:45+00:00

Last Scan

Scanned	2024-11-12T20:18:45+00:00
URL	https://worldcrunch.com/robots.txt
Domain IPs	104.26.8.104, 104.26.9.104, 172.67.74.156, 2606:4700:20::681a:868, 2606:4700:20::681a:968, 2606:4700:20::ac43:4a9c
Response IP	104.26.9.104
Found	Yes
Hash	9c18538f7c8207ee44ef8e418d55179b8398a778c4b55c31723aa5d48e050993
SimHash	761f5260cc87

Groups

*

Rule	Path
Disallow	/core/*
Disallow	/r/*
Disallow	/mnt/*
Disallow	/res/*
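
As a rough illustration of how the four wildcard rules above might be evaluated, the sketch below translates each path pattern into a regular expression, using the `*` (any character sequence) and `$` (end-of-path anchor) conventions described in RFC 9309. The helper names `pattern_to_regex` and `is_disallowed` and the sample paths are hypothetical, not part of the scan output, and Allow rules and longest-match precedence are left out because this group has none.

```python
import re

# Disallow patterns copied from the "*" group in the scan above.
WILDCARD_DISALLOWS = ["/core/*", "/r/*", "/mnt/*", "/res/*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=WILDCARD_DISALLOWS) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules.
    for sample in ["/core/app.js", "/r/", "/resources/x", "/news/article"]:
        print(sample, is_disallowed(sample))
```

Because each pattern ends in `/*`, it behaves like a plain prefix match on the directory: `/res/*` covers `/res/logo.png` but not `/resources/x`.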

anthropic-ai
claude-web
claudebot
webedia
adequat
adequat-systems
adsbot-google
alexibot
alvinetspider
amazonbot
amisoftware
antenne hatena
anthropic-ai
apocalxexplorerbot
applebot-extended
argus
ask n read
asknread.com
asterias
augure
augure
auramundi
awakari
backdoorbot/1.0
bizinformation
black hole
bloodhound
bloomberg
blowfish/1.0
botalot
builtbottough
bullseye/1.0
bunnyslippers
bytespider
ccbot
cegbfeieh
chatgpt-user
cheesebot
cherrypicker
cherrypickerelite/1.0
cherrypickerse/1.0
cision
claude-web
claude-web
claudebot
coexel
cohere-ai
converacrawler
copyrightcheck
corporama
cosmos
crescent
crescnt internet toolpak http ole control v.1.0
cydralspider
diffbot
digimind
disco pump 3.1
dittospyder
dotbot
download ninja
downloadexpress
edd
ellisphere
emailcollector
emailsiphon
emailwolf
erocrawler
eureka
europresse
explore
extractorpro
facebookbot
factiva
fasterfox
fetch
flamingo_searchengine
foobot
friendlycrawler
gammaspider
google-extended
googleother
gptbot
grub-client
harvest/1.5
hloader
httplib
httrack
httrack 3.0
humanlinks
ia_archiver
ia_archiver-web.archive.org
igentia
imagesiftbot
img2dataset
indexer
infonavirobot
infoseek
jennybot
jetbot
jikespider
k2spider
kantar
kbcrawl
kenjin spider
knowings
larbin
leadbox
lexibot
libweb/clshttp
libwww
linkextractorpro
linkfluence
linko
linkscan/8.1a unix
linkwalker
lwp-trivial
lwp-trivial/1.34
manageo
mata hari
mediacompil
meltwater
mention
microsoft url control - 5.01.4511
microsoft url control - 6.00.8169
miixpc
miixpc/4.2
mister pix
mlbot
moget
moget/2.1
moreover
ms search 4.0 robot
ms search 5.0 robot
msiecrawler
mytwip
naverbot
netants
netattache
netmechanic
newscan-online
newsnow
newzbin
nicerspro
npbot
objectssearch
offline explorer
omgili
omgilibot
omigili
omigilibot
openfind
openindexspider
opinion-tracker
peer39_crawler
peer39_crawler/1.0
perplexitybot
pimptrain
propowerbot/2.14
prowebwalker
proxem
psbot
quepasacreep
queryn metasearch
qwam content intelligence
raven
readability.com
repomonkey
rma
scoop.it
score3
semrushbot
sightupbot
sindup
sitebot
sitecheck.internetseer.com
sitesnagger
sitesucker
sogou web spider
sosospider
spankbot
spanner
speedy
spotter
suggybot
superbot
superbot/2.6
suzuran
synthesio
szukacz/1.4
talkwalker
teleport
teleportpro
telesoft
the intraformant
thenomad
tighttwatbot
titan
tocrawl/urldispatcher
toscrawler
trendeo
trendybuzz
true_robot
true_robot/1.0
tunitinbot
turingos
turnitinbot
up2news
urlpouls
urly warning
vci
vecteurplus
verif
verticalsearch
vsw
wapspider
web image collector
webauto
webbandit
webbandit/3.50
webcopier
webcopy
webenhancer
webmasterworldforumbot
webmirror
webreaper
websauger
website extractor
website quester
webster pro
webstripper
webstripper/2.02
webzinger
webzip
wget
wikiofeedbot
winello
winhttrack
www-collector-e
xenu link sleuth/1.3.8
yacy
yandex
youbot
youmag
yrspider
zealbot
zeus
zite
zookabot
zyborg

Rule	Path
Disallow	/
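
A minimal sketch of how a crawler could pick between the two groups in this scan, assuming the usual convention that a crawler's product token is compared case-insensitively against the named user agents and falls back to the `*` group when none matches. The set below is only a small subset of the listed agents, and `disallows_for` and `examplebot` are hypothetical names used for illustration.

```python
# Small illustrative subset of the user agents listed in the blocked group.
BLOCKED_AGENTS = {"gptbot", "ccbot", "claudebot", "bytespider", "perplexitybot"}

# Disallow paths for each group, copied from the scan above.
DEFAULT_DISALLOWS = ["/core/*", "/r/*", "/mnt/*", "/res/*"]
BLOCKED_DISALLOWS = ["/"]


def disallows_for(product_token: str) -> list[str]:
    """Return the Disallow paths that apply to the given crawler token.

    Named agents get the site-wide "Disallow: /" rule; every other
    crawler falls back to the "*" group.
    """
    if product_token.lower() in BLOCKED_AGENTS:
        return BLOCKED_DISALLOWS
    return DEFAULT_DISALLOWS


if __name__ == "__main__":
    print(disallows_for("GPTBot"))      # named agent, blocked site-wide
    print(disallows_for("examplebot"))  # hypothetical agent, "*" group
```

A named agent is governed only by its own group, so the four `*`-group paths do not add to the `Disallow: /` rule; that rule already covers the whole site.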

Other Records

Field	Value
sitemap	https://worldcrunch.com/sitemap.xml
sitemap	https://worldcrunch.com/sitemap_news.xml
sitemap	https://worldcrunch.com/sitemap_video.xml
sitemap	https://worldcrunch.com/sitemap_sections.xml
sitemap	https://worldcrunch.com/sitemap_tags.xml

Warnings

2 invalid lines.
