
Robots Exclusion Standard data for crn.de

Resource Scan

Scan Details

Site Domain crn.de
Base Domain crn.de
Scan Status Ok
Last Scan 2024-09-27T19:52:00+00:00
Next Scan 2024-10-04T19:52:00+00:00

Last Scan

Scanned 2024-09-27T19:52:00+00:00
URL https://crn.de/robots.txt
Redirect https://www.crn.de/robots.txt
Redirect Domain www.crn.de
Redirect Base crn.de
Domain IPs 104.18.2.46, 104.18.3.46, 2606:4700::6812:22e, 2606:4700::6812:32e
Redirect IPs 104.18.2.46, 104.18.3.46, 2606:4700::6812:22e, 2606:4700::6812:32e
Response IP 104.18.3.46
Found Yes
Hash 34bd9552f5f7f9ea95ec7f69dfaadedff1b9844dc26ece9a92b3c98449a7e202
SimHash 2156f252c831

Groups

*

Rule Path
Allow /

msnbot

Rule Path
Allow /

slurp

Rule Path
Allow /

teoma

Rule Path
Allow /

gigabot

Rule Path
Allow /

robozilla

Rule Path
Allow /

nutch

Rule Path
Allow /

ia_archiver

Rule Path
Allow /

baiduspider

Rule Path
Allow /

naverbot

Rule Path
Allow /

yeti

Rule Path
Allow /

yahoo-mmcrawler

Rule Path
Allow /

psbot

Rule Path
Allow /

yahoo-blogs/v3.9

Rule Path
Allow /
Allow /cgi-bin/

ahrefsbot
compspybot
crystalsemanticsbot
curious george
cybeye.com
daumoa
docomo
exb language crawler
ezooms
flamingo_searchengine
genieo
genio
gsa-crawler
lexxebot
libcrawl
linkdex
lwnutch
magpie-crawler
meltwater
mnogosearch
omgilibot/0.3
openwebindex
psbot
rediffnewsbot
repparser
scanmine
seoengworldbot
shopwiki
showyoubot
sindice-site-manager
sogou
sogou spider
sosospider
webvac
wocbot
woriobot
yacybot
yeti
yolinkbot_text
youdaobot

Rule Path
Disallow /
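The groups above can be read back as ordinary robots.txt groups. The sketch below assumes a layout that is not shown in the scan: one group per listing entry, in the order listed. It rebuilds a hypothetical version of the file and checks a few user agents with Python's standard-library `urllib.robotparser`. The names `ALLOWED_AGENTS`, `BLOCKED_AGENTS` and `build_robots_txt` are illustrative only. The sketch omits the casing, comments and Sitemap lines of the real file because the scan summary does not show them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of crn.de's robots.txt from the scanned groups.
# Group order follows the scan listing; casing, comments and Sitemap lines
# are not reproduced because the scan summary does not show them.
ALLOWED_AGENTS = [
    "msnbot", "slurp", "teoma", "gigabot", "robozilla", "nutch",
    "ia_archiver", "baiduspider", "naverbot", "yeti", "yahoo-mmcrawler", "psbot",
]
BLOCKED_AGENTS = [
    "ahrefsbot", "compspybot", "crystalsemanticsbot", "curious george",
    "cybeye.com", "daumoa", "docomo", "exb language crawler", "ezooms",
    "flamingo_searchengine", "genieo", "genio", "gsa-crawler", "lexxebot",
    "libcrawl", "linkdex", "lwnutch", "magpie-crawler", "meltwater",
    "mnogosearch", "omgilibot/0.3", "openwebindex", "psbot", "rediffnewsbot",
    "repparser", "scanmine", "seoengworldbot", "shopwiki", "showyoubot",
    "sindice-site-manager", "sogou", "sogou spider", "sosospider", "webvac",
    "wocbot", "woriobot", "yacybot", "yeti", "yolinkbot_text", "youdaobot",
]


def build_robots_txt() -> str:
    """Render the groups as robots.txt text, one blank line between groups."""
    lines = ["User-agent: *", "Allow: /", ""]
    for agent in ALLOWED_AGENTS:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines += ["User-agent: yahoo-blogs/v3.9", "Allow: /", "Allow: /cgi-bin/", ""]
    # One group: consecutive User-agent lines share a single Disallow rule.
    lines += [f"User-agent: {agent}" for agent in BLOCKED_AGENTS]
    lines.append("Disallow: /")
    return "\n".join(lines)


parser = RobotFileParser()
parser.parse(build_robots_txt().splitlines())

for agent in ["Googlebot", "AhrefsBot", "psbot", "yeti"]:
    print(agent, parser.can_fetch(agent, "https://www.crn.de/"))
```

Under this reconstruction, Googlebot matches no named group and falls back to the `*` group, so it is allowed. AhrefsBot hits the final `Disallow: /` group. psbot and yeti appear in both an Allow group and the Disallow group. Python's parser returns the first group whose token is a substring of the agent name, so both agents are allowed. RFC 9309 instead combines all matching groups and prefers `Allow` when an allow rule and a disallow rule are equivalent. On that reading, both agents should still be allowed, so the overlap in the listing does not actually block them.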

Comments

  • Sitemap declarations
  • Fully exclude these robots from crawling anything