ulearn.photography
robots.txt

Robots Exclusion Standard data for ulearn.photography

Resource Scan

Scan Details

Site Domain ulearn.photography
Base Domain ulearn.photography
Scan Status Ok
Last Scan 2024-10-11T16:50:18+00:00
Next Scan 2024-11-10T16:50:18+00:00

Last Scan

Scanned 2024-10-11T16:50:18+00:00
URL https://ulearn.photography/robots.txt
Domain IPs 91.109.5.76
Response IP 91.109.5.76
Found Yes
Hash f450c86c5badc39a490e2c45d5132d16ebef8ca5c64ca6053f42197f7c3dc580
SimHash 1b57ddf2cb1d

Groups

ahrefsbot
baiduspider
blexbot
emailcollector
emailsiphon
emailwolf
ezooms
ia_archiver
linkedinbot
mj12bot
msiecrawler
msnbot
netvibes
nutch
offline explorer
offline.explorer
pgbot
pingdom
psbot
relcybot
scoutjet
seznambot
sitesnagger
slurp
sogou
sosobot
sougou
teleport
teleport pro
teoma
twitterbot
webcopier
webstripper
yandex
yandexantivirus
yandexblogs
yandexbot
yandexcatalog
yandexdirect
yandexfavicons
yandeximages
yandexmedia
yandexnews
yandexpagechecker
yandexvideo
yandexwebmaster
yandexzakladki

Rule Path
Disallow /
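
The effect of the rule above can be sketched with Python's standard urllib.robotparser. This is an illustration only: it assumes the listed user agents share a single group followed by "Disallow: /", it uses just a handful of the agents from the list, and the helper names (build_robots_txt, is_allowed) and the /gallery path are hypothetical. The original file's exact layout is not shown in this report.

```python
from urllib.robotparser import RobotFileParser

# Small subset of the user agents listed above (assumption: they all
# belong to one group that ends with a single "Disallow: /" rule).
SAMPLE_AGENTS = ["ahrefsbot", "baiduspider", "mj12bot", "yandex", "webcopier"]


def build_robots_txt(agents):
    """Build a robots.txt body with one group blocking every listed agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def is_allowed(user_agent, url, agents=SAMPLE_AGENTS):
    """Return True if user_agent may fetch url under the sketched rules."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt(agents).splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://ulearn.photography/gallery"  # hypothetical path
    for agent in ("MJ12bot", "YandexBot", "Googlebot"):
        print(agent, is_allowed(agent, url))
```

Agents that match a listed group are refused every path, while agents that match no group (and for which there is no "*" group) are allowed by default. Note that urllib.robotparser matches agent tokens by case-insensitive substring, so "yandex" also covers "YandexBot"; other crawlers may match tokens differently.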

Comments

  • Generic robots.txt to stop the worst:
  • (1) stops big chinese/russian crawlers
  • (2) stops email scanners
  • (3) stops web copiers
  • This list is not complete or exhaustive at all, the whole idea is just stop the worst offenders