skuce.com
robots.txt

Robots Exclusion Standard data for skuce.com

Resource Scan

Scan Details

Site Domain skuce.com
Base Domain skuce.com
Scan Status Ok
Last Scan 2024-11-14T04:05:48+00:00
Next Scan 2024-12-14T04:05:48+00:00

Last Scan

Scanned 2024-11-14T04:05:48+00:00
URL http://skuce.com/robots.txt
Domain IPs 107.180.46.218
Response IP 107.180.46.218
Found Yes
Hash 2ede6b463df82b1358063137fd5eed8b9a5d5b1e7278830823cd5eb0e3d8432b
SimHash a50e4e803342

Groups

*

Rule Path
Disallow /images/
Disallow /photos/
Disallow /stats/
Disallow /work/
Disallow /thetoque/
Disallow /genealogy/data/
Disallow /genealogy/data2/
Disallow /genealogy/bdm/
Disallow /genealogy/burial/
Disallow /genealogy/immigration/
Disallow /genealogy/gen-ireland.html
Disallow /genealogy/gen-datadump.html
Disallow /genealogy/dna.html
Disallow /about.html

Other Records

Field Value
crawl-delay 5
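
As a rough illustration of how the `*` group above would be interpreted by a crawler, the following sketch rebuilds that group as robots.txt text and queries it with Python's standard `urllib.robotparser`. The rebuilt file text is an assumption: the scan lists rules and fields but not the original file's exact wording or ordering. The agent name `ExampleBot` and the path `/genealogy/index.html` are hypothetical, chosen only for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan's "*" group; the real file's exact text,
# ordering, and comments are not shown in the report (assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /photos/
Disallow: /stats/
Disallow: /work/
Disallow: /thetoque/
Disallow: /genealogy/data/
Disallow: /genealogy/data2/
Disallow: /genealogy/bdm/
Disallow: /genealogy/burial/
Disallow: /genealogy/immigration/
Disallow: /genealogy/gen-ireland.html
Disallow: /genealogy/gen-datadump.html
Disallow: /genealogy/dna.html
Disallow: /about.html
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical agent with no group of its own,
# so it falls through to the "*" group.
agent = "ExampleBot"

# Hypothetical path that matches none of the Disallow prefixes.
allowed_index = parser.can_fetch(agent, "http://skuce.com/genealogy/index.html")

# Path listed explicitly in the scan's Disallow rules.
blocked_dna = parser.can_fetch(agent, "http://skuce.com/genealogy/dna.html")

# Crawl-delay recorded under "Other Records" for this group.
delay = parser.crawl_delay(agent)

print(allowed_index, blocked_dna, delay)
```

Matching here is by simple path prefix, which is why `/genealogy/dna.html` is blocked while other pages under `/genealogy/` are not. Crawl-delay is a non-standard field, so individual crawlers may or may not honor it.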

soso
yandex
baiduspider
sogou

Rule Path
Disallow /

plsearch

Rule Path
Disallow /

myfamilybot
werelate

Rule Path
Disallow /

panscient
findestars
myonid
peekyou
pipl
piplbot
rapleaf
snitch
spock
tweepz
wink
yasni
yoname
yourtraces
zoominfo
personworld
yatedo
waatp
kpopmusic
pplsorce
name-list
plsearch

Rule Path
Disallow /

picsbox.biz

Rule Path
Disallow /
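
The named groups above each list several user agents followed by a single `Disallow /` rule. A minimal sketch of how such a shared group could be evaluated, again with `urllib.robotparser`, is given below. Only two of the named groups are rebuilt and the `*` group is shortened to one rule for brevity; the file text is an assumption based on the scan, and the agent strings `YandexBot/3.0` and `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction (assumption): consecutive User-agent lines
# share the rules that follow them, as in the scan's grouped listing.
ROBOTS_TXT = """\
User-agent: soso
User-agent: yandex
User-agent: baiduspider
User-agent: sogou
Disallow: /

User-agent: picsbox.biz
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# urllib.robotparser matches a group when its token is a
# case-insensitive substring of the agent name before any "/".
yandex_home = parser.can_fetch("YandexBot/3.0", "http://skuce.com/")
picsbox_home = parser.can_fetch("picsbox.biz", "http://skuce.com/")

# An agent with no named group falls back to the "*" group.
other_home = parser.can_fetch("ExampleBot", "http://skuce.com/")
other_image = parser.can_fetch("ExampleBot", "http://skuce.com/images/a.jpg")

print(yandex_home, picsbox_home, other_home, other_image)
```

This only models a cooperative crawler. As the file's own comments note, some of the listed agents may ignore robots.txt entirely, in which case these rules have no effect on them.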

Comments

  • /robots.txt file for http://www.skuce.com/
  • Search Engines
  • Rogue Search Engines (may ignore robots.txt)
  • Genealogy Site Scrapers
  • People Search Engines
  • Image Scrapers

Warnings

  • 1 invalid line.