archiveshub.jisc.ac.uk
robots.txt

Robots Exclusion Standard data for archiveshub.jisc.ac.uk

Resource Scan

Scan Details

Site Domain archiveshub.jisc.ac.uk
Base Domain jisc.ac.uk
Scan Status Ok
Last Scan 2024-05-31T12:32:31+00:00
Next Scan 2024-06-30T12:32:31+00:00

Last Scan

Scanned 2024-05-31T12:32:31+00:00
URL https://archiveshub.jisc.ac.uk/robots.txt
Domain IPs 54.247.10.115, 63.34.205.202
Response IP 54.247.10.115
Found Yes
Hash bcec2f70478e59b97f19e9178df4ce4c29f64f3e20ef029d8038870120abd1e0
SimHash 606cc851a2ea
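
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the scan does not state which algorithm or input was used. As a minimal sketch under that assumption, a digest of the same shape can be computed over the raw response body. The function name `sha256_hex` and the sample bytes are hypothetical, and the sample will not reproduce the hash reported here.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, NOT the real robots.txt content.
sample_body = b"User-agent: *\nDisallow: /glaas/\n"
digest = sha256_hex(sample_body)
print(len(digest))  # 64 hex characters, the same length as the Hash field
```

Comparing digests from two scans is a cheap way to detect that the file changed at all; it says nothing about how much changed, which is presumably what the SimHash field is for.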

Groups

*

Rule Path
Disallow /manchesteruniversity/
Disallow /glaas/
Disallow /designarchives/
Disallow /bruneluniversity/
Disallow /uel/
Disallow /kingstonuniversity/
Disallow /universityofportsmouth/
Disallow /salforduniversity/
Disallow /rcpsg/
Disallow /newcastleuniversity/
Disallow /southamptonuniversity/
Disallow /hattongallery/
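
The path rules in the `*` group can be checked offline with Python's standard `urllib.robotparser`. This is an illustrative sketch only: the robots.txt text is reconstructed from three of the rules listed above, and the agent name `ExampleBot` and the `/search/` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt of the "*" group (three of the twelve rules).
STAR_GROUP = """\
User-agent: *
Disallow: /manchesteruniversity/
Disallow: /glaas/
Disallow: /designarchives/
"""

star_parser = RobotFileParser()
star_parser.parse(STAR_GROUP.splitlines())

BASE = "https://archiveshub.jisc.ac.uk"

# A path under a disallowed microsite prefix is blocked for any agent.
microsite_allowed = star_parser.can_fetch("ExampleBot", BASE + "/glaas/")
# A path outside the listed prefixes (hypothetical) is allowed.
other_allowed = star_parser.can_fetch("ExampleBot", BASE + "/search/")

print(microsite_allowed)  # False
print(other_allowed)      # True
```

The standard-library parser matches rules by path prefix, so everything beneath a listed microsite directory is covered by a single rule.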

semrushbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

vegi bot

Rule Path
Disallow /

superfeedr

Rule Path
Disallow /

bubing

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

gluten free crawler

Rule Path
Disallow /
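
Each group above blocks one named bot from the whole site with a single `Disallow /` rule. The sketch below, again using `urllib.robotparser` on a reconstructed excerpt, shows how a named group takes precedence over the `*` group for that bot. `ExampleBot` is a hypothetical agent, and real crawlers may match agent names differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt: one "*" rule plus two of the fully blocked bots.
BLOCKED_EXCERPT = """\
User-agent: *
Disallow: /glaas/

User-agent: semrushbot
Disallow: /

User-agent: mj12bot
Disallow: /
"""

blocked_parser = RobotFileParser()
blocked_parser.parse(BLOCKED_EXCERPT.splitlines())

HOME = "https://archiveshub.jisc.ac.uk/"

# Agent matching in this parser is case-insensitive.
results = {
    agent: blocked_parser.can_fetch(agent, HOME)
    for agent in ("SemrushBot", "MJ12bot", "ExampleBot")
}
print(results)
```

Under this parser the two named bots are refused the home page, while the hypothetical `ExampleBot` falls back to the `*` group and is allowed.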

yandex

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

ccbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

garlikcrawler

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 30

bingbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5
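
The crawl-delay records can be read with `RobotFileParser.crawl_delay`. The sketch below uses a reconstructed excerpt with two of the four delayed bots; `ExampleBot` is hypothetical. It also illustrates why the scan reports "All paths allowed" for these bots: under this parser, a bot with its own group is no longer subject to the `*` group's Disallow rules. Actual crawler behaviour may differ, and crawl-delay is a non-standard field that not every crawler honours.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt: one "*" rule plus two crawl-delay groups.
DELAY_EXCERPT = """\
User-agent: *
Disallow: /glaas/

User-agent: garlikcrawler
Crawl-delay: 30

User-agent: bingbot
Crawl-delay: 5
"""

delay_parser = RobotFileParser()
delay_parser.parse(DELAY_EXCERPT.splitlines())

bing_delay = delay_parser.crawl_delay("bingbot")
garlik_delay = delay_parser.crawl_delay("garlikcrawler")
# No group names this agent and "*" sets no delay, so the result is None.
example_delay = delay_parser.crawl_delay("ExampleBot")

# bingbot has its own group with no Disallow rules, so the "*" rule
# does not apply to it in this parser.
bing_microsite = delay_parser.can_fetch(
    "bingbot", "https://archiveshub.jisc.ac.uk/glaas/"
)

print(bing_delay, garlik_delay, example_delay, bing_microsite)
```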

Other Records

Field Value
sitemap https://archiveshub.jisc.ac.uk/sitemap_index.xml
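
The sitemap record applies to the file as a whole rather than to any one group. As a minimal sketch, it can be read with `RobotFileParser.site_maps()`, which is available from Python 3.8; the excerpt is reconstructed from the values in this scan.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt: one "*" rule plus the sitemap record.
SITEMAP_EXCERPT = """\
User-agent: *
Disallow: /glaas/

Sitemap: https://archiveshub.jisc.ac.uk/sitemap_index.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_EXCERPT.splitlines())

# site_maps() returns None when no Sitemap line is present.
sitemaps = sitemap_parser.site_maps() or []
for sitemap_url in sitemaps:
    print(sitemap_url)
```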

Comments

  • Block crawling of Microsites as is duplicate content
  • Bots block and crawl delays last updated 9th Feb 2018
  • Block some bots
  • Crawl delay some bots