warwick.ac.uk
robots.txt

Robots Exclusion Standard data for warwick.ac.uk

Resource Scan

Scan Details

Site Domain	warwick.ac.uk
Base Domain	warwick.ac.uk
Scan Status	Ok
Last Scan	2024-09-24T17:25:21+00:00
Next Scan	2024-10-08T17:25:21+00:00

Last Scan

Scanned	2024-09-24T17:25:21+00:00
URL	https://warwick.ac.uk/robots.txt
Domain IPs	137.205.28.41
Response IP	137.205.28.41
Found	Yes
Hash	796665d34be286e1794d16b969ac7b71e1fe4f0d2cbee5aec0df564415d676f2
SimHash	0c82a003adb6
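
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the scan does not name the algorithm. Assuming SHA-256 over the raw response body (an assumption, not something stated here), a saved copy of the file could be compared against the recorded value roughly as follows. The helper names are hypothetical.

```python
import hashlib

# Hash value recorded by the scan on 2024-09-24.
RECORDED_HASH = "796665d34be286e1794d16b969ac7b71e1fe4f0d2cbee5aec0df564415d676f2"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body hashes to the recorded value.

    ASSUMPTION: the scanner hashed the unmodified response body with SHA-256.
    """
    return sha256_hex(body) == recorded.lower()
```

A mismatch would only indicate that the file differs from the scanned copy or that the assumption about the algorithm is wrong; it cannot tell the two cases apart.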

Groups

gptbot

Rule	Path
Disallow	/
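
Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys only the group whose user-agent line matches its product token, and falls back to the * group when no named group matches. A minimal sketch of that selection for the three groups listed on this page, assuming a simple case-insensitive comparison of the product token (the function name `select_group` is hypothetical):

```python
# Group names as listed in this scan.
GROUP_NAMES = ["gptbot", "*", "rogerbot"]


def select_group(product_token: str) -> str:
    """Pick the group a crawler should obey.

    A named group wins over the catch-all "*" group; matching on the
    product token is case-insensitive.
    """
    token = product_token.lower()
    return token if token in GROUP_NAMES and token != "*" else "*"


# Usage: GPTBot is bound by its own group (Disallow: /),
# while a crawler with no named group falls back to "*".
print(select_group("GPTBot"))     # gptbot
print(select_group("Googlebot"))  # *
```

One consequence of this rule is that a crawler matched by a named group is not also bound by the rules in the * group.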

*

Rule	Path
Disallow	/training/
Disallow	/sitebuilder2/
Allow	/sitebuilder2/api/sitebuilder.ics
Allow	/sitebuilder2/api/gadgets/
Allow	/sitebuilder2/api/rss/
Allow	/sitebuilder2/api/sitemap/
Allow	/sitebuilder2/api/videoSitemap.xml
Allow	/sitebuilder2/file/*
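
The Allow rules above only have an effect because RFC 9309 resolves conflicts by the most specific (longest) matching pattern, with Allow winning a tie. The sketch below illustrates that precedence for the * group. It is a simplified matcher written for this example, not any crawler's actual implementation: it handles the * wildcard but omits the $ end anchor, which this file does not use. The request paths in the usage lines are hypothetical.

```python
import re

# Rules of the "*" group, in the order listed in the scan.
STAR_GROUP = [
    ("Disallow", "/training/"),
    ("Disallow", "/sitebuilder2/"),
    ("Allow", "/sitebuilder2/api/sitebuilder.ics"),
    ("Allow", "/sitebuilder2/api/gadgets/"),
    ("Allow", "/sitebuilder2/api/rss/"),
    ("Allow", "/sitebuilder2/api/sitemap/"),
    ("Allow", "/sitebuilder2/api/videoSitemap.xml"),
    ("Allow", "/sitebuilder2/file/*"),
]


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path) is not None


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; Allow wins when lengths tie."""
    best = None  # (pattern length, is_allow)
    for verb, pattern in rules:
        if pattern_matches(pattern, path):
            candidate = (len(pattern), verb == "Allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path may be crawled.
    return True if best is None else best[1]


# Usage with hypothetical paths:
print(is_allowed(STAR_GROUP, "/sitebuilder2/api/rss/news.xml"))  # True
print(is_allowed(STAR_GROUP, "/sitebuilder2/edit/page.htm"))     # False
```

Note that Python's standard `urllib.robotparser` applies rules in file order rather than by longest match, so it would likely report the Allow paths here as blocked.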

rogerbot

Rule	Path
Disallow	/services/sport/events/calendar/*?*
Disallow	/services/sport/news/*?*
Disallow	/services/sport/active/tennis/classes/*?*
Disallow	/services/sport/content-hub/feed/*?*
Disallow	/services/conferences/content-corner/*?*
Disallow	/services/conferences/news/*?*
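
Each rogerbot pattern ends in `*?*`. In robots.txt the `?` is a literal character and only `*` is a wildcard, so these rules block URLs under the listed paths that carry a query string while leaving the plain paths crawlable. A minimal sketch of that behaviour, using a simplified matcher written for this example; the sample URLs are hypothetical:

```python
import re

# Disallow patterns of the rogerbot group.
ROGERBOT_DISALLOW = [
    "/services/sport/events/calendar/*?*",
    "/services/sport/news/*?*",
    "/services/sport/active/tennis/classes/*?*",
    "/services/sport/content-hub/feed/*?*",
    "/services/conferences/content-corner/*?*",
    "/services/conferences/news/*?*",
]


def matches(pattern: str, path_and_query: str) -> bool:
    """Prefix match: '*' is a wildcard, every other character is literal."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path_and_query) is not None


def blocked_for_rogerbot(path_and_query: str) -> bool:
    """True if any rogerbot Disallow pattern matches the path plus query."""
    return any(matches(p, path_and_query) for p in ROGERBOT_DISALLOW)


# Usage with hypothetical URLs:
print(blocked_for_rogerbot("/services/sport/news/?page=2"))  # True
print(blocked_for_rogerbot("/services/sport/news/"))         # False
```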

Other Records

Field	Value
sitemap	https://warwick.ac.uk/sitebuilder2/api/sitemap/index.xml

Comments

robots.txt for https://warwick.ac.uk/
Apply to all user agents
Don't index the training pages to try and stop people who want to study architecture from applying here because Warwick doesn't offer an Architecture course
Explanation: https://twitter.com/matmannion/status/1146342325980975104
Disallow indexing of the CMS application itself as no useful content exists there for externals, with exclusions below
let google get ical feeds
Allow thumbnail images
Disallow query string variations of sports calendars/news
Disallow query string variations of conferences news
