uwtsd.ac.uk
robots.txt

Robots Exclusion Standard data for uwtsd.ac.uk

Resource Scan

Scan Details

Site Domain uwtsd.ac.uk
Base Domain uwtsd.ac.uk
Scan Status Ok
Last Scan 2026-02-09T22:24:42+00:00
Next Scan 2026-02-23T22:24:42+00:00

Last Scan

Scanned 2026-02-09T22:24:42+00:00
URL https://uwtsd.ac.uk/robots.txt
Redirect https://www.uwtsd.ac.uk/robots.txt
Redirect Domain www.uwtsd.ac.uk
Redirect Base uwtsd.ac.uk
Domain IPs 52.211.85.12
Redirect IPs 13.35.37.108, 13.35.37.109, 13.35.37.43, 13.35.37.77, 2600:9000:213e:1200:1c:892c:efc0:93a1, 2600:9000:213e:7000:1c:892c:efc0:93a1, 2600:9000:213e:a400:1c:892c:efc0:93a1, 2600:9000:213e:b400:1c:892c:efc0:93a1, 2600:9000:213e:d200:1c:892c:efc0:93a1, 2600:9000:213e:de00:1c:892c:efc0:93a1, 2600:9000:213e:ea00:1c:892c:efc0:93a1, 2600:9000:213e:f800:1c:892c:efc0:93a1
Response IP 13.35.37.43
Found Yes
Hash 1a98f4a74e17ce68389de1b97f16b21da387add110c91687f01010807e1c4eed
SimHash 2d14bbd38640

Groups

*

Rule Path
Allow /core/*.css$
Allow /core/*.css?
Allow /core/*.js$
Allow /core/*.js?
Allow /core/*.gif
Allow /core/*.jpg
Allow /core/*.jpeg
Allow /core/*.png
Allow /core/*.svg
Allow /profiles/*.css$
Allow /profiles/*.css?
Allow /profiles/*.js$
Allow /profiles/*.js?
Allow /profiles/*.gif
Allow /profiles/*.jpg
Allow /profiles/*.jpeg
Allow /profiles/*.png
Allow /profiles/*.svg
Disallow /core/
Disallow /profiles/
Disallow /README.txt
Disallow /web.config
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips
Disallow /node/add/
Disallow /search/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /user/logout/
Disallow /media/oembed
Disallow /*/media/oembed
Disallow /index.php/admin/
Disallow /index.php/comment/reply/
Disallow /index.php/filter/tips
Disallow /index.php/node/add/
Disallow /index.php/search/
Disallow /index.php/user/password/
Disallow /index.php/user/register/
Disallow /index.php/user/login/
Disallow /index.php/user/logout/
Disallow /index.php/media/oembed
Disallow /index.php/*/media/oembed
Allow /programme-search
Disallow /programme-search*
Allow /cy/programme-search
Disallow /cy/programme-search*
Disallow /search?query=*
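
The rules in the * group combine wildcards (*), end anchors ($), and overlapping Allow/Disallow pairs. As a rough illustration of how a crawler might resolve them, the sketch below assumes the longest-match convention described in RFC 9309: the matching pattern with the most characters wins, and Allow wins a tie. The helper names pattern_to_regex and is_allowed are hypothetical, and only a handful of the rules listed above are included.

```python
import re

# A small subset of the "*" group rules from the scan above.
RULES = [
    ("Allow", "/core/*.css$"),
    ("Disallow", "/core/"),
    ("Disallow", "/admin/"),
    ("Allow", "/programme-search"),
    ("Disallow", "/programme-search*"),
    ("Disallow", "/search?query=*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    Assumption: the longest matching pattern wins; on equal length,
    Allow beats Disallow. A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), verb == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for p in ["/core/themes/site.css", "/core/install.php",
              "/programme-search", "/search?query=courses", "/about"]:
        print(p, is_allowed(RULES, p))
```

Under this reading, the bare /programme-search path is blocked as well, because Disallow /programme-search* is one character longer than Allow /programme-search. Individual crawlers may resolve that pair differently.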

gptbot

Rule Path
Allow /

oai-searchbot

Rule Path
Allow /

chatgpt-user

Rule Path
Allow /

google-extended

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

perplexity-user

Rule Path
Allow /

claude-searchbot

Rule Path
Allow /

claude-user

Rule Path
Allow /

ccbot

Rule Path
Allow /

omgili

Rule Path
Allow /

omgilibot

Rule Path
Allow /

facebookbot

Rule Path
Allow /
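
Each named crawler above has its own group containing only Allow /. Under the usual group-selection rule, a crawler that finds a group for its own user agent follows that group and ignores the * group, so the Disallow lines in * would not apply to it. The sketch below illustrates this with Python's standard urllib.robotparser on a two-group sample. The sample text is a simplified stand-in, not the scanned file, and ExampleBot is a hypothetical crawler name. Note that urllib.robotparser does plain prefix matching and does not interpret * or $ wildcards, so only a simple prefix rule is used here.

```python
from urllib.robotparser import RobotFileParser

# Simplified stand-in for the scanned file: one rule from the "*" group
# and one of the named AI crawler groups.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: gptbot
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under SAMPLE."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # GPTBot has its own group, so the "*" Disallow does not apply to it.
    print("GPTBot /admin/users:", allowed("GPTBot", "/admin/users"))
    # ExampleBot (hypothetical) has no group and falls back to "*".
    print("ExampleBot /admin/users:", allowed("ExampleBot", "/admin/users"))
```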

Other Records

Field Value
sitemap https://www.UWTSD.ac.uk/sitemap.xml
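
The sitemap URL spells the host in mixed case (www.UWTSD.ac.uk), while the redirect target uses lower case. Host names are case-insensitive, so both refer to the same host. A minimal check, using only the two URLs from this report:

```python
from urllib.parse import urlsplit

SITEMAP_URL = "https://www.UWTSD.ac.uk/sitemap.xml"
REDIRECT_URL = "https://www.uwtsd.ac.uk/robots.txt"

# urlsplit(...).hostname returns the host in lower case.
sitemap_host = urlsplit(SITEMAP_URL).hostname
redirect_host = urlsplit(REDIRECT_URL).hostname

same_host = sitemap_host == redirect_host

if __name__ == "__main__":
    print(sitemap_host, redirect_host, same_host)
```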

Comments

  • --------------------------------------------------
  • robots.txt — Optimised for SEO + AI Search visibility
  • Maintains secure/admin blocks + asset rendering,
  • while allowing major AI/search crawlers to access public content.
  • --------------------------------------------------
  • --- Allow essential assets for proper rendering ---
  • --- Disallow internal/system directories & files ---
  • --- Paths (clean URLs) ---
  • --- Paths (no clean URLs) ---
  • --- Programme search (budget control) ---
  • --------------------------------------------------
  • AI / Next-gen Search Crawlers — ALLOW
  • --------------------------------------------------
  • --------------------------------------------------
  • Sitemaps
  • --------------------------------------------------