gcu.edu
robots.txt

Robots Exclusion Standard data for gcu.edu

Resource Scan

Scan Details

Site Domain gcu.edu
Base Domain gcu.edu
Scan Status Ok
Last Scan 2025-09-25T09:05:03+00:00
Next Scan 2025-10-25T09:05:03+00:00

Last Scan

Scanned 2025-09-25T09:05:03+00:00
URL https://gcu.edu/robots.txt
Redirect https://www.gcu.edu/robots.txt
Redirect Domain www.gcu.edu
Redirect Base gcu.edu
Domain IPs 104.16.2.115, 104.16.3.115, 2606:4700::6810:273, 2606:4700::6810:373
Redirect IPs 104.17.210.95, 104.17.211.95, 2606:4700::6811:d25f, 2606:4700::6811:d35f
Response IP 104.17.210.95
Found Yes
Hash 84139d3c5dc59411c43a75120c047fc850f7e69b7419ddefbcc60b1a53c0b67d
SimHash 3816151a4564

Groups

swiftbot

Rule Path
Disallow (empty value, which allows all paths)

*

Rule Path
Disallow /core/
Disallow /profiles/
Disallow /modules/
Disallow /web.config
Disallow /admin/
Disallow /comment/
Disallow /filter/
Disallow /search*
Disallow /user/
Disallow /node/
Disallow /cdn-cgi/
Disallow /blog/author/*?page=
Disallow /blog/tag/*?page=
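The two groups above can be reconstructed into a plain robots.txt and exercised with Python's standard `urllib.robotparser`. This is a sketch for illustration only; the live file at https://www.gcu.edu/robots.txt is authoritative, and note that `urllib.robotparser` uses simple prefix matching, so wildcard rules such as `/search*` are included but not exercised here.

```python
from urllib.robotparser import RobotFileParser

# Reconstruction of the scanned groups, for illustration only.
ROBOTS_TXT = """\
User-agent: swiftbot
Disallow:

User-agent: *
Disallow: /core/
Disallow: /profiles/
Disallow: /modules/
Disallow: /web.config
Disallow: /admin/
Disallow: /comment/
Disallow: /filter/
Disallow: /search*
Disallow: /user/
Disallow: /node/
Disallow: /cdn-cgi/
Disallow: /blog/author/*?page=
Disallow: /blog/tag/*?page=
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The catch-all group blocks /admin/ but leaves the home page crawlable.
print(parser.can_fetch("*", "https://www.gcu.edu/admin/page"))  # False
print(parser.can_fetch("*", "https://www.gcu.edu/"))            # True

# swiftbot's empty Disallow value means it may fetch everything.
print(parser.can_fetch("swiftbot", "https://www.gcu.edu/admin/page"))  # True
```

The empty `Disallow:` under swiftbot is the standard way to grant a specific crawler unrestricted access while the `*` group restricts everyone else.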

Other Records

Field Value
sitemap https://www.gcu.edu/sitemap.xml

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts
  • of your site by web crawlers and spiders run by sites like Yahoo!
  • and Google. By telling these "robots" where not to go on your site,
  • you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/robotstxt.html
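The root-placement rule described in the comments can be expressed as a short helper; `robots_url` is a hypothetical name, and the sketch assumes only Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Map any page URL to the only robots.txt location crawlers honor:
    the root of that host (hypothetical helper for illustration)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Matches the Used/Ignored examples above: both resolve to the root file,
# which is why a copy under /site/ is never consulted.
print(robots_url("http://example.com/robots.txt"))       # http://example.com/robots.txt
print(robots_url("http://example.com/site/robots.txt"))  # http://example.com/robots.txt
```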