corestandards.org
robots.txt

Robots Exclusion Standard data for corestandards.org

Resource Scan

Scan Details

Site Domain corestandards.org
Base Domain corestandards.org
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-06-15T03:50:58+00:00
Next Scan 2024-06-29T03:50:58+00:00

Last Successful Scan

Scanned 2024-05-08T02:58:32+00:00
URL http://corestandards.org/robots.txt
Domain IPs 104.26.8.64, 104.26.9.64, 172.67.74.142, 2606:4700:20::681a:840, 2606:4700:20::681a:940, 2606:4700:20::ac43:4a8e
Response IP 172.67.74.142
Found Yes
Hash 7ef9ca5668105e619daedff0fe33502daf81f63434afa6788d77effa5048c476
SimHash 39b55d216555

Groups

*

Rule Path
Disallow
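
The single group above applies to every crawler (`*`), and its Disallow rule lists no path. Assuming the blank Path column reflects an empty `Disallow:` line in the file, that conventionally means nothing is blocked. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rule lines are a reconstruction from the table, and the `ExampleBot` name and `/some/page` path are hypothetical, not taken from the scan.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the scanned group: one wildcard group
# whose Disallow rule has an empty path.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow:",
]


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the assumed rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)  # parse in memory, no network request
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the path are made-up values for illustration.
    print(is_allowed("ExampleBot", "http://corestandards.org/some/page"))
```

Parsing the lines in memory avoids fetching the live file, which matters here because the most recent scan could not connect to the server.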

Comments

  • ****************************************************************************
  • robots.txt
  • : Robots, spiders, and search engines use this file to determine which
  • content they should *not* crawl while indexing your website.
  • : This system is called "The Robots Exclusion Standard."
  • : It is strongly encouraged to use a robots.txt validator to check
  • for valid syntax before any robots read it!
  • Examples:
  • Instruct all robots to stay out of the admin area.
  • : User-agent: *
  • : Disallow: /admin/
  • Restrict Google and MSN from indexing your images.
  • : User-agent: Googlebot
  • : Disallow: /images/
  • : User-agent: MSNBot
  • : Disallow: /images/
  • ****************************************************************************
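
The two examples in the comments are only comments, so they have no effect on this site. To see how they would behave if written as live rules, here is a small sketch with Python's standard `urllib.robotparser`. The host `example.com`, the agent name `OtherBot`, and the helper names are hypothetical; the rule text is copied from the comment examples.

```python
from urllib.robotparser import RobotFileParser

# The comment examples, written out as live rules.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /images/

User-agent: MSNBot
Disallow: /images/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Check a path on a placeholder host against the parsed rules."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_RULES)
    for agent in ("OtherBot", "Googlebot", "MSNBot"):
        for path in ("/admin/login", "/images/logo.png"):
            print(agent, path, may_fetch(parser, agent, path))
```

One detail worth noting: a crawler that matches a named group follows that group instead of the `*` group. Under this parser, Googlebot and MSNBot are therefore kept out of `/images/` but not out of `/admin/`, unless the `/admin/` rule is repeated in their groups.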