/.well-known/

corestandards.org
robots.txt

Robots Exclusion Standard data for corestandards.org

Resource Scan

Scan Details

Site Domain	corestandards.org
Base Domain	corestandards.org
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't connect to server.
Last Scan	2024-06-15T03:50:58+00:00
Next Scan	2024-06-29T03:50:58+00:00

Last Successful Scan

Scanned	2024-05-08T02:58:32+00:00
URL	http://corestandards.org/robots.txt
Domain IPs	104.26.8.64, 104.26.9.64, 172.67.74.142, 2606:4700:20::681a:840, 2606:4700:20::681a:940, 2606:4700:20::ac43:4a8e
Response IP	172.67.74.142
Found	Yes
Hash	7ef9ca5668105e619daedff0fe33502daf81f63434afa6788d77effa5048c476
SimHash	39b55d216555
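
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the report does not name the algorithm. The following is a minimal sketch, assuming SHA-256 over the raw response body, of how a scanner might detect that a robots.txt file changed between scans. The function names `fingerprint` and `has_changed` and the sample bodies are hypothetical and are not taken from the report.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return fingerprint(body) != previous_hash


# Illustrative bodies, not the real file contents.
old_body = b"User-agent: *\nDisallow:\n"
new_body = b"User-agent: *\nDisallow: /admin/\n"

stored = fingerprint(old_body)
print(len(stored))                    # 64 hex characters
print(has_changed(stored, old_body))  # False
print(has_changed(stored, new_body))  # True
```

An exact digest only says whether the bytes differ at all; it does not measure how similar two versions are.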

Groups

User-agent	*

Rule	Path
Disallow	(empty)
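
The group above lists a Disallow rule with no path shown. Assuming the path is in fact empty, such a rule blocks nothing, so every crawler may fetch every URL. The sketch below checks this with Python's standard `urllib.robotparser`. The two-line file is a reconstruction from the table, not the archived file itself, and the crawler name `ExampleBot` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the Groups table; assumes the Disallow path is empty.
RECONSTRUCTED = """\
User-agent: *
Disallow:
"""

empty_rule_parser = RobotFileParser()
empty_rule_parser.parse(RECONSTRUCTED.splitlines())

# An empty Disallow value places no restriction on any path.
print(empty_rule_parser.can_fetch("ExampleBot", "/"))             # True
print(empty_rule_parser.can_fetch("ExampleBot", "/any/path.html"))  # True
```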

Comments

****************************************************************************
robots.txt
Robots, spiders, and search engines use this file to determine which
content they should *not* crawl while indexing your website.

This system is called "The Robots Exclusion Standard."

It is strongly encouraged to use a robots.txt validator to check
for valid syntax before any robots read it!

Examples:

Instruct all robots to stay out of the admin area.

    User-agent: *
    Disallow: /admin/

Restrict Google and MSN from indexing your images.

    User-agent: Googlebot
    Disallow: /images/

    User-agent: MSNBot
    Disallow: /images/
****************************************************************************
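
As a rough illustration of how a crawler might evaluate the example rules quoted in the comments above, the sketch below feeds them to Python's standard `urllib.robotparser`. Combining the two examples into a single file is an assumption made for the demonstration, and the crawler name `ExampleBot` and the tested paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the comments, combined into one file (an assumption).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /images/

User-agent: MSNBot
Disallow: /images/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


example_parser = build_parser(EXAMPLE_ROBOTS_TXT)

# "ExampleBot" has no group of its own, so the "*" group applies to it.
print(example_parser.can_fetch("ExampleBot", "/admin/settings"))   # False
print(example_parser.can_fetch("ExampleBot", "/images/logo.png"))  # True

# Googlebot matches its own group, which only restricts /images/.
print(example_parser.can_fetch("Googlebot", "/images/logo.png"))   # False
```

With this parser, a crawler that matches a named group is checked against that group only, so in the combined file the `/admin/` rule under `*` would not apply to Googlebot unless it were repeated in the Googlebot group.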
