/.well-known/

brightguy.com
robots.txt

Robots Exclusion Standard data for brightguy.com

Resource Scan

Scan Details

Site Domain	brightguy.com
Base Domain	brightguy.com
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error.
Last Scan	2025-07-01T05:27:11+00:00
Next Scan	2025-09-29T05:27:11+00:00

Last Successful Scan

Scanned	2024-08-12T06:43:29+00:00
URL	https://brightguy.com/robots.txt
Domain IPs	104.26.12.228, 104.26.13.228, 172.67.75.53, 2606:4700:20::681a:ce4, 2606:4700:20::681a:de4, 2606:4700:20::ac43:4b35
Response IP	172.67.75.53
Found	Yes
Hash	7c60bfe0089d89f95b8fe6b9f6b5421c85a047d6031833df3ef5176e85a2af27
SimHash	29941d216555

Groups

scrapy

Rule	Path
Allow	/

*

Rule	Path
Disallow	(empty)
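
As an illustration of how the two groups above are likely to be interpreted, here is a minimal sketch using Python's standard `urllib.robotparser`. The file text is an assumed reconstruction from the listed groups (the original line order and spacing are not shown in the scan), and `ExampleBot` and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the file from the two groups listed in the scan;
# the real file's ordering and whitespace may differ.
ROBOTS_TXT = """\
User-agent: scrapy
Allow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory text, no network access

# "ExampleBot" is a hypothetical crawler name used only for illustration.
scrapy_allowed = parser.can_fetch("scrapy", "https://brightguy.com/")
other_allowed = parser.can_fetch("ExampleBot", "https://brightguy.com/any/page")

print(scrapy_allowed, other_allowed)  # True True
```

Under this reading, a `Disallow` rule with an empty path excludes nothing, so both the `scrapy` group and the catch-all `*` group leave the whole site open to crawling.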

Comments

****************************************************************************
robots.txt
: Robots, spiders, and search engines use this file to determine which
  content they should *not* crawl while indexing your website.
: This system is called "The Robots Exclusion Standard."
: It is strongly encouraged to use a robots.txt validator to check
  for valid syntax before any robots read it!

Examples:

Instruct all robots to stay out of the admin area.
: User-agent: *
: Disallow: /admin/

Restrict Google and MSN from indexing your images.
: User-agent: Googlebot
: Disallow: /images/
: User-agent: MSNBot
: Disallow: /images/
****************************************************************************
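
The two example rule sets quoted in the comment block can be checked with a short sketch, again using Python's standard `urllib.robotparser`. The helper `build_parser`, the crawler name `ExampleBot`, and the sample paths are hypothetical and appear nowhere in the scanned file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Example 1 from the comments: keep all robots out of the admin area.
admin_rules = build_parser("User-agent: *\nDisallow: /admin/\n")

# Example 2 from the comments: keep Googlebot and MSNBot out of /images/.
image_rules = build_parser(
    "User-agent: Googlebot\nDisallow: /images/\n"
    "User-agent: MSNBot\nDisallow: /images/\n"
)

print(admin_rules.can_fetch("ExampleBot", "/admin/settings"))    # False
print(admin_rules.can_fetch("ExampleBot", "/products/"))         # True
print(image_rules.can_fetch("Googlebot", "/images/logo.png"))    # False
print(image_rules.can_fetch("MSNBot", "/images/logo.png"))       # False
print(image_rules.can_fetch("ExampleBot", "/images/logo.png"))   # True
```

In the second rule set, only the two named crawlers are restricted; a crawler that matches no group and finds no `*` group is treated as allowed.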
