craigdoesdata.com
robots.txt

Robots Exclusion Standard data for craigdoesdata.com


Resource Scan

Scan Details

Site Domain	craigdoesdata.com
Base Domain	craigdoesdata.com
Scan Status	Ok
Last Scan	2025-06-06T12:16:46+00:00
Next Scan	2025-06-13T12:16:46+00:00

Last Scan

Scanned	2025-06-06T12:16:46+00:00
URL	https://craigdoesdata.com/robots.txt
Redirect	https://www.craigdoesdata.com/robots.txt
Redirect Domain	www.craigdoesdata.com
Redirect Base	craigdoesdata.com
Domain IPs	2406:da18:9d0:143f:2124:4e9c:36a9:d9de, 52.221.42.138
Redirect IPs	104.21.58.48, 172.67.200.121, 2606:4700:3031::ac43:c879, 2606:4700:3035::6815:3a30
Response IP	104.21.58.48
Found	Yes
Hash	581f42701dcdfac856ef5daea6fb58813689930095e9b305c4ab47ec10592ff9
SimHash	a14d90125f20
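
The Hash field is a quick way to tell whether the file changed between the weekly scans. The page does not name the hashing algorithm. A 64-hex-digit value is consistent with SHA-256, so the sketch below assumes it is a SHA-256 digest of the raw response bytes. `robots_changed` is a hypothetical helper, not part of any scanner's API.

```python
import hashlib

# Hash reported by the 2025-06-06 scan above.
SCAN_HASH = "581f42701dcdfac856ef5daea6fb58813689930095e9b305c4ab47ec10592ff9"


def robots_changed(body: bytes, recorded_hex: str = SCAN_HASH) -> bool:
    """Return True if `body` does not hash to the recorded digest.

    ASSUMPTION: the scan's Hash field is a SHA-256 hex digest of the raw
    robots.txt bytes. The page does not say which algorithm it uses.
    """
    return hashlib.sha256(body).hexdigest() != recorded_hex.strip().lower()
```

Pass it the bytes served by the final redirect target (https://www.craigdoesdata.com/robots.txt), because that URL actually returned the file. If the SHA-256 assumption is wrong, every comparison will report a change.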

Groups

*

Rule	Path
Allow	/
Allow	/blog/
Allow	/resources/
Allow	/projects.html
Allow	/about.html
Allow	/services.html
Allow	/cv.html
Allow	/contact.html
Allow	/certs.html
Disallow	/.git/
Disallow	/github-widget/
Disallow	/santa/
Disallow	/css/timeline.css
Disallow	/custom.css
Disallow	/media/certs/
Disallow	/media/income.mp4
Disallow	/media/race.gif
Disallow	/*?*
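
This group pairs a broad Allow: / with narrower Disallow rules, so precedence decides which rule wins. RFC 9309 says the most specific match wins, meaning the rule whose path has the most octets, and an Allow wins a tie. The sketch below applies that reading to this group. It assumes the final rule is /*?*, which blocks any URL that contains a query string, and it treats only * and a trailing $ as special characters. `pattern_to_regex` and `is_allowed` are hypothetical helper names.

```python
import re
from functools import lru_cache
from typing import Sequence, Tuple

# The group's rules as listed in the scan; the last one is assumed to be "/*?*".
RULES: Sequence[Tuple[str, str]] = (
    ("allow", "/"), ("allow", "/blog/"), ("allow", "/resources/"),
    ("allow", "/projects.html"), ("allow", "/about.html"),
    ("allow", "/services.html"), ("allow", "/cv.html"),
    ("allow", "/contact.html"), ("allow", "/certs.html"),
    ("disallow", "/.git/"), ("disallow", "/github-widget/"),
    ("disallow", "/santa/"), ("disallow", "/css/timeline.css"),
    ("disallow", "/custom.css"), ("disallow", "/media/certs/"),
    ("disallow", "/media/income.mp4"), ("disallow", "/media/race.gif"),
    ("disallow", "/*?*"),
)


@lru_cache(maxsize=None)
def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is literal. re.match() makes this a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: Sequence[Tuple[str, str]] = RULES) -> bool:
    """Longest-match check. `path` is the URL path plus any query string."""
    best_len = -1
    best_allow = True  # a path that matches no rule is allowed
    for kind, pattern in rules:
        if not pattern_to_regex(pattern).match(path):
            continue
        allow = kind == "allow"
        # The longer pattern wins; on a tie, Allow beats Disallow.
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, best_allow = len(pattern), allow
    return best_allow
```

With these rules, is_allowed("/santa/") and is_allowed("/?page=2") both return False. By contrast, is_allowed("/blog/?page=2") returns True: /blog/ (6 characters) outranks /*?* (4 characters), so under longest-match precedence the query-string rule does not reach URLs inside the explicitly allowed directories.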

Other Records

Field	Value
crawl-delay	10
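
A crawl-delay of 10 asks crawlers to leave at least 10 seconds between requests to this host. Crawl-delay is not part of RFC 9309, and not every crawler honors it (Google documents that its crawler ignores the field). For a crawler that does honor it, a minimal per-host throttle might look like the sketch below. `CrawlDelayThrottle` is a hypothetical class. The clock and sleep functions are injectable only so the timing can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class CrawlDelayThrottle:
    """Spaces successive requests to one host at least `delay` seconds apart."""

    def __init__(self, delay: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait_time(self) -> float:
        """Seconds still to wait before the next request is allowed."""
        if self._last is None:
            return 0.0
        return max(0.0, self._last + self.delay - self.clock())

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        remaining = self.wait_time()
        if remaining > 0:
            self.sleep(remaining)
        self._last = self.clock()
```

In a crawl loop you would keep one CrawlDelayThrottle(10) per host and call wait() immediately before each fetch. At 10 seconds per request, that caps a polite crawler at 360 requests per hour for this site.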

Other Records

Field	Value
sitemap	https://craigdoesdata.com/sitemap.xml
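
Python's standard urllib.robotparser can read both the sitemap record and the crawl-delay. The robots.txt text in the sketch below is an abridged reconstruction from the scan data above. The live file's exact ordering and comments may differ. `sitemap_and_delay` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Abridged reconstruction from the scan above; not the verbatim live file.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /santa/
Crawl-delay: 10

Sitemap: https://craigdoesdata.com/sitemap.xml
"""


def sitemap_and_delay(text: str, agent: str = "*"):
    """Return (sitemap URLs or None, crawl delay or None) for `agent`."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # Mark the rules as freshly loaded; crawl_delay() returns None otherwise.
    parser.modified()
    return parser.site_maps(), parser.crawl_delay(agent)
```

One caution: CPython's RobotFileParser.can_fetch() applies rules in file order (the first matching rule wins) and does not expand * inside a path. Because this file lists Allow: / first, can_fetch() would report every path on the site as fetchable. It is a reasonable tool for the sitemap and crawl-delay records, but not for resolving this file's Allow/Disallow rules.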

Comments

Optimize crawl budget by disallowing development files and directories
Prevent duplicate content
Sitemap
Crawl-delay
