gwern.net
robots.txt

Robots Exclusion Standard data for gwern.net

Resource Scan

Scan Details

Site Domain gwern.net
Base Domain gwern.net
Scan Status Ok
Last Scan 2025-11-24T09:11:09+00:00
Next Scan 2025-12-24T09:11:09+00:00

Last Scan

Scanned 2025-11-24T09:11:09+00:00
URL https://gwern.net/robots.txt
Domain IPs 104.26.10.177, 104.26.11.177, 172.67.71.248, 2606:4700:20::681a:ab1, 2606:4700:20::681a:bb1, 2606:4700:20::ac43:47f8
Response IP 104.26.11.177
Found Yes
Hash b97e30a11dd5969b162bfbabff3637558cb2c26004376ac439897efb6b34bea4
SimHash 60200a3aced0

Groups

ia_archiver

Rule Path
Disallow /
Allow /modafinil
Allow /dnm-arrest

*

Rule Path
Disallow /fulltext
Disallow /*.md
Disallow /*.md.html
Disallow /static/*.*.html
Disallow /static/nginx/*
Disallow /static/redirect/*
Disallow /metadata/*
Disallow /metadata/annotation/backlink/*
Disallow /metadata/annotation/similar/*
Disallow /metadata/annotation/link-bibliography/*
Disallow /confidential/*
Disallow /private/*
Disallow /secret/*
Disallow /doc/www/*
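
The two rule groups above can be evaluated with longest-match semantics (RFC 9309): the matching pattern with the most characters wins, and on a tie Allow beats Disallow. A minimal sketch in Python, with the rule lists transcribed from the groups above (a hand-rolled matcher, since wildcard patterns like /*.md need RFC 9309-style matching):

```python
import re

def compile_rule(pattern):
    """Translate a robots.txt path pattern into an anchored regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchor_end else ""))

def is_allowed(rules, path):
    """Evaluate (kind, pattern) rules against a URL path:
    longest matching pattern wins; on a tie, 'allow' wins.
    A path matched by no rule is allowed."""
    best_len, verdict = -1, "allow"
    for kind, pattern in rules:
        if compile_rule(pattern).match(path):
            n = len(pattern)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, verdict = n, kind
    return verdict == "allow"

# Rule groups transcribed from the scan above (subset of the '*' group).
IA_ARCHIVER = [("disallow", "/"), ("allow", "/modafinil"), ("allow", "/dnm-arrest")]
WILDCARD = [
    ("disallow", "/fulltext"),
    ("disallow", "/*.md"),
    ("disallow", "/metadata/*"),
    ("disallow", "/doc/www/*"),
]

is_allowed(IA_ARCHIVER, "/modafinil")  # True: Allow (10 chars) beats Disallow / (1 char)
is_allowed(WILDCARD, "/gpt-3.md")      # False: /*.md matches
```

This shows why the ia_archiver group works: Disallow / blocks everything, but the longer Allow /modafinil and Allow /dnm-arrest patterns override it for those two pages.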

Other Records

Field Value
sitemap https://gwern.net/sitemap.xml
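
The Sitemap record points crawlers at the site's URL inventory. A minimal sketch of extracting URLs from a sitemap document (the XML below is a hypothetical sample in the standard sitemaps.org schema, not fetched from gwern.net):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sample; a real crawler would fetch the URL given in the
# Sitemap record (https://gwern.net/sitemap.xml) instead.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://gwern.net/modafinil</loc></url>
  <url><loc>https://gwern.net/dnm-arrest</loc></url>
</urlset>"""

# sitemap_urls(SAMPLE) -> ['https://gwern.net/modafinil', 'https://gwern.net/dnm-arrest']
```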

Comments

  • Hide copies: duplicate content is bad for SEO (and clutters search results), so no Markdown sources, WWW archives, metadata snippets, or link-bibliography compilations.
  • Disallow syntax-highlighted versions of source code as duplicates.
  • Avoid spurious Google hits for filenames cluttering results.
  • Disallow doc/*/index pages because they keep cluttering up Google Scholar:
  • Disallow: /doc/*/index
  • Allow: /doc/rotten.com/*