gts.kelbie.scot
robots.txt

Robots Exclusion Standard data for gts.kelbie.scot


Resource Scan

Scan Details

Site Domain	gts.kelbie.scot
Base Domain	kelbie.scot
Scan Status	Ok
Last Scan	2024-06-12T11:12:09+00:00
Next Scan	2024-07-12T11:12:09+00:00

Last Scan

Scanned	2024-06-12T11:12:09+00:00
URL	https://gts.kelbie.scot/robots.txt
Domain IPs	183.176.41.205
Response IP	183.176.41.205
Found	Yes
Hash	dbb790911c6bc91ca16fc84f26194d75870d195a3f71d3b6e7e1043947c00281
SimHash	7a3acb1e8d4c

Groups

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

wellknownbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/
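
Each named group above disallows `/`, the entire site, for a single crawler; every other crawler falls through to the `*` group. The following is a minimal Go sketch of that selection step, assuming RFC 9309 semantics (case-insensitive match on the crawler's product token, fallback to `*`). The names `rulesFor`, `blockedEverywhere` and `starRules` are hypothetical and are not taken from GoToSocial's robots.go.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedEverywhere lists the product tokens that the scanned file
// disallows from the whole site ("Disallow: /"). Hypothetical name.
var blockedEverywhere = []string{
	"gptbot", "chatgpt-user", "google-extended", "ccbot",
	"omgilibot", "facebookbot", "wellknownbot", "amazonbot",
}

// starRules holds the Disallow paths of the "*" group. Hypothetical name.
var starRules = []string{
	"/api/", "/auth/", "/oauth/", "/check_your_email",
	"/wait_for_approval", "/account_disabled", "/.well-known/",
	"/fileserver/", "/users/", "/emoji/", "/admin", "/user",
	"/settings/", "/about/suspended",
}

// rulesFor returns the Disallow paths that apply to a crawler with the
// given product token. The token comparison is case-insensitive (the
// scanner shows tokens lowercased; the file itself may differ), and a
// crawler without its own group falls back to the "*" group.
func rulesFor(token string) []string {
	for _, name := range blockedEverywhere {
		if strings.EqualFold(token, name) {
			return []string{"/"}
		}
	}
	return starRules
}

func main() {
	fmt.Println(rulesFor("GPTBot"))         // [/]
	fmt.Println(len(rulesFor("Googlebot"))) // 14
}
```

Googlebot is not named in any group, so it receives the 14 path rules of the `*` group rather than a site-wide block.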

*

Rule	Path
Disallow	/api/
Disallow	/auth/
Disallow	/oauth/
Disallow	/check_your_email
Disallow	/wait_for_approval
Disallow	/account_disabled
Disallow	/.well-known/
Disallow	/fileserver/
Disallow	/users/
Disallow	/emoji/
Disallow	/admin
Disallow	/user
Disallow	/settings/
Disallow	/about/suspended
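
Every rule in the `*` group is a plain path prefix: none uses the `*` or `$` wildcards and the group has no `Allow` lines, so a path is blocked as soon as any Disallow value is a prefix of it. The following is a minimal sketch under that assumption; `disallowed` and `starDisallow` are hypothetical names, and `/@someone` stands in for a hypothetical profile path.

```go
package main

import (
	"fmt"
	"strings"
)

// starDisallow lists the Disallow paths from the scanned "*" group.
var starDisallow = []string{
	"/api/", "/auth/", "/oauth/", "/check_your_email",
	"/wait_for_approval", "/account_disabled", "/.well-known/",
	"/fileserver/", "/users/", "/emoji/", "/admin", "/user",
	"/settings/", "/about/suspended",
}

// disallowed reports whether path is blocked for crawlers that fall into
// the "*" group. With no wildcards and no Allow lines to compete with,
// a simple (case-sensitive) prefix test decides the outcome.
func disallowed(path string) bool {
	for _, rule := range starDisallow {
		if strings.HasPrefix(path, rule) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"/api/v1/instance", "/@someone", "/administrator", "/about"} {
		fmt.Printf("%s disallowed=%v\n", p, disallowed(p))
	}
}
```

Because matching is by prefix, `/admin` also blocks `/administrator`, `/user` already covers everything under `/users/`, and `/about` stays crawlable while `/about/suspended` does not.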

Other Records

Field	Value
crawl-delay	500

Comments

GoToSocial robots.txt -- to edit, see internal/web/robots.go
More info @ https://developers.google.com/search/docs/crawling-indexing/robots/intro
Before we commence, a giant fuck you to ChatGPT in particular.
https://platform.openai.com/docs/gptbot
As of September 2023, GPTBot and ChatGPT-User are equivalent. But there's no telling
when OpenAI might decide to change that, so block this one too.
And a giant fuck you to Google Bard and their other generative AI ventures too.
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
Block CommonCrawl. Used in training LLMs and specifically GPT-3.
https://commoncrawl.org/faq
Block Omgili/Webz.io, a "Big Web Data" engine.
https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/
Block Facebookbot, because Meta.
https://developers.facebook.com/docs/sharing/bot
Well-known.dev crawler. Indexes stuff under /.well-known.
https://well-known.dev/about/
Block Amazonbot, because Amazon.
https://developer.amazon.com/amazonbot
Rules for everything else.
API endpoints.
Auth/Sign in endpoints.
Well-known endpoints.
Fileserver/media.
Fedi S2S API endpoints.
Settings panels.
Domain blocklist.
