abhrajit.com
robots.txt

Robots Exclusion Standard data for abhrajit.com

Resource Scan

Scan Details

Site Domain	abhrajit.com
Base Domain	abhrajit.com
Scan Status	Ok
Last Scan	2025-10-31T22:53:56+00:00
Next Scan	2025-11-30T22:53:56+00:00

Last Scan

Scanned	2025-10-31T22:53:56+00:00
URL	https://abhrajit.com/robots.txt
Domain IPs	104.21.31.148, 172.67.177.203, 2606:4700:3030::6815:1f94, 2606:4700:3035::ac43:b1cb
Response IP	104.21.31.148
Found	Yes
Hash	a482e7d12889c0efdbb6e2e5706dfa2a0f8463be9acc565e22ae5b89e74f8fc7
SimHash	46b44b13c584
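
The scan does not say which algorithm produced the Hash value. Its length (64 hexadecimal characters) is consistent with SHA-256, so the following is a minimal sketch, under that assumption, of how a fetched robots.txt body could be compared against a recorded digest. The function names `sha256_hex` and `matches_recorded_hash` are hypothetical, and the scanner may normalize the body before hashing, so a mismatch would not prove the file changed.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str) -> bool:
    """Compare a body's digest with a recorded hex digest, ignoring case."""
    return sha256_hex(body) == recorded.strip().lower()


if __name__ == "__main__":
    # Illustrative body only; this is not the actual abhrajit.com file.
    sample = b"User-agent: *\nAllow: /\n"
    print(sha256_hex(sample))
```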

Groups

*

Rule	Path
Allow	/

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

gptbot
claudebot
claude-user
claude-searchbot
ccbot
google-extended
applebot-extended
facebookbot
meta-externalagent
meta-externalfetcher
diffbot
perplexitybot
perplexity‑user
omgili
omgilibot
webzio-extended
imagesiftbot
bytespider
tiktokspider
amazonbot
youbot
semrushbot-ocob
petalbot
velenpublicwebcrawler
turnitinbot
timpibot
oai-searchbot
icc-crawler
ai2bot
ai2bot-dolma
dataforseobot
awariobot
awariosmartbot
awariorssbot
google-cloudvertexbot
pangubot
kangaroo bot
sentibot
img2dataset
meltwater
seekr
peer39_crawler
cohere-ai
cohere-training-data-crawler
duckassistbot
scrapy
cotoyogi
aihitbot
factset_spyderbot
firecrawlagent

Rule	Path
Disallow	/

*

Rule	Path
Allow	/
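
The groups above combine a permissive `*` group with blanket `Disallow: /` rules for named AI crawlers. As a rough illustration of how a crawler might evaluate such a file, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is a hypothetical reconstruction of only a few of the scanned groups, not the actual file, and `is_allowed` and `SomeSearchBot` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a few groups from the scan;
# the real file lists many more user agents plus comment lines.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SomeSearchBot"):
        print(agent, is_allowed(agent, "https://abhrajit.com/"))
```

In this sketch a named group takes precedence over the `*` group, so the listed AI crawlers are denied while an unlisted agent falls through to `Allow: /`. Other parsers may resolve duplicate `*` groups or unknown fields differently.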

Comments

As a condition of accessing this website, you agree to abide by the following content signals:

(a) If a content-signal = yes, you may collect content for the corresponding use.
(b) If a content-signal = no, you may not collect content for the corresponding use.
(c) If the website operator does not include a content signal for a corresponding use, the website operator neither grants nor restricts permission via content signal with respect to the corresponding use.

The content signals and their meanings are:

search: building a search index and providing search results (e.g., returning hyperlinks and short excerpts from your website's contents). Search does not include providing AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g., retrieval augmented generation, grounding, or other real-time taking of content for generative AI search answers).
ai-train: training or fine-tuning AI models.

ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

BEGIN Cloudflare Managed content
END Cloudflare Managed Content

Block all known AI crawlers and assistants from using content for training AI models.
Source: https://robotstxt.com/ai

Block any non-specified AI crawlers (e.g., new or unknown bots) from using content for training AI models, while allowing the website to be indexed and accessed by bots. These directives are still experimental and may not be supported by all AI crawlers.

Warnings

`content-signal` is not a known field.
`content-usage` is not a known field.
`disallowaitraining` is not a known field.
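
The warnings indicate that this scanner does not recognize the `content-signal` field, while the Comments section defines yes/no signals for `search`, `ai-input`, and `ai-train`. The scan does not show the actual signal line, so the sketch below assumes a comma-separated `name=yes|no` syntax and uses made-up values. `parse_content_signals` and `may_collect` are hypothetical helper names, not part of any standard library.

```python
from typing import Dict, Optional


def parse_content_signals(robots_txt: str) -> Dict[str, bool]:
    """Collect yes/no content signals from Content-Signal lines.

    Assumes the form 'Content-Signal: search=yes, ai-train=no';
    the scan itself does not confirm this syntax.
    """
    signals: Dict[str, bool] = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "content-signal":
            continue
        for item in value.split(","):
            name, eq, setting = item.partition("=")
            setting = setting.strip().lower()
            if eq and setting in ("yes", "no"):
                signals[name.strip().lower()] = setting == "yes"
    return signals


def may_collect(signals: Dict[str, bool], use: str) -> Optional[bool]:
    """True or False if a signal is present; None if the operator
    neither grants nor restricts permission for that use (case c)."""
    return signals.get(use)


if __name__ == "__main__":
    # Illustrative values only; not taken from the scanned file.
    example = "User-agent: *\nContent-Signal: search=yes, ai-train=no\nAllow: /\n"
    parsed = parse_content_signals(example)
    for use in ("search", "ai-input", "ai-train"):
        print(use, may_collect(parsed, use))
```

Returning `None` for a missing signal mirrors clause (c) of the comments: absence is neither a grant nor a restriction.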
