aus.social
robots.txt

Robots Exclusion Standard data for aus.social

Resource Scan

Scan Details

Site Domain	aus.social
Base Domain	aus.social
Scan Status	Ok
Last Scan	2024-11-04T21:23:58+00:00
Next Scan	2024-11-05T21:23:58+00:00

Last Scan

Scanned	2024-11-04T21:23:58+00:00
URL	https://aus.social/robots.txt
Domain IPs	118.127.62.214, 2400:8100:3d:c000::139
Response IP	118.127.62.214
Found	Yes
Hash	020704af2b0b882e206ac38f727a78445dce1883ac8d0b58e26881dc77ba1ca6
SimHash	a27a134204ce
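
The 64-character Hash value above has the length of a hex-encoded SHA-256 digest. Which algorithm the scanner uses, and which bytes it hashes, is not stated on this page, so the following is only a sketch under that assumption. `content_digest` is a hypothetical helper name, and the sample body is a reconstruction, so its digest is not expected to match the value in the table.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a robots.txt body.

    Assumption: the scan's Hash field is a SHA-256 of the raw response
    body. The report does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    # Reconstructed sample, not the verbatim file served by aus.social.
    sample = b"User-agent: *\nDisallow: /media_proxy/\nDisallow: /interact/\n"
    print(content_digest(sample))
```

Comparing digests from two scans is a cheap way to detect that the file changed at all, which is presumably why a report like this records one.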

Groups

*

Rule	Path
Disallow	/media_proxy/
Disallow	/interact/
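
As a rough illustration of how the wildcard group applies to a crawler that has no group of its own, the sketch below feeds the two rules to Python's standard `urllib.robotparser`. The rule text is reconstructed from the table above rather than copied from the served file, and `ExampleCrawler` and `is_allowed` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Wildcard group reconstructed from the scan table (not the verbatim file).
WILDCARD_GROUP = """\
User-agent: *
Disallow: /media_proxy/
Disallow: /interact/
"""


def is_allowed(path: str, agent: str = "ExampleCrawler") -> bool:
    """Check a path on aus.social against the wildcard group only."""
    parser = RobotFileParser()
    parser.parse(WILDCARD_GROUP.splitlines())
    return parser.can_fetch(agent, "https://aus.social" + path)


if __name__ == "__main__":
    for path in ("/about", "/media_proxy/1", "/interact/1"):
        print(path, is_allowed(path))
```

Rules are matched by path prefix, so anything under `/media_proxy/` or `/interact/` is refused while other paths remain open to crawlers that fall under `*`.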

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

oai-searchbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

applebot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/
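
To see how the per-agent groups interact with the wildcard group, the sketch below rebuilds an equivalent robots.txt from the group names listed in this scan and queries it with `urllib.robotparser`. It is a reconstruction under assumptions: the agent names are the lowercased forms shown here, the order and comments of the real file are not reproduced, and `build_robots_txt`, `make_parser`, and `ExampleCrawler` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Agent groups as listed in the scan; each one carries "Disallow: /".
BLOCKED_AGENTS = [
    "gptbot", "chatgpt-user", "oai-searchbot", "amazonbot", "applebot",
    "applebot-extended", "ccbot", "facebookbot", "meta-externalagent",
    "google-extended", "anthropic-ai", "claudebot", "claude-web",
]


def build_robots_txt() -> str:
    """Rebuild a robots.txt equivalent to the groups shown in the scan."""
    lines = ["User-agent: *", "Disallow: /media_proxy/", "Disallow: /interact/", ""]
    for agent in BLOCKED_AGENTS:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    return "\n".join(lines)


def make_parser() -> RobotFileParser:
    """Parse the reconstructed file without making a network request."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt().splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser()
    url = "https://aus.social/about"
    for agent in ("GPTBot", "ClaudeBot", "ExampleCrawler"):
        print(agent, parser.can_fetch(agent, url))
```

A named group takes precedence over `*`, so the listed AI crawlers are refused everywhere, while an unlisted crawler is only kept out of `/media_proxy/` and `/interact/`. Note that `urllib.robotparser` matches agent names case-insensitively and by substring, which is looser than some crawlers' own matching.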

Comments

See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
AI Bots - Crawlers/scrapers and data harvesters. Blocking these should not negatively affect the end user.
OpenAI Scraper/Crawler/Assistant https://platform.openai.com/docs/bots
"GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models"
"When users ask ChatGPT or a CustomGPT a question, it may visit a web page to help answer"
"OAI-SearchBot is used to link to and surface websites in search results in the SearchGPT prototype"
Amazon Alexa Crawler https://developer.amazon.com/amazonbot
"Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to answer even more questions for customers"
Apple Siri Crawler https://support.apple.com/en-au/HT204683
"Applebot is the web crawler for Apple. Products like Siri and Spotlight Suggestions use Applebot"
Apple AI models https://support.apple.com/en-au/119829
"Allowing Applebot-Extended will help improve the capabilities and quality of Apple's generative AI models over time"
Common Crawl https://commoncrawl.org/ccbot
"democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible and analyzable by anyone"
Facebook AI models https://developers.facebook.com/docs/sharing/bot/
"FacebookBot crawls public web pages to improve language models for our speech recognition technology."
Facebook AI models https://developers.facebook.com/docs/sharing/webmasters/crawler
"training AI models or improving products by indexing content directly"
Google's AI models https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
"help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products"
Anthropic AI Models https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
"Our mission to build safe and reliable frontier systems and advance the field of responsible AI development"
