Robots Exclusion Standard data for aus.social

Resource Scan

Scan Details

Site Domain aus.social
Base Domain aus.social
Scan Status Ok
Last Scan 2024-09-30T07:16:36+00:00
Next Scan 2024-10-01T07:16:36+00:00

Last Scan

Scanned 2024-09-30T07:16:36+00:00
URL https://aus.social/robots.txt
Domain IPs 118.127.62.214, 2400:8100:3d:c000::139
Response IP 118.127.62.214
Found Yes
Hash 020704af2b0b882e206ac38f727a78445dce1883ac8d0b58e26881dc77ba1ca6
SimHash a27a134204ce
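
The report does not say which algorithm produces the Hash value. Its length (64 hexadecimal characters) is consistent with SHA-256, so the following is a minimal sketch under that assumption, showing how a freshly fetched robots.txt body could be compared with the reported value. The helper names `sha256_hex` and `matches_report` are hypothetical, and the scanner may normalise the body before hashing, in which case a comparison on the raw bytes would not match.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "020704af2b0b882e206ac38f727a78445dce1883ac8d0b58e26881dc77ba1ca6"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Compare a robots.txt body against the reported hash.

    Assumption: the report hashes the unmodified response body with SHA-256.
    """
    return sha256_hex(body) == reported.lower()
```

A changed digest between two scans would only indicate that the file differs byte for byte; it says nothing about which rules changed.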

Groups

*

Rule Path
Disallow /media_proxy/
Disallow /interact/

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /
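
The groups listed above can be checked mechanically. The sketch below rebuilds a robots.txt from the listing and queries it with Python's standard `urllib.robotparser`. The rebuilt text is an approximation: the live file may differ in ordering, casing, and comments. The names `render_robots_txt`, `is_allowed`, and `ExampleBot` are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# User agents that the listing above blocks from the whole site.
BLOCKED_AGENTS = [
    "gptbot", "chatgpt-user", "oai-searchbot", "amazonbot", "applebot",
    "applebot-extended", "ccbot", "facebookbot", "meta-externalagent",
    "google-extended", "anthropic-ai", "claudebot", "claude-web",
]


def render_robots_txt() -> str:
    """Rebuild an approximate robots.txt from the groups in the report."""
    groups = ["User-agent: *\nDisallow: /media_proxy/\nDisallow: /interact/"]
    groups += [f"User-agent: {agent}\nDisallow: /" for agent in BLOCKED_AGENTS]
    return "\n\n".join(groups) + "\n"


# Parse the rebuilt text once; no network request is made.
PARSER = RobotFileParser()
PARSER.parse(render_robots_txt().splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rebuilt rules let user_agent fetch the given path."""
    return PARSER.can_fetch(user_agent, f"https://aus.social{path}")


if __name__ == "__main__":
    for agent, path in [
        ("GPTBot", "/about"),
        ("ExampleBot", "/about"),
        ("ExampleBot", "/media_proxy/1"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Note that `urllib.robotparser` matches user agents by case-insensitive substring and uses the first group that applies, which is simpler than the most-specific-match behaviour that some crawlers implement. For this file the outcome is the same, since every named group disallows the whole site.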

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • AI Bots - Crawlers/scrapers and data harvesters. Blocking these should not negatively affect the end user.
  • OpenAI Scraper/Crawler/Assistant https://platform.openai.com/docs/bots
  • "GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models"
  • "When users ask ChatGPT or a CustomGPT a question, it may visit a web page to help answer"
  • "OAI-SearchBot is used to link to and surface websites in search results in the SearchGPT prototype"
  • Amazon Alexa Crawler https://developer.amazon.com/amazonbot
  • "Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to answer even more questions for customers"
  • Apple Siri Crawler https://support.apple.com/en-au/HT204683
  • "Applebot is the web crawler for Apple. Products like Siri and Spotlight Suggestions use Applebot"
  • Apple AI models https://support.apple.com/en-au/119829
  • "Allowing Applebot-Extended will help improve the capabilities and quality of Apple’s generative AI models over time"
  • Common Crawl https://commoncrawl.org/ccbot
  • "democratizing access to web information by producing and maintaining an open repository of web crawl data that is universally accessible and analyzable by anyone"
  • Facebook AI models https://developers.facebook.com/docs/sharing/bot/
  • "FacebookBot crawls public web pages to improve language models for our speech recognition technology."
  • Facebook AI models https://developers.facebook.com/docs/sharing/webmasters/crawler
  • "training AI models or improving products by indexing content directly"
  • Google's AI models https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
  • "help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products"
  • Anthropic AI Models https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
  • "Our mission to build safe and reliable frontier systems and advance the field of responsible AI development"