Robots Exclusion Standard data for admall.com

Resource Scan

Scan Details

Site Domain admall.com
Base Domain admall.com
Scan Status Ok
Last Scan 2026-01-04T06:54:51+00:00
Next Scan 2026-02-03T06:54:51+00:00

Last Scan

Scanned 2026-01-04T06:54:51+00:00
URL https://admall.com/robots.txt
Domain IPs 104.21.26.184, 172.67.138.88, 2606:4700:3035::ac43:8a58, 2606:4700:3037::6815:1ab8
Response IP 104.21.26.184
Found Yes
Hash 5f32c5e3762074ae124bab3a91019450f3ce5eb22c40aa7f622afedbac98893f
SimHash 0272d7558f66
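The Hash value above is 64 hexadecimal digits, which is consistent with a SHA-256 digest. The report does not state the algorithm or what exactly is hashed, so the sketch below assumes SHA-256 over the raw response body; the helper name `sha256_hex` is hypothetical and used only for illustration.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex.

    Assumption: the report's Hash field is SHA-256 of the raw robots.txt
    bytes. The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


# Illustrative input only, not the actual file served by admall.com.
sample_body = b"User-agent: *\nDisallow: /\n"
digest = sha256_hex(sample_body)
print(len(digest), digest)
```

Comparing a freshly computed digest with the stored one is a simple way to tell whether the file changed between scans, if the assumption about the algorithm holds.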

Groups

*

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /
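The groups above can be checked with Python's standard `urllib.robotparser`. The sketch below rebuilds an equivalent robots.txt from the user-agent tokens listed in this report and asks whether a given crawler may fetch the site root. The rebuilt text is an approximation: the original file's comments, letter case, and exact layout are not reproduced, and the helper names `build_robots_txt` and `is_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as listed in the Groups section of this report.
AGENTS = [
    "*", "ccbot", "img2dataset", "gptbot", "chatgpt-user",
    "google-extended", "anthropic-ai", "claude-web", "omgilibot",
    "omgili", "facebookbot", "bytespider", "magpie-crawler",
]


def build_robots_txt(agents):
    """Rebuild a robots.txt in which every listed agent gets 'Disallow: /'."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AGENTS)
for agent in ("gptbot", "ccbot", "some-other-bot"):
    # "some-other-bot" is a made-up name; it falls through to the "*" group.
    print(agent, is_allowed(robots_txt, agent, "https://admall.com/"))
```

Because the `*` group already disallows `/`, the named groups do not change the outcome for compliant crawlers; they mainly make the intent explicit for each listed bot.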

Comments

  • The Common Crawl dataset. Original source for GPT and others.
  • The example for img2dataset, although the default is *None*
  • GPTBot is OpenAI's web crawler
  • ChatGPT-User takes direct actions on behalf of ChatGPT users
  • Google's Bard and Vertex AI generative APIs
  • Speculative blocks for Anthropic
  • webz.io - they sell data for training LLMs.
  • Meta's bot that crawls public web pages to improve language models
  • ByteDance's bot used to gather data for their LLMs, including Doubao.
  • Brandwatch - "AI to discover new trends"