debianforum.de
robots.txt

Robots Exclusion Standard data for debianforum.de

Resource Scan

Scan Details

Site Domain debianforum.de
Base Domain debianforum.de
Scan Status Ok
Last Scan 2025-11-26T02:35:01+00:00
Next Scan 2025-12-26T02:35:01+00:00

Last Scan

Scanned 2025-11-26T02:35:01+00:00
URL https://debianforum.de/robots.txt
Domain IPs 142.132.203.155, 2a01:4f8:261:4fe1::2
Response IP 142.132.203.155
Found Yes
Hash 307dd871adbca759c9cdfb8117eef5bd26c750a22613cf788d34dcc728b2b462
SimHash 2666d5d3cf72
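
The Hash field above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. A minimal sketch for comparing a locally fetched copy against the reported value, assuming the hash is SHA-256 over the raw response body (the helper names `sha256_hex` and `matches_report` are hypothetical):

```python
import hashlib

# Value copied from the Hash field of the scan report.
REPORTED_HASH = "307dd871adbca759c9cdfb8117eef5bd26c750a22613cf788d34dcc728b2b462"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Compare a fetched robots.txt body against the reported hash.

    Assumption: the scanner hashed the unmodified response body with SHA-256.
    If it normalized line endings or used another algorithm, this returns False
    even for an unchanged file.
    """
    return sha256_hex(body) == reported.lower()
```

A mismatch does not necessarily mean the file changed; it may only mean the assumption about how the scanner computed the hash is wrong.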

Groups

*

Rule Path
Disallow /pics/
Disallow /seti-at-home/
Disallow /r
Disallow /misc/
Disallow /impressum/
Disallow /guides/
Disallow /old/
Disallow /webalizer/
Disallow /webalizer.old
Disallow /forum/admin/
Disallow /forum/db/
Disallow /forum/images/
Disallow /forum/includes/
Disallow /forum/language/
Disallow /forum/templates/
Disallow /forum/common.php
Disallow /forum/groupcp.php
Disallow /forum/faq.php
Disallow /forum/privmsg.php
Disallow /forum/profile.php
Disallow /forum/groupcp.php
Disallow /forum/viewonline.php
Disallow /forum/printview.php
Disallow /forum/modcp.php
Disallow /forum/login.php
Disallow /wiki/admin/
Disallow /forum/memberlist.php
Disallow /forum/search.php
Disallow /forum/ucp.php
Disallow /forum/posting.php
Disallow /forum/report.php
Disallow /forum/viewonline.php
Disallow /forum/download.php
Disallow /w2/

fasterfox

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

perplexity‑user

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /
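
The groups above can be checked with Python's standard `urllib.robotparser`. The sketch below uses a short excerpt reconstructed from the listed rules rather than the full file, and `ExampleBot` and the tested paths are hypothetical. It illustrates two points: a crawler with its own group (such as gptbot) is blocked from the whole site, and paths are matched by prefix, so `Disallow /r` covers every path beginning with `/r`.

```python
from urllib.robotparser import RobotFileParser

# Excerpt reconstructed from the groups listed in the report; not the full file.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /pics/
Disallow: /r
Disallow: /forum/search.php
Disallow: /wiki/admin/

User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://debianforum.de"

# gptbot has a dedicated group that disallows everything.
print(parser.can_fetch("GPTBot", base + "/forum/viewtopic.php"))      # False
# An unlisted crawler falls back to the "*" group.
print(parser.can_fetch("ExampleBot", base + "/forum/viewtopic.php"))  # True
print(parser.can_fetch("ExampleBot", base + "/forum/search.php"))     # False
# "Disallow: /r" is a prefix rule, so any path starting with /r is blocked.
print(parser.can_fetch("ExampleBot", base + "/readme"))               # False
```

Note that `urllib.robotparser` matches user-agent names case-insensitively, which is why `GPTBot` matches the lowercase group name shown in the report. Other parsers may resolve groups differently.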

Comments

  • The Common Crawl dataset. Original source for GPT and others.
  • The example for img2dataset, although the default is *None*
  • GPTBot is OpenAI's web crawler
  • ChatGPT-User takes direct actions on behalf of ChatGPT users
  • Google's Bard and Vertex AI generative APIs
  • Speculative blocks for Anthropic
  • webz.io - they sell data for training LLMs.
  • Meta's bot that crawls public web pages to improve language models
  • ByteDance's bot used to gather data for their LLMs, including Doubao.
  • Brandwatch - "AI to discover new trends"