allgemeine-zeitung.de
robots.txt

Robots Exclusion Standard data for allgemeine-zeitung.de

Resource Scan

Scan Details

Site Domain	allgemeine-zeitung.de
Base Domain	allgemeine-zeitung.de
Scan Status	Ok
Last Scan	2024-06-11T05:09:40+00:00
Next Scan	2024-06-18T05:09:40+00:00

Last Scan

Scanned	2024-06-11T05:09:40+00:00
URL	https://allgemeine-zeitung.de/robots.txt
Redirect	https://www.allgemeine-zeitung.de/robots.txt
Redirect Domain	www.allgemeine-zeitung.de
Redirect Base	allgemeine-zeitung.de
Domain IPs	75.2.84.18, 99.83.234.173
Redirect IPs	18.193.10.14, 52.59.126.207
Response IP	52.59.126.207
Found	Yes
Hash	5dc3467c1c22a5be2760ff0d865c871f531fee82f90cd14c3b3a8948666d6d63
SimHash	a8bcd162cd74

Groups

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/*?egy_cid=*

*

Rule	Path
Disallow	/api/
Disallow	/archive/
Disallow	/na/
Disallow	/service/profil/
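
The groups above can be checked programmatically. The following is a minimal sketch, assuming the file looks roughly like the text rebuilt here from the report (it is a reconstruction, not the verbatim robots.txt, and the original capitalisation of the agent names is unknown). It uses Python's standard `urllib.robotparser`, which matches rules by plain path prefix and does not interpret `*` wildcards, so the `/*?egy_cid=*` rule is deliberately left out.

```python
from urllib.robotparser import RobotFileParser

# Agents that the report lists with "Disallow: /".
BLOCKED_AGENTS = [
    "chatgpt-user", "gptbot", "ccbot", "google-extended",
    "facebookbot", "omgilibot", "omgili",
]

# Prefix rules reported for the catch-all "*" group.
WILDCARD_GROUP_PATHS = ["/api/", "/archive/", "/na/", "/service/profil/"]


def build_robots_txt() -> str:
    """Rebuild an approximate robots.txt from the scan report (assumption)."""
    lines = []
    for agent in BLOCKED_AGENTS:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append("User-agent: *")
    lines += [f"Disallow: {path}" for path in WILDCARD_GROUP_PATHS]
    return "\n".join(lines)


parser = RobotFileParser()
parser.parse(build_robots_txt().splitlines())

BASE = "https://www.allgemeine-zeitung.de"

if __name__ == "__main__":
    for agent, path in [("GPTBot", "/"), ("Googlebot", "/"), ("Googlebot", "/api/x")]:
        print(agent, path, parser.can_fetch(agent, BASE + path))
```

Agent names are compared case-insensitively by the parser, so `GPTBot` falls into the `gptbot` group, while a crawler without its own group (here `Googlebot`, used only as an example) falls back to the `*` group.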

Other Records

Field	Value
sitemap	https://www.allgemeine-zeitung.de/index-sitemap.xml

Comments

ChatGPT Plugins
Common Crawl
Bard
Meta’s bot that crawls public web pages to improve language models for their speech recognition technology
Used for several purposes, apparently also selling crawled data to LLM companies (http://omgili.com/crawler.html)
urls with ?egy_cid= are not crawled
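
The `egy_cid` rule relies on `*` wildcards, which crawlers that follow RFC 9309 treat as "any sequence of characters". A minimal sketch of that matching, assuming the rule path is `/*?egy_cid=*`; the helper names `rule_to_regex` and `is_disallowed` and the sample paths are hypothetical, chosen only for illustration:

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, starting at the path's first byte.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, pattern: str) -> bool:
    """Return True if the path (including query string) matches the pattern."""
    return rule_to_regex(pattern).match(path_and_query) is not None


EGY_RULE = "/*?egy_cid=*"

if __name__ == "__main__":
    for sample in ["/lokales/mainz?egy_cid=abc", "/lokales/mainz?page=2"]:
        print(sample, is_disallowed(sample, EGY_RULE))
```

Because the pattern contains the literal `?`, it only matches when `egy_cid` is the first query parameter; a URL such as `/x?page=2&egy_cid=abc` would not match under this sketch.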
