linuxcapable.com
robots.txt

Robots Exclusion Standard data for linuxcapable.com

Resource Scan

Scan Details

Site Domain linuxcapable.com
Base Domain linuxcapable.com
Scan Status Ok
Last Scan 2024-06-24T17:48:49+00:00
Next Scan 2024-07-01T17:48:49+00:00

Last Scan

Scanned 2024-06-24T17:48:49+00:00
URL https://linuxcapable.com/robots.txt
Domain IPs 104.26.4.191, 104.26.5.191, 172.67.69.138, 2606:4700:20::681a:4bf, 2606:4700:20::681a:5bf, 2606:4700:20::ac43:458a
Response IP 104.26.5.191
Found Yes
Hash 2baa79bf696484054e3431e085f89cc39b88b262368e540acbc85c452cbe3263
SimHash 5b10515787b3

Groups

*

Rule Path
Allow /wp-admin/admin-ajax.php
Disallow /wp-admin/
Disallow /wp-login.php
Disallow /xmlrpc.php
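
The wildcard group combines one Allow rule with a broader Disallow on the same directory. The sketch below shows how a parser might resolve that, using Python's standard urllib.robotparser. The file text is reconstructed from the table above and is an assumption about the literal contents; `ExampleBot` and the helper `wildcard_allows` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the rule table; the exact file text is an assumption.
WILDCARD_GROUP = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /xmlrpc.php
"""

wildcard_parser = RobotFileParser()
wildcard_parser.parse(WILDCARD_GROUP.splitlines())


def wildcard_allows(path, agent="ExampleBot"):
    """Return True if the hypothetical agent may fetch the given path."""
    return wildcard_parser.can_fetch(agent, "https://linuxcapable.com" + path)


for path in ("/", "/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/wp-login.php"):
    print(path, wildcard_allows(path))
```

Python's parser applies the first matching rule, whereas RFC 9309 specifies the longest match. Both approaches agree here because the Allow line is listed first and is also the most specific path.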

anthropic-ai

Rule Path
Disallow /

archive.org_bot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

cocolyzebot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

httrack

Rule Path
Disallow /

httrack 3.0

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

ia_archiver-web.archive.org

Rule Path
Disallow /

rssingbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://linuxcapable.com/sitemap_index.xml
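
As a rough illustration of how the groups and the sitemap record fit together, the following sketch rebuilds a robots.txt from the scan data and queries it with Python's standard urllib.robotparser. The rebuilt text is an assumption (the scan lists rules, not the literal file), and `Googlebot` stands in for any agent that has no group of its own. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# User agents that the scan reports as fully disallowed ("Disallow /").
BLOCKED_AGENTS = [
    "anthropic-ai", "archive.org_bot", "ccbot", "chatgpt-user",
    "cocolyzebot", "cohere-ai", "httrack", "httrack 3.0",
    "ia_archiver", "ia_archiver-web.archive.org", "rssingbot",
]

# Rebuild an approximate file; the real file's layout may differ.
lines = [
    "User-agent: *",
    "Allow: /wp-admin/admin-ajax.php",
    "Disallow: /wp-admin/",
    "Disallow: /wp-login.php",
    "Disallow: /xmlrpc.php",
    "",
]
for agent in BLOCKED_AGENTS:
    lines += [f"User-agent: {agent}", "Disallow: /", ""]
lines.append("Sitemap: https://linuxcapable.com/sitemap_index.xml")

site_parser = RobotFileParser()
site_parser.parse(lines)

home = "https://linuxcapable.com/"
for agent in ("Googlebot", "CCBot", "anthropic-ai", "ia_archiver"):
    print(agent, site_parser.can_fetch(agent, home))

print(site_parser.site_maps())
```

Agent names are compared case-insensitively by this parser, so `CCBot` matches the `ccbot` group. An agent without its own group falls back to the `*` group.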