thenewstack.io
robots.txt

Robots Exclusion Standard data for thenewstack.io

Resource Scan

Scan Details

Site Domain thenewstack.io
Base Domain thenewstack.io
Scan Status Ok
Last Scan 5/22/2025, 10:26:31 AM
Next Scan 6/21/2025, 10:26:31 AM

Last Scan

Scanned 5/22/2025, 10:26:31 AM
URL https://thenewstack.io/robots.txt
Domain IPs 104.26.0.71, 104.26.1.71, 172.67.70.57, 2606:4700:20::681a:147, 2606:4700:20::681a:47, 2606:4700:20::ac43:4639
Response IP 172.67.70.57
Found Yes
Hash ec2550adf71d371a488c5357398525285fe9eb8b1fc073dedde1a4a93d7ca1c6
SimHash 67205941a430

Groups

*

Rule Path
Disallow /wp-admin/
Disallow /dls/
Disallow /e-books-stats/
Disallow /ebooks-subscribe/
Disallow /newsletter-sign-up-submit/
Disallow /?s$
Disallow /rss-feeds/*
Disallow /newsletter-archive/*
Disallow /sponsors/
Disallow /wp-login.php
Disallow /wp-content/themes/tns-2022/assets/
Disallow /wp-content/themes/tns-2022/data/
Disallow /assets/
Disallow /no-cache/
Disallow /cdn-cgi/
Allow /wp-admin/admin-ajax.php
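
To illustrate how the rules in the `*` group above might be evaluated, here is a minimal Python sketch. It assumes the longest-match precedence and the `*` and `$` wildcard semantics described in RFC 9309; the names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical and not part of the scan report, and real crawlers may differ in details such as percent-encoding.

```python
import re

# Rules copied from the "*" group in the report above.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/dls/"),
    ("disallow", "/e-books-stats/"),
    ("disallow", "/ebooks-subscribe/"),
    ("disallow", "/newsletter-sign-up-submit/"),
    ("disallow", "/?s$"),
    ("disallow", "/rss-feeds/*"),
    ("disallow", "/newsletter-archive/*"),
    ("disallow", "/sponsors/"),
    ("disallow", "/wp-login.php"),
    ("disallow", "/wp-content/themes/tns-2022/assets/"),
    ("disallow", "/wp-content/themes/tns-2022/data/"),
    ("disallow", "/assets/"),
    ("disallow", "/no-cache/"),
    ("disallow", "/cdn-cgi/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: "*" matches any run of characters and a trailing "$"
    anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` (including any query string) may be crawled.

    The longest matching pattern wins; on a tie, "allow" wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                best_allow = kind == "allow"
    return best_allow
```

Under these assumptions, `is_allowed("/wp-admin/admin-ajax.php")` is True because the Allow pattern is longer than `/wp-admin/`, while `is_allowed("/?s")` is False. A search URL such as `/?s=kubernetes` is not matched by `/?s$`, since the `$` anchors the pattern to the end of the path.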

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

seekr

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /
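
The per-agent groups above each block the whole site for one crawler. As a rough sketch of how a client could check this, the following uses Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a partial reconstruction from this report (three of the listed groups, with the user-agent tokens in the lowercase form shown here), not the verbatim file, and the helper `may_fetch` is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction from the report above; the real file's
# casing, ordering, and comments are not known from this scan.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, may_fetch(agent, "https://thenewstack.io/"))
```

Note that `urllib.robotparser` applies the first matching rule in file order and does not interpret `*` or `$` inside paths, so it is suitable for the simple `Disallow /` groups but not for the wildcard rules in the `*` group.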

Other Records

Field Value
sitemap https://thenewstack.io/sitemap_index.xml
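
The sitemap record can be read with the same standard-library parser. This is a small sketch assuming Python 3.8 or later, where `RobotFileParser.site_maps()` is available; the variable names are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Only the sitemap record from the report above is reproduced here.
SITEMAP_RECORD = "Sitemap: https://thenewstack.io/sitemap_index.xml"

sitemap_parser = RobotFileParser()
sitemap_parser.parse([SITEMAP_RECORD])

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = sitemap_parser.site_maps()
```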