caasindia.in
robots.txt

Robots Exclusion Standard data for caasindia.in

Resource Scan

Scan Details

Site Domain caasindia.in
Base Domain caasindia.in
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2025-12-03T13:57:25+00:00
Next Scan 2026-01-02T13:57:25+00:00

Last Successful Scan

Scanned 2025-11-02T19:35:19+00:00
URL https://caasindia.in/robots.txt
Redirect https://www.caasindia.in/robots.txt
Redirect Domain www.caasindia.in
Redirect Base caasindia.in
Domain IPs 2a02:4780:15:9124:ce5a:7f7b:2718:1a04, 2a02:4780:38:9f9c:5cd:1779:2d2d:17d3, 77.37.75.101, 93.127.201.152
Redirect IPs 2a02:4780:15:e40c:3e43:2c3e:225:b570, 2a02:4780:38:ad79:e83c:1326:62d6:2572, 84.32.84.115, 84.32.84.69
Response IP 77.37.48.145
Found Yes
Hash ffc2681c5886981fb63d79d16de4dba2ea7b85879bcdb909be976aeeedb8d3a9
SimHash 2030cd00a5f2

Groups

googlebot

Rule Path
Allow /

googlebot-image

Rule Path
Allow /

googlebot-news

Rule Path
Allow /

bingbot

Rule Path
Allow /

slurp

Rule Path
Allow /

duckduckbot

Rule Path
Allow /

baiduspider

Rule Path
Allow /

yandexbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

gptbot

Rule Path
Allow /

bingpreview

Rule Path
Allow /

*

Rule Path
Disallow /wp-admin/
Allow /wp-admin/admin-ajax.php
Allow /wp-includes/js/
Allow /wp-includes/css/
Allow /wp-content/uploads/
Allow /wp-content/litespeed/
Allow /ads.txt
Allow /robots.txt
Allow /favicon.ico
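
The "*" group mixes a Disallow on /wp-admin/ with a more specific Allow on /wp-admin/admin-ajax.php, so the outcome for a given path depends on rule precedence. The following is a minimal sketch of how a crawler might evaluate these rules, assuming the longest-match precedence described in RFC 9309 (the most specific matching path wins, and Allow wins a tie). The names `STAR_RULES` and `is_allowed` are hypothetical, wildcard characters are not handled, and this is not the scanner's own logic.

```python
# Rules copied from the "*" group of the last successful scan.
# The evaluation logic is an assumption based on RFC 9309 longest-match
# precedence; it does not handle "*" or "$" wildcards.
STAR_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/wp-includes/js/"),
    ("allow", "/wp-includes/css/"),
    ("allow", "/wp-content/uploads/"),
    ("allow", "/wp-content/litespeed/"),
    ("allow", "/ads.txt"),
    ("allow", "/robots.txt"),
    ("allow", "/favicon.ico"),
]


def is_allowed(path, rules=STAR_RULES):
    """Return True if `path` may be crawled under simple prefix rules."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = kind == "allow"
    return allowed
```

Under these assumptions, `is_allowed("/wp-admin/admin-ajax.php")` is true because the Allow path is longer than the Disallow path, while other paths under /wp-admin/ stay blocked.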

amazonbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

twitterbot

Rule Path
Allow /

facebookexternalhit

Rule Path
Allow /
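
Every named group above carries a single blanket rule on /, so the effective policy for a crawler comes down to which group it selects. Here is a rough sketch of that selection, assuming case-insensitive matching of the crawler's product token and a fallback to the "*" group when no named group matches. The names `ALLOW_ALL`, `BLOCK_ALL`, `select_group`, and `blanket_policy` are hypothetical, and the data reflects the last successful scan only.

```python
# Group names copied from the last successful scan of the file.
# Selection logic is an assumption: exact, case-insensitive token match,
# otherwise fall back to the "*" group.
ALLOW_ALL = {
    "googlebot", "googlebot-image", "googlebot-news", "bingbot", "slurp",
    "duckduckbot", "baiduspider", "yandexbot", "google-extended", "gptbot",
    "bingpreview", "twitterbot", "facebookexternalhit",
}
BLOCK_ALL = {
    "amazonbot", "applebot-extended", "bytespider", "ccbot", "claudebot",
    "meta-externalagent",
}


def select_group(product_token):
    """Return the group name a crawler with this token would use."""
    token = product_token.lower()
    if token in ALLOW_ALL or token in BLOCK_ALL:
        return token
    return "*"


def blanket_policy(product_token):
    """Summarise the policy that applies to the given crawler token."""
    group = select_group(product_token)
    if group in ALLOW_ALL:
        return "allow all"
    if group in BLOCK_ALL:
        return "disallow all"
    return "default rules"
```

For example, `blanket_policy("GPTBot")` gives "allow all", `blanket_policy("ClaudeBot")` gives "disallow all", and an unlisted crawler falls through to the default rules of the "*" group.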

Other Records

Field Value
sitemap https://www.caasindia.in/sitemap_index.xml
sitemap https://www.caasindia.in/page-sitemap.xml
sitemap https://www.caasindia.in/category-sitemap.xml
sitemap https://www.caasindia.in/post-sitemap.xml
sitemap https://www.caasindia.in/news-sitemap-test.xml

Comments

  • =========================================
  • robots.txt for CAAS India - Leading Health News Website
  • Purpose: Allow SEO-friendly crawling + trusted AI bots
  • Last Updated: 2025-09-29
  • =========================================
  • --- Major search & news crawlers ---
  • --- Trusted AI bots (Allow) ---
  • --- Default rules for all other crawlers ---
  • --- Sitemap locations ---
  • --- Block unwanted AI / bad bots ---
  • --- Social media preview bots ---