scialert.net
robots.txt

Robots Exclusion Standard data for scialert.net

Resource Scan

Scan Details

Site Domain scialert.net
Base Domain scialert.net
Scan Status Ok
Last Scan 2024-06-13T22:28:49+00:00
Next Scan 2024-06-20T22:28:49+00:00

Last Scan

Scanned 2024-06-13T22:28:49+00:00
URL https://scialert.net/robots.txt
Domain IPs 104.26.8.86, 104.26.9.86, 172.67.74.49, 2606:4700:20::681a:856, 2606:4700:20::681a:956, 2606:4700:20::ac43:4a31
Response IP 172.67.74.49
Found Yes
Hash f6f62042eee4bb9a87231822d036515e5acd015512aedf0d40bca9c45565a161
SimHash 50148b408132

Groups

*

Rule Path
Allow /
Allow /sitemap.html

petalbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

sogou web spider

Rule Path
Disallow /

sogou inst spider

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

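The groups above amount to a permissive default plus blanket per-agent blocks. As a rough illustration of how a compliant crawler would interpret them, the sketch below feeds an excerpt of the rules (reconstructed from the scan data, not the verbatim file; the remaining blocked agents follow the same Disallow pattern) into Python's standard urllib.robotparser:

```python
from urllib import robotparser

# Excerpt reconstructed from the scan data above, for illustration only;
# the other blocked user agents follow the same "Disallow: /" pattern.
rules = """\
User-agent: *
Allow: /
Allow: /sitemap.html

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# The wildcard group allows ordinary crawlers everywhere...
print(parser.can_fetch("Mozilla/5.0", "https://scialert.net/sitemap.html"))  # True
# ...while the named AI/scraper agents are disallowed site-wide.
print(parser.can_fetch("GPTBot", "https://scialert.net/"))  # False
print(parser.can_fetch("CCBot", "https://scialert.net/"))   # False
```
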
Other Records

Field Value
sitemap https://scialert.net/sitemaps.xml

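The Sitemap record is independent of any user-agent group, so standard parsers expose it separately from the allow/disallow rules. A minimal sketch using the same standard-library parser (site_maps() is available in Python 3.8+):

```python
from urllib import robotparser

# Parse just the Sitemap record listed above; no user-agent group is needed.
parser = robotparser.RobotFileParser()
parser.parse(["Sitemap: https://scialert.net/sitemaps.xml"])
print(parser.site_maps())  # ['https://scialert.net/sitemaps.xml']
```
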
Comments

  • Block problem bots
  • User-agent: Baiduspider
  • User-agent: 360Spider
  • User-agent: Yisouspider
  • User-agent: Amazonbot
  • Block OpenAI
  • Block Google Bard AI
  • User-agent: Google-Extended
  • Disallow: /
  • Block Common Crawl AI scraper
  • Block Perplexity AI
  • Block other misc AI scrapers

Warnings

  • `clean-param` is not a known field. `Clean-param` is a Yandex-specific extension to the Robots Exclusion Protocol, so most other parsers simply ignore the directive.