physorg.com
robots.txt

Robots Exclusion Standard data for physorg.com

Resource Scan

Scan Details

Site Domain physorg.com
Base Domain physorg.com
Scan Status Ok
Last Scan 2024-09-23T18:54:02+00:00
Next Scan 2024-09-30T18:54:02+00:00

Last Scan

Scanned 2024-09-23T18:54:02+00:00
URL http://physorg.com/robots.txt
Redirect https://phys.org/robots.txt
Redirect Domain phys.org
Redirect Base phys.org
Domain IPs 72.251.236.55
Redirect IPs 2001:48c8:13:5::52, 72.251.233.232
Response IP 72.251.233.232
Found Yes
Hash 6e438f7d71f79940e13e0da913f9eaf797ecd7e603cca71a9825287526e9f756
SimHash 301c5843e0e3

Groups

*

Rule Path
Allow /
Disallow /search/
Disallow /rss-feed/search/
Disallow /rss-feed/breaking/search/
Disallow /rss-feed/tags/
Disallow /*/sort/
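
The `*` group mixes a blanket `Allow /` with narrower `Disallow` rules, one of which uses a `*` wildcard. A minimal sketch of how such rules might be evaluated follows, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and `Allow` wins a tie). The names `STAR_RULES`, `pattern_to_regex`, and `is_allowed`, and the sample paths in the usage note, are hypothetical and not part of the scan. Pattern length is used here as a simple stand-in for specificity.

```python
import re

# Rules copied from the "*" group in the scan report.
STAR_RULES = [
    ("allow", "/"),
    ("disallow", "/search/"),
    ("disallow", "/rss-feed/search/"),
    ("disallow", "/rss-feed/breaking/search/"),
    ("disallow", "/rss-feed/tags/"),
    ("disallow", "/*/sort/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    "*" matches any run of characters; a trailing "$" anchors the end.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_allowed(path, rules):
    """Return True if `path` may be fetched under `rules`.

    Assumption: longest matching pattern wins; on equal length, allow wins.
    A path that matches no rule is treated as allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed
```

Under these assumptions, `is_allowed("/search/?q=x", STAR_RULES)` is `False` because `/search/` is longer than `/`, while a hypothetical article path such as `/news/example.html` matches only `Allow /` and is permitted. Parsers that apply the first matching rule instead of the longest one would read this group differently, since `Allow /` is listed first.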

claudebot

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

gptbot

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

facebookbot

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

ahrefsbot

Rule Path
Disallow /tags/

ccbot

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

bytespider

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

cohere-ai

Rule Path
Disallow /news/
Disallow /partners/
Disallow /journals/
Disallow /tags/

google-extended

Rule Path
Disallow /news/
Disallow /tags/

applebot-extended

Rule Path
Disallow /news/
Disallow /tags/

friendlycrawler

Rule Path
Disallow /

chatglm-spider

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /
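
The groups above can be read as a lookup from crawler name to rule set. The sketch below assumes the group-selection behavior described in RFC 9309: a crawler matches its product token case-insensitively against the group names and falls back to `*` only when no named group matches. The names `GROUPS` and `select_group` are hypothetical; the rule data is transcribed from this report.

```python
# Path sets shared by several groups in the scan report.
FOUR_SECTIONS = ["/news/", "/partners/", "/journals/", "/tags/"]
TWO_SECTIONS = ["/news/", "/tags/"]

GROUPS = {
    "*": [
        ("allow", "/"),
        ("disallow", "/search/"),
        ("disallow", "/rss-feed/search/"),
        ("disallow", "/rss-feed/breaking/search/"),
        ("disallow", "/rss-feed/tags/"),
        ("disallow", "/*/sort/"),
    ],
    "ahrefsbot": [("disallow", "/tags/")],
}

for agent in ("claudebot", "gptbot", "facebookbot", "ccbot",
              "bytespider", "cohere-ai"):
    GROUPS[agent] = [("disallow", path) for path in FOUR_SECTIONS]

for agent in ("google-extended", "applebot-extended"):
    GROUPS[agent] = [("disallow", path) for path in TWO_SECTIONS]

for agent in ("friendlycrawler", "chatglm-spider", "scrapy", "dataforseobot"):
    GROUPS[agent] = [("disallow", "/")]


def select_group(product_token):
    """Return (group_name, rules) for a crawler's product token.

    Assumption: matching is case-insensitive and exact on the token;
    the "*" group applies only when no named group matches.
    """
    key = product_token.strip().lower()
    if key not in GROUPS:
        key = "*"
    return key, GROUPS[key]
```

For example, `select_group("GPTBot")` returns the `gptbot` rules, while an unlisted token such as `Googlebot` falls back to `*`. Under this reading a named group replaces the `*` group rather than extending it, so the `/search/` rule would not apply to `claudebot`; individual crawlers may interpret the file differently.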

Other Records

Field Value
sitemap https://phys.org/sitemap/indx/