technologyreview.com
robots.txt

Robots Exclusion Standard data for technologyreview.com

Resource Scan

Scan Details

Site Domain technologyreview.com
Base Domain technologyreview.com
Scan Status Ok
Last Scan 2024-10-30T15:49:57+00:00
Next Scan 2024-11-06T15:49:57+00:00

Last Scan

Scanned 2024-10-30T15:49:57+00:00
URL https://technologyreview.com/robots.txt
Redirect https://www.technologyreview.com/robots.txt
Redirect Domain www.technologyreview.com
Redirect Base technologyreview.com
Domain IPs 192.0.66.184
Redirect IPs 192.0.66.184
Response IP 192.0.66.184
Found Yes
Hash a9721d427ded99f85c20174a2c0d9218ffbffa4ea18c3a9b32af899aee27d4e0
SimHash 5854da200826

Groups

*

Rule Path
Disallow /wp-admin/
Allow /wp-admin/admin-ajax.php
Disallow /*.pdf$
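
The three rules in the * group interact: the Allow line carves one file out of the blocked /wp-admin/ directory, and the last rule uses a * wildcard and a $ end anchor. The sketch below shows one way to evaluate them, assuming the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie). The function names and the sample paths are hypothetical and are not part of the scan data.

```python
import re

# Rules of the "*" group, copied from the table above.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/*.pdf$"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end of
    the path. Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumption: longest matching pattern wins, Allow wins ties, and a
    path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed
```

Under these assumptions, is_allowed("/wp-admin/admin-ajax.php") is True because the Allow pattern is longer than the Disallow pattern, while is_allowed("/files/report.pdf") is False. A path such as /files/report.pdf?download=1 is not caught by the last rule, because the $ anchor requires the path to end in .pdf.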

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /
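
The groups above can be checked programmatically. The following is a minimal sketch that rebuilds an approximate robots.txt from the listed groups and queries it with Python's standard urllib.robotparser module. The exact layout of the real file is an assumption (the scan only reports the parsed rules), and the ExampleBot agent name and the article URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Approximate reconstruction from the parsed groups in this report.
# The original file's ordering, casing, and comments are not known.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /*.pdf$

User-agent: gptbot
Disallow: /

User-agent: chatgpt-user
Disallow: /

User-agent: ccbot
Disallow: /

User-agent: google-extended
Disallow: /
"""

# Hypothetical URL used only to exercise the rules.
ARTICLE_URL = "https://www.technologyreview.com/example-article/"


def build_parser(text=ROBOTS_TXT):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


PARSER = build_parser()

# Agents with their own "Disallow: /" group are blocked everywhere.
BLOCKED = {
    agent: not PARSER.can_fetch(agent, ARTICLE_URL)
    for agent in ("gptbot", "chatgpt-user", "ccbot", "google-extended")
}

# An agent with no dedicated group falls back to the "*" group.
GENERIC_CAN_READ_ARTICLE = PARSER.can_fetch("ExampleBot", ARTICLE_URL)
```

Note that urllib.robotparser matches rule paths as plain prefixes and applies the first matching rule, so it does not interpret the * and $ in /*.pdf$ and should not be relied on for the Allow override or the PDF rule in the * group.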

Other Records

Field Value
sitemap https://www.technologyreview.com/sitemap.xml
sitemap https://www.technologyreview.com/news-sitemap.xml

Comments

  • OpenAI GPTBot crawler (https://platform.openai.com/docs/gptbot)
  • OpenAI ChatGPT service (https://platform.openai.com/docs/plugins/bot)
  • Common Crawl crawler (https://commoncrawl.org/faq)
  • Google Bard / Gemini crawler (https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)