nutritionvalue.org
robots.txt

Robots Exclusion Standard data for nutritionvalue.org

Resource Scan

Scan Details

Site Domain nutritionvalue.org
Base Domain nutritionvalue.org
Scan Status Ok
Last Scan 2024-11-16T06:55:30+00:00
Next Scan 2024-11-23T06:55:30+00:00

Last Scan

Scanned 2024-11-16T06:55:30+00:00
URL https://nutritionvalue.org/robots.txt
Redirect https://www.nutritionvalue.org/robots.txt
Redirect Domain www.nutritionvalue.org
Redirect Base nutritionvalue.org
Domain IPs 104.21.46.66, 172.67.136.12, 2606:4700:3030::6815:2e42, 2606:4700:3033::ac43:880c
Redirect IPs 104.21.46.66, 172.67.136.12, 2606:4700:3030::6815:2e42, 2606:4700:3033::ac43:880c
Response IP 104.21.46.66
Found Yes
Hash 8b0e9722f9d77199ba41b6e15505d4b7e7cca7969a94065c985fb1b44986ec0c
SimHash 64768976cd1c
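The scan does not say how the Hash field is computed. Its 64 hex digits are consistent with a SHA-256 digest, so the sketch below assumes it is SHA-256 over the raw robots.txt bytes. The helper names are hypothetical. Under that assumption, a freshly fetched copy could be compared against the last scan to detect changes:

```python
import hashlib

# Value of the "Hash" field from the last scan. ASSUMPTION: SHA-256 of the
# raw response body; the scanner does not document its hashing method.
REPORTED_HASH = "8b0e9722f9d77199ba41b6e15505d4b7e7cca7969a94065c985fb1b44986ec0c"

def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(body).hexdigest()

def matches_reported(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Hypothetical helper: True if body hashes to the reported value."""
    return sha256_hex(body) == reported.lower()
```

If the comparison fails, either the file changed after 2024-11-16 or the scanner hashes something other than the raw bytes (for example, normalized text), so a mismatch alone is not conclusive.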

Groups

gptbot

Rule Path
Disallow /

*

Rule Path
Disallow /cite.php
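The two groups can be checked with Python's standard `urllib.robotparser`. The file text below is a hypothetical reconstruction from the scan: group order, blank lines, and the lowercase `gptbot` token are assumptions, and `SomeBot` and `/some-page` are illustrative names only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the two groups reported by the scan;
# the real file's ordering, casing, and whitespace may differ.
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /cite.php
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

rp = build_parser(ROBOTS_TXT)
base = "https://www.nutritionvalue.org"

# Print whether each (agent, path) pair may be crawled under these rules.
for agent, path in [("GPTBot", "/"), ("GPTBot", "/some-page"),
                    ("SomeBot", "/cite.php"), ("SomeBot", "/some-page")]:
    print(agent, path, rp.can_fetch(agent, base + path))
```

In this parser a group naming a specific agent is checked before the `*` group. GPTBot is therefore refused everywhere, while other crawlers are kept out only of paths that start with `/cite.php`, including query-string variants such as `/cite.php?id=1`.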

Comments

  • Prevent msn from overwhelming the server, e.g., some msn bot IPs hit the site 99,558 times per day in Feb 2015
  • Changed to apply to any agent since mail.ru started overloading it as well
  • User-agent: *
  • Crawl-delay: 1
  • Prevent yandex from using too many resources
  • User-agent: yandex
  • Crawl-delay: 0.1
  • Prevent site content from being used to build LLMs
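The `User-agent` and `Crawl-delay` lines above appear only as comments, so a standard parser ignores them and applies no delay. The sketch below uses `urllib.robotparser` on a trimmed, hypothetical reconstruction. It contrasts the file as scanned with a hypothetical variant in which the catch-all delay is uncommented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, trimmed reconstruction: the delay directives are commented out.
AS_SCANNED = """\
# User-agent: *
# Crawl-delay: 1
# User-agent: yandex
# Crawl-delay: 0.1
User-agent: *
Disallow: /cite.php
"""

# Hypothetical variant with the catch-all delay re-enabled.
UNCOMMENTED = """\
User-agent: *
Crawl-delay: 1
Disallow: /cite.php
"""

def delay_for(text: str, agent: str):
    """Return the Crawl-delay the parser assigns to agent, or None."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp.crawl_delay(agent)

print(delay_for(AS_SCANNED, "Yandex"))   # None
print(delay_for(UNCOMMENTED, "Yandex"))  # 1
```

`urllib.robotparser` only stores whole-number delays, so even an uncommented `Crawl-delay: 0.1` for yandex would be read as `None` by this particular parser. Other crawlers may interpret fractional values differently.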