omim.org
robots.txt

Robots Exclusion Standard data for omim.org

Resource Scan

Scan Details

Site Domain omim.org
Base Domain omim.org
Scan Status Ok
Last Scan 2024-10-06T10:07:28+00:00
Next Scan 2024-11-05T10:07:28+00:00
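
The gap between the two timestamps above can be checked with a short Python sketch. The variable names are illustrative. The report does not state a rescan policy, so the interval is simply what these two values imply:

```python
from datetime import datetime

# Timestamps copied from the Scan Details table above.
last_scan = datetime.fromisoformat("2024-10-06T10:07:28+00:00")
next_scan = datetime.fromisoformat("2024-11-05T10:07:28+00:00")

# Subtracting two timezone-aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)
```

For these two values the difference is exactly 30 days.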

Last Scan

Scanned 2024-10-06T10:07:28+00:00
URL https://omim.org/robots.txt
Domain IPs 35.173.52.3
Response IP 35.173.52.3
Found Yes
Hash b598077a8ac37f924b56c927ff9d3818fe8188a2c5a509d232dba38b11c1a283
SimHash 3a9e031bc0d4
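
A content hash like the one above lets a scanner detect whether robots.txt changed between scans. The sketch below assumes SHA-256, which is consistent with the 64 hexadecimal digits shown but is not stated in the report. The function names `content_hash` and `has_changed` are hypothetical, and the SimHash value is not reproduced here because its parameters are unknown:

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the SHA-256 digest of the raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str) -> bool:
    """Compare a freshly fetched body against the hash stored by the last scan."""
    return content_hash(body) != previous_hash


# Illustrative body only, not the actual omim.org file.
sample = b"User-agent: *\nAllow: /$\nDisallow: /\n"
stored = content_hash(sample)
print(has_changed(sample, stored))
print(has_changed(sample + b"Crawl-delay: 2\n", stored))
```

An exact hash flags any byte-level change, including whitespace, which is presumably why the report also carries a similarity-oriented SimHash.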

Groups

*

Rule Path
Allow /$
Disallow /

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2
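
The crawl-delay record asks for two seconds between requests. The file's own comments note that this field is not in the official spec, so support varies by crawler. A minimal sketch of a client-side throttle that would honor it for paths that are permitted; the `Throttle` class and its injectable `clock` and `sleep` parameters are assumptions made for illustration:

```python
import time

CRAWL_DELAY_SECONDS = 2.0  # value from the crawl-delay record above


class Throttle:
    """Spaces out calls so consecutive requests are at least `delay` seconds apart."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock    # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.last = None      # time of the previous request, if any

    def wait(self):
        """Block until `delay` seconds have passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


throttle = Throttle(CRAWL_DELAY_SECONDS)
# Call throttle.wait() immediately before each permitted request.
```

A throttle of this kind does not make a disallowed path fetchable. It only limits the request rate for paths the rules already permit.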

googlebot

Rule Path
Allow /$
Disallow /entry/$
Allow /entry/
Disallow /clinicalSynopsis/$
Allow /clinicalSynopsis/
Allow /about
Allow /help/
Allow /downloads
Allow /api
Disallow /
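
The googlebot rules above interleave Allow and Disallow lines, so the outcome for a given path depends on rule precedence rather than file order. A minimal Python sketch, assuming the longest-match rule described in RFC 9309 and Google's robots.txt documentation (the most specific pattern wins, and Allow wins a tie). The function names `pattern_to_regex` and `is_allowed` and the sample path `/entry/100800` are illustrative, not taken from the scan:

```python
import re

# Rules copied from the googlebot group above, in file order.
GOOGLEBOT_RULES = [
    ("allow", "/$"),
    ("disallow", "/entry/$"),
    ("allow", "/entry/"),
    ("disallow", "/clinicalSynopsis/$"),
    ("allow", "/clinicalSynopsis/"),
    ("allow", "/about"),
    ("allow", "/help/"),
    ("allow", "/downloads"),
    ("allow", "/api"),
    ("disallow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a path pattern: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=GOOGLEBOT_RULES):
    """Longest matching pattern wins; on a tie, allow wins; no match means allowed."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Tuple comparison: longer pattern first, then True (allow) over False.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in ("/", "/entry/", "/entry/100800", "/search"):
    print(path, is_allowed(path))
```

Under these assumptions the bare `/entry/` index is blocked because `/entry/$` is the longest match, while pages beneath it fall through to `Allow /entry/`. Anything not listed is caught by the final `Disallow /`.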

google-extended

Rule Path
Disallow /

bingbot

Rule Path
Allow /$
Disallow /entry/$
Allow /entry/
Disallow /clinicalSynopsis/$
Allow /clinicalSynopsis/
Allow /about
Allow /help/
Allow /downloads
Allow /api
Disallow /

duckduckbot

Rule Path
Allow /$
Disallow /entry/$
Allow /entry/
Disallow /clinicalSynopsis/$
Allow /clinicalSynopsis/
Allow /about
Allow /help/
Allow /downloads
Allow /api
Disallow /

applebot

Rule Path
Allow /$
Disallow /

ai2bot
ai2bot-dolma
amazonbot
anthropic-ai
applebot
applebot-extended
bytespider
ccbot
chatgpt-user
claude-web
claudebot
cohere-ai
diffbot
facebookbot
facebookexternalhit
friendlycrawler
google-extended
googleother
googleother-image
googleother-video
gptbot
iaskspider/2.0
icc-crawler
imagesiftbot
img2dataset
isscyberriskcrawler
kangaroo bot
meta-externalagent
meta-externalfetcher
oai-searchbot
omgili
omgilibot
perplexitybot
petalbot
scrapy
sidetrade indexer bot
timpibot
velenpublicwebcrawler
webzio-extended
youbot

Rule Path
Disallow /
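
Several user agents share this single group, and two of them (applebot and google-extended) also have groups of their own earlier in the report. A sketch of how a parser might pick the rules for one agent, assuming the RFC 9309 behavior that rules from every group naming the agent are combined and that `*` applies only when no group names it. The `GROUPS` table is abbreviated, `rules_for` is a hypothetical helper, and exact lowercase token comparison is a simplification of real user-agent matching:

```python
# Abbreviated model of the groups in this report: (user agents, rules).
GROUPS = [
    (["*"], [("allow", "/$"), ("disallow", "/")]),
    (["google-extended"], [("disallow", "/")]),
    (["applebot"], [("allow", "/$"), ("disallow", "/")]),
    # Shortened stand-in for the long shared AI-crawler group above.
    (["applebot", "google-extended", "gptbot", "ccbot"], [("disallow", "/")]),
]


def rules_for(agent, groups=GROUPS):
    """Collect rules from every group naming the agent; fall back to '*' otherwise."""
    token = agent.lower()
    specific = [rule for agents, rules in groups if token in agents for rule in rules]
    if specific:
        return specific
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


print(rules_for("Applebot"))
print(rules_for("GPTBot"))
print(rules_for("somebot"))
```

With the rules combined this way, applebot keeps its home-page allowance (`/$` is a longer match than `/`), whereas agents that appear only in the shared group are blocked everywhere.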

Comments

  • CRAWLER WARNING
  • The terms of service and the robots.txt file disallow crawling of this site; please see https://omim.org/help/agreement for more information.
  • A number of data files are available for download at https://omim.org/downloads.
  • We have an API, which you can learn about at https://omim.org/help/api and register for at https://omim.org/api; it provides access to the data in JSON and XML formats.
  • You should feel free to contact us at https://omim.org/contact to figure out the best approach to getting the data you need for your work.
  • WE WILL AUTOMATICALLY BLOCK YOUR IP ADDRESS IF YOU CRAWL THIS SITE.
  • WE WILL ALSO AUTOMATICALLY BLOCK SUB-DOMAINS AND ADDRESS RANGES IMPLICATED IN DISTRIBUTED CRAWLS OF THIS SITE.
  • Robots.txt
  • Specification for robots.txt
  • https://developers.google.com/search/reference/robots_txt
  • https://www.robotstxt.org
  • Global - disallow everything except for home page
  • Crawl delay, every two seconds (not in the official spec)
  • Google - disallow everything except for specific paths
  • https://www.google.com/bot.html
  • https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
  • Google-Extended - disallow everything
  • Bing - disallow everything except for specific paths
  • https://www.bing.com/bingbot.htm
  • DuckDuckGo - disallow everything except for specific paths
  • https://duckduckgo.com/duckduckbot
  • Applebot - disallow everything except for home page
  • https://support.apple.com/en-us/119829
  • AI Crawlers - disallow everything
  • https://github.com/ai-robots-txt/ai.robots.txt