ui.adsabs.harvard.edu
robots.txt

Robots Exclusion Standard data for ui.adsabs.harvard.edu

Resource Scan

Scan Details

Site Domain ui.adsabs.harvard.edu
Base Domain harvard.edu
Scan Status Ok
Last Scan 2024-08-29T14:37:39+00:00
Next Scan 2024-09-28T14:37:39+00:00

Last Scan

Scanned 2024-08-29T14:37:39+00:00
URL https://ui.adsabs.harvard.edu/robots.txt
Domain IPs 3.91.254.139, 52.3.170.185, 52.44.237.88
Response IP 52.44.237.88
Found Yes
Hash 92ff236b24f2f33c1d7d1d0ed372756490609c3de35fb4e9e754e89d2ca6ea7c
SimHash b254c52fdf27
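
The scanner records two content fingerprints for change detection. A minimal sketch of reproducing the first one, assuming the 64-hex-character Hash above is a plain SHA-256 digest of the fetched robots.txt body (if the scanner hashes a normalized form instead, the values will differ):

    import hashlib
    import urllib.request

    URL = "https://ui.adsabs.harvard.edu/robots.txt"

    with urllib.request.urlopen(URL, timeout=30) as resp:
        body = resp.read()

    # Compare against the Hash value reported above
    # (assumption: SHA-256 of the raw response body).
    print(hashlib.sha256(body).hexdigest())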

Groups

*

Rule Path
Disallow /v1/
Disallow /resources
Disallow /core
Disallow /tugboat
Disallow /link_gateway/
Disallow /search/
Disallow /execute-query/
Disallow /status
Allow /help/
Allow /about/
Allow /blog/
Allow /abs/
Disallow /abs/*/citations
Disallow /abs/*/references
Disallow /abs/*/coreads
Disallow /abs/*/similar
Disallow /abs/*/toc
Disallow /abs/*/graphics
Disallow /abs/*/metrics
Disallow /abs/*/exportcitation
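
The group above pairs a broad Allow for /abs/ with narrower Disallow patterns for its sub-pages, so a crawler has to resolve the overlap by rule specificity (longest matching pattern wins, Allow winning ties, per the usual Google/RFC 9309 interpretation) rather than by rule order. A minimal sketch of that evaluation in Python over a subset of the rules, using hypothetical record URLs for illustration:

    import re

    # (rule, path) pairs -- a representative subset of the "*" group above.
    RULES = [
        ("Disallow", "/search/"),
        ("Allow",    "/abs/"),
        ("Disallow", "/abs/*/citations"),
        ("Disallow", "/abs/*/metrics"),
    ]

    def to_regex(pattern):
        # Robots.txt '*' matches any run of characters; '$' end-anchors are
        # not used in this file, so this sketch ignores them.
        return re.compile("^" + "".join(".*" if c == "*" else re.escape(c) for c in pattern))

    def allowed(path, rules=RULES):
        # Most specific matching rule decides; pattern length stands in for
        # specificity, and Allow wins ties (True > False in the tuple compare).
        best = None
        for rule, pattern in rules:
            if to_regex(pattern).match(path):
                candidate = (len(pattern), rule == "Allow")
                best = candidate if best is None or candidate > best else best
        return True if best is None else best[1]

    # Hypothetical bibcode, for illustration only.
    print(allowed("/abs/2020ApJ...900L..13A/abstract"))   # True  -- only Allow /abs/ matches
    print(allowed("/abs/2020ApJ...900L..13A/citations"))  # False -- Disallow /abs/*/citations is more specific
    print(allowed("/search/q=galaxy"))                    # False -- Disallow /search/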

Other Records

Field Value
sitemap https://ui.adsabs.harvard.edu/sitemap/sitemap_index.xml
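
The sitemap record points crawlers at an index file rather than a single sitemap. A short sketch, assuming the index follows the standard sitemaps.org schema, of listing the child sitemaps it references using only the Python standard library:

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_INDEX = "https://ui.adsabs.harvard.edu/sitemap/sitemap_index.xml"
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # standard sitemap namespace

    with urllib.request.urlopen(SITEMAP_INDEX, timeout=30) as resp:
        index = ET.fromstring(resp.read())

    # A sitemap index wraps <sitemap><loc>...</loc></sitemap> entries.
    children = [loc.text for loc in index.findall("sm:sitemap/sm:loc", NS)]
    print(len(children), "child sitemaps, e.g.", children[:3])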

Comments

  • there may be a more elegant way to do this, but I fear that if we just use a single regexp such as /abs/*/* we may miss out on indexing links containing DOIs or arXiv ids, so we do it the pedantic way
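
In other words, ADS record identifiers are not always slash-free bibcodes: a DOI (e.g. the hypothetical 10.1093/mnras/stx123) or an old-style arXiv identifier contains a '/', so a single Disallow /abs/*/* pattern would also block those records' abstract pages, while the enumerated rules only block the specific sub-pages. A small illustration, reusing the same '*'-to-regex translation sketched above:

    import re

    def to_regex(pattern):
        # Same '*' -> '.*' translation as in the rule-evaluation sketch above.
        return re.compile("^" + "".join(".*" if c == "*" else re.escape(c) for c in pattern))

    broad  = to_regex("/abs/*/*")           # the rejected single-pattern approach
    narrow = to_regex("/abs/*/citations")   # one of the "pedantic" rules actually used

    # Hypothetical DOI-based record URL; the DOI itself contains a '/'.
    doi_abstract = "/abs/10.1093/mnras/stx123/abstract"

    print(bool(broad.match(doi_abstract)))    # True  -- the abstract page would be blocked
    print(bool(narrow.match(doi_abstract)))   # False -- the enumerated rules leave it crawlable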