ui.adsabs.harvard.edu
robots.txt

Robots Exclusion Standard data for ui.adsabs.harvard.edu

Resource Scan

Scan Details

Site Domain ui.adsabs.harvard.edu
Base Domain harvard.edu
Scan Status Ok
Last Scan 2024-08-29T14:37:39+00:00
Next Scan 2024-09-28T14:37:39+00:00

Last Scan

Scanned 2024-08-29T14:37:39+00:00
URL https://ui.adsabs.harvard.edu/robots.txt
Domain IPs 3.91.254.139, 52.3.170.185, 52.44.237.88
Response IP 52.44.237.88
Found Yes
Hash 92ff236b24f2f33c1d7d1d0ed372756490609c3de35fb4e9e754e89d2ca6ea7c
SimHash b254c52fdf27
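
The scanner records two content fingerprints for change detection. A minimal sketch of reproducing the first one, assuming the 64-hex-character Hash above is a plain SHA-256 digest of the fetched robots.txt body (if the scanner hashes a normalized form instead, the values will differ):

    import hashlib
    import urllib.request

    URL = "https://ui.adsabs.harvard.edu/robots.txt"

    with urllib.request.urlopen(URL, timeout=30) as resp:
        body = resp.read()

    # Compare against the Hash value reported above
    # (assumption: SHA-256 of the raw response body).
    print(hashlib.sha256(body).hexdigest())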

Groups

*

Rule Path
Disallow /v1/
Disallow /resources
Disallow /core
Disallow /tugboat
Disallow /link_gateway/
Disallow /search/
Disallow /execute-query/
Disallow /status
Allow /help/
Allow /about/
Allow /blog/
Allow /abs/
Disallow /abs/*/citations
Disallow /abs/*/references
Disallow /abs/*/coreads
Disallow /abs/*/similar
Disallow /abs/*/toc
Disallow /abs/*/graphics
Disallow /abs/*/metrics
Disallow /abs/*/exportcitation
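
The group above pairs a broad Allow for /abs/ with narrower Disallow patterns for its sub-pages, so a crawler has to resolve the overlap by rule specificity (longest matching pattern wins, Allow winning ties, per the usual Google/RFC 9309 interpretation) rather than by rule order. A minimal sketch of that evaluation in Python over a subset of the rules, using hypothetical record URLs for illustration:

    import re

    # (rule, path) pairs -- a representative subset of the "*" group above.
    RULES = [
        ("Disallow", "/search/"),
        ("Allow",    "/abs/"),
        ("Disallow", "/abs/*/citations"),
        ("Disallow", "/abs/*/metrics"),
    ]

    def to_regex(pattern):
        # Robots.txt '*' matches any run of characters; '$' end-anchors are
        # not used in this file, so this sketch ignores them.
        return re.compile("^" + "".join(".*" if c == "*" else re.escape(c) for c in pattern))

    def allowed(path, rules=RULES):
        # Most specific matching rule decides; pattern length stands in for
        # specificity, and Allow wins ties (True > False in the tuple compare).
        best = None
        for rule, pattern in rules:
            if to_regex(pattern).match(path):
                candidate = (len(pattern), rule == "Allow")
                best = candidate if best is None or candidate > best else best
        return True if best is None else best[1]

    # Hypothetical bibcode, for illustration only.
    print(allowed("/abs/2020ApJ...900L..13A/abstract"))   # True  -- only Allow /abs/ matches
    print(allowed("/abs/2020ApJ...900L..13A/citations"))  # False -- Disallow /abs/*/citations is more specific
    print(allowed("/search/q=galaxy"))                    # False -- Disallow /search/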

Other Records

Field Value
sitemap https://ui.adsabs.harvard.edu/sitemap/sitemap_index.xml
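
The sitemap record points crawlers at an index file rather than a single sitemap. A short sketch, assuming the index follows the standard sitemaps.org schema, of listing the child sitemaps it references using only the Python standard library:

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_INDEX = "https://ui.adsabs.harvard.edu/sitemap/sitemap_index.xml"
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # standard sitemap namespace

    with urllib.request.urlopen(SITEMAP_INDEX, timeout=30) as resp:
        index = ET.fromstring(resp.read())

    # A sitemap index wraps <sitemap><loc>...</loc></sitemap> entries.
    children = [loc.text for loc in index.findall("sm:sitemap/sm:loc", NS)]
    print(len(children), "child sitemaps, e.g.", children[:3])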

Comments

  • there may be a more elegant way to do this, but I fear that if we just use a single regexp such as /abs/*/* we may miss out on indexing links containing DOIs or arXiv ids, so we do it the pedantic way
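
In other words, ADS record identifiers are not always slash-free bibcodes: a DOI (e.g. the hypothetical 10.1093/mnras/stx123) or an old-style arXiv identifier contains a '/', so a single Disallow /abs/*/* pattern would also block those records' abstract pages, while the enumerated rules only block the specific sub-pages. A small illustration, reusing the same '*'-to-regex translation sketched above:

    import re

    def to_regex(pattern):
        # Same '*' -> '.*' translation as in the rule-evaluation sketch above.
        return re.compile("^" + "".join(".*" if c == "*" else re.escape(c) for c in pattern))

    broad  = to_regex("/abs/*/*")           # the rejected single-pattern approach
    narrow = to_regex("/abs/*/citations")   # one of the "pedantic" rules actually used

    # Hypothetical DOI-based record URL; the DOI itself contains a '/'.
    doi_abstract = "/abs/10.1093/mnras/stx123/abstract"

    print(bool(broad.match(doi_abstract)))    # True  -- the abstract page would be blocked
    print(bool(narrow.match(doi_abstract)))   # False -- the enumerated rules leave it crawlable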