
Robots Exclusion Standard data for arxiv.org

Resource Scan

Scan Details

Site Domain arxiv.org
Base Domain arxiv.org
Scan Status Ok
Last Scan 2024-05-22T13:21:02+00:00
Next Scan 2024-06-21T13:21:02+00:00

Last Scan

Scanned 2024-05-22T13:21:02+00:00
URL https://arxiv.org/robots.txt
Domain IPs 151.101.131.42, 151.101.195.42, 151.101.3.42, 151.101.67.42
Response IP 151.101.3.42
Found Yes
Hash eda2bc852443f3f29d31eb04d37b26476654dc33b29256fb882819da23d189a5
SimHash 601b1965e753
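
A content hash like the one above is typically used to tell whether robots.txt changed between scans. The report does not name the algorithm; the value is 64 hex digits long, which matches a SHA-256 digest, so the sketch below assumes SHA-256 over the raw response body. The helper names `sha256_hex` and `has_changed` are hypothetical.

```python
import hashlib

# Hash value copied from the scan details above.
REPORTED_HASH = "eda2bc852443f3f29d31eb04d37b26476654dc33b29256fb882819da23d189a5"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str) -> bool:
    """Compare a freshly fetched body against a previously recorded hash.

    Assumption: the recorded hash is SHA-256 of the unmodified response body.
    """
    return sha256_hex(body) != previous_hash.lower()


if __name__ == "__main__":
    # The digest length is consistent with SHA-256 (32 bytes, 64 hex digits).
    print(len(REPORTED_HASH))
    # Placeholder body for illustration; a real check would use fetched bytes.
    print(has_changed(b"User-agent: *\n", REPORTED_HASH))
```

If the scanner normalizes the file before hashing (for example by stripping whitespace or comments), the digest of the raw body would not match, so treat a mismatch as inconclusive.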

Groups

*

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search
Disallow /set_author_id
Disallow /show-email

Other Records

Field Value
crawl-delay 15
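
As a rough illustration of how a crawler might apply the `*` group, the sketch below feeds a shortened excerpt of the rules above into Python's standard `urllib.robotparser`. The agent name `ExampleBot` and the sample URLs are hypothetical, and the excerpt is abbreviated, so this is not the full file.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the "*" group listed above (not the full file).
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /abs
Allow: /pdf
Disallow: /api
Disallow: /search
Crawl-delay: 15
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
AGENT = "ExampleBot"  # hypothetical crawler name; it falls back to the "*" group

if __name__ == "__main__":
    # Sample URLs are illustrative only.
    print(parser.can_fetch(AGENT, "https://arxiv.org/abs/2101.00001"))
    print(parser.can_fetch(AGENT, "https://arxiv.org/api/query"))
    print(parser.crawl_delay(AGENT))
```

Under these assumptions the abstract page is fetchable, the API path is not, and the crawler would be expected to wait 15 seconds between requests.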

googlebot

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

yahoo! slurp

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 1

bingbot

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 1

baiduspider

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 10

toutiaospider

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 10

squid_configured_as_described_at_/help/faq/cache

Rule Path
Allow /list
Allow /abs
Allow /pdf
Disallow /archive
Disallow /year
Disallow /html
Disallow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 10

yandexbot

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /e-print/
Disallow /src/
Disallow /ps/
Disallow /psfigs/
Disallow /dvi/
Disallow /cookies/
Disallow /form/
Disallow /find/
Disallow /view/
Disallow /ftp/
Disallow /refs/
Disallow /cits/
Disallow /format/
Disallow /register
Disallow /submit
Disallow /replace
Disallow /cross
Disallow /jref
Disallow /paper_passwd/
Disallow /PS_cache/
Disallow /Stats/
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /uploads
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 1
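
The yandexbot group writes many paths with a trailing slash (`/src/`) where the other groups omit it (`/src`). Because robots.txt rules are prefix matches, the two forms block slightly different sets of URLs. The sketch below shows the difference with Python's standard `urllib.robotparser`; the helper `allowed`, the agent name `ExampleBot`, and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(rule_lines, path):
    """Return True if `path` may be fetched under a single "*" group."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + list(rule_lines))
    return parser.can_fetch("ExampleBot", "https://arxiv.org" + path)


WITH_SLASH = ["Disallow: /src/"]    # form used in the yandexbot group
WITHOUT_SLASH = ["Disallow: /src"]  # form used in the other groups

if __name__ == "__main__":
    # Sample paths are illustrative only.
    for path in ("/src", "/src/example", "/srcfoo"):
        print(path, allowed(WITH_SLASH, path), allowed(WITHOUT_SLASH, path))
```

With the trailing slash, only URLs below the directory are blocked; without it, the bare path and any other path sharing the prefix are blocked as well.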

applebot

Rule Path
Allow /archive
Allow /year
Allow /list
Allow /abs
Allow /pdf
Allow /html
Allow /catchup
Disallow /user
Disallow /e-print
Disallow /src
Disallow /ps
Disallow /dvi
Disallow /cookies
Disallow /form
Disallow /find
Disallow /view
Disallow /ftp
Disallow /refs
Disallow /cits
Disallow /format
Disallow /PS_cache
Disallow /Stats
Disallow /seek-and-destroy
Disallow /IgnoreMe
Disallow /oai2
Disallow /auth
Disallow /tb
Disallow /tb/recent
Disallow /tb-recent
Disallow /trackback
Disallow /prevnext
Disallow /ct
Disallow /api
Disallow /search

Other Records

Field Value
crawl-delay 1

semrushbot

Rule Path
Disallow /
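
The groups above give different crawlers different policies: a named group takes precedence over `*`, and semrushbot is excluded entirely. A minimal sketch of that selection, using Python's standard `urllib.robotparser` on an abbreviated excerpt, might look like the following. The function `policy_for` and the agent name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt covering three of the groups listed above.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /abs
Disallow: /api
Crawl-delay: 15

User-agent: bingbot
Allow: /abs
Disallow: /api
Crawl-delay: 1

User-agent: semrushbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def policy_for(agent: str, url: str):
    """Return (allowed, crawl_delay) for the given agent and URL."""
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    url = "https://arxiv.org/abs/2101.00001"  # illustrative URL
    for agent in ("ExampleBot", "bingbot", "SemrushBot"):
        print(agent, policy_for(agent, url))
```

In this sketch an unlisted agent inherits the 15-second delay from `*`, bingbot gets its own 1-second delay, and semrushbot is refused for every path with no delay defined.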

Comments

  • robots.txt for http://arxiv.org/ and mirror sites http://*.arxiv.org/
  • Indiscriminate automated downloads from this site are not permitted
  • See also: http://arxiv.org/help/robots
  • 2021-10-14 - removed crawl-delay for Bingbot. Needs to be re-added if there are any problems.
  • 2021-10-26 - added back