Robots Exclusion Standard data for citeseerx.ist.psu.edu
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | citeseerx.ist.psu.edu |
Base Domain | psu.edu |
Scan Status | Failed |
Failure Stage | Fetching resource. |
Failure Reason | Couldn't establish SSL connection. |
Last Scan | 2025-05-29T16:50:03+00:00 |
Next Scan | 2025-06-28T16:50:03+00:00 |
Last Successful Scan
Field | Value |
---|---|
Scanned | 2025-04-07T14:57:46+00:00 |
URL | https://citeseerx.ist.psu.edu/robots.txt |
Domain IPs | 130.203.136.161, 130.203.136.162, 130.203.136.163 |
Response IP | 130.203.136.163 |
Found | Yes |
Hash | 36b7e9f14ffd55b2324f657c61c928d65664ec09fa77551aa7cd80b0c522109c |
SimHash | 061e9d508645 |
Groups
*
Rule | Path |
---|---|
Disallow | /doc_view/pid* |
Disallow | /pdf* |
Other Records
Field | Value |
---|---|
crawl-delay | 10 |
sitemap | https://citeseerx.ist.psu.edu/sitemap_index.xml |
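The records above can be checked against a path with a few lines of Python. This is a minimal sketch, not the scanner's own logic: it assumes the wildcard semantics of RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the path), and the names `DISALLOW`, `CRAWL_DELAY_SECONDS`, `rule_to_regex`, and `is_allowed` are illustrative only. The rule strings are transcribed from the `*` group in the last successful scan.

```python
import re

# Rules for user-agent "*", transcribed from the Groups table above.
DISALLOW = ["/doc_view/pid*", "/pdf*"]

# crawl-delay record from the Other Records table, in seconds.
CRAWL_DELAY_SECONDS = 10


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' anchors the end of the path; everything else is literal.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(path: str) -> bool:
    """Return True if no Disallow rule matches the start of the path."""
    return not any(rule_to_regex(rule).match(path) for rule in DISALLOW)


if __name__ == "__main__":
    # The example paths are hypothetical, chosen only to exercise the rules.
    for path in ["/pdf/example", "/doc_view/pid/123", "/search"]:
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

A polite crawler would also wait at least `CRAWL_DELAY_SECONDS` between requests, for example with `time.sleep`, although `crawl-delay` is a non-standard field and not every crawler honors it.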