pdfocr.org
robots.txt
Robots Exclusion Standard data for pdfocr.org
Resource Scan
Scan Details
Site Domain | pdfocr.org |
Base Domain | pdfocr.org |
Scan Status | Ok |
Last Scan | 2025-10-09T04:13:59+00:00 |
Next Scan | 2025-10-16T04:13:59+00:00 |
Last Scan
Scanned | 2025-10-09T04:13:59+00:00 |
URL | https://pdfocr.org/robots.txt |
Domain IPs | 104.21.0.247, 172.67.151.127, 2606:4700:3030::6815:f7, 2606:4700:3033::ac43:977f |
Response IP | 172.67.151.127 |
Found | Yes |
Hash | 224d9c9d3ce722954da483259f817fe99a759160c4f21ec26fb05abe6b7339a9 |
SimHash | c154ef02c351 |
Groups
User-agent: *
Rule | Path |
---|---|
Disallow | /*/? |
Disallow | /api/? |
Disallow | /*? |
Disallow | /privacy.html |
Disallow | /cookies.html |
Disallow | /contact.html |
Disallow | /pdf-xls.html |
Allow | / |
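
The rules above can be read with the matching semantics described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, `?` is a literal character, the longest matching pattern wins, and a tie goes to Allow. The following is a minimal sketch under those assumptions, not the scanner's own logic. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, and the sample paths are illustrative only and not taken from the site.

```python
import re

# Rules copied from the "*" group in the table above.
RULES = [
    ("disallow", "/*/?"),
    ("disallow", "/api/?"),
    ("disallow", "/*?"),
    ("disallow", "/privacy.html"),
    ("disallow", "/cookies.html"),
    ("disallow", "/contact.html"),
    ("disallow", "/pdf-xls.html"),
    ("allow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes RFC 9309 semantics: '*' is a wildcard, a trailing '$'
    anchors the end, and every other character (including '?') is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` (path plus query string) may be crawled."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        # re.match anchors at the start of the path, as robots rules require.
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow


if __name__ == "__main__":
    for sample in ["/", "/privacy.html", "/index.html?lang=en", "/api/convert"]:
        print(sample, "->", "allowed" if is_allowed(sample) else "disallowed")
```

Read this way, the `/*?` rule blocks any URL that carries a query string, the four named pages are blocked outright, and everything else falls through to `Allow | / |`.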