
Robots Exclusion Standard data for pdfocr.org

Resource Scan

Scan Details

Site Domain pdfocr.org
Base Domain pdfocr.org
Scan Status Ok
Last Scan 2025-10-09T04:13:59+00:00
Next Scan 2025-10-16T04:13:59+00:00

Last Scan

Scanned 2025-10-09T04:13:59+00:00
URL https://pdfocr.org/robots.txt
Domain IPs 104.21.0.247, 172.67.151.127, 2606:4700:3030::6815:f7, 2606:4700:3033::ac43:977f
Response IP 172.67.151.127
Found Yes
Hash 224d9c9d3ce722954da483259f817fe99a759160c4f21ec26fb05abe6b7339a9
SimHash c154ef02c351

Groups

User-agent *

Rule Path
Disallow /*/?
Disallow /api/?
Disallow /*?
Disallow /privacy.html
Disallow /cookies.html
Disallow /contact.html
Disallow /pdf-xls.html
Allow /
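The scan lists the rules but does not say how a crawler resolves them. The following minimal Python sketch assumes the matching semantics of RFC 9309: the matching rule with the longest pattern wins, Allow wins a tie, and only `*` and a trailing `$` are special. It applies those semantics to the group above. `Rule`, `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical names for illustration, and individual crawlers may deviate from the RFC. Because `?` is a literal character under these semantics, `/*?` blocks every URL that carries a query string, while `/api/?` blocks only URLs beginning with `/api/?`, not paths such as `/api/status`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    allow: bool   # True for "Allow", False for "Disallow"
    pattern: str  # path pattern as written in robots.txt


# The "*" group from the scan above, in the order listed (hypothetical encoding).
RULES = [
    Rule(False, "/*/?"),
    Rule(False, "/api/?"),
    Rule(False, "/*?"),
    Rule(False, "/privacy.html"),
    Rule(False, "/cookies.html"),
    Rule(False, "/contact.html"),
    Rule(False, "/pdf-xls.html"),
    Rule(True, "/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    Only "*" (any run of characters) and a trailing "$" (end of URL) are special;
    every other character, including "?", is matched literally.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[Rule] = RULES) -> bool:
    """Return True if `path` (path plus optional "?query") may be crawled.

    The matching rule with the longest pattern wins; on a tie, Allow wins.
    If no rule matches, the path is allowed.
    """
    best = None  # (pattern length, allow flag) of the most specific match so far
    for rule in rules:
        # re.match anchors at the start of the path, giving prefix semantics.
        if pattern_to_regex(rule.pattern).match(path):
            key = (len(rule.pattern), rule.allow)  # True > False, so Allow wins ties
            if best is None or key > best:
                best = key
    return True if best is None else best[1]
```

For example, `is_allowed("/api/status")` returns `True`, while `is_allowed("/index.html?lang=en")` and `is_allowed("/privacy.html")` both return `False`.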