penc.org
robots.txt

Robots Exclusion Standard data for penc.org

Resource Scan

Scan Details

Site Domain penc.org
Base Domain penc.org
Scan Status Ok
Last Scan 2025-07-17T23:39:54+00:00
Next Scan 2025-08-16T23:39:54+00:00

Last Scan

Scanned 2025-07-17T23:39:54+00:00
URL https://penc.org/robots.txt
Domain IPs 104.21.112.1, 104.21.16.1, 104.21.32.1, 104.21.48.1, 104.21.64.1, 104.21.80.1, 104.21.96.1, 2606:4700:3030::6815:1001, 2606:4700:3030::6815:2001, 2606:4700:3030::6815:3001, 2606:4700:3030::6815:4001, 2606:4700:3030::6815:5001, 2606:4700:3030::6815:6001, 2606:4700:3030::6815:7001
Response IP 104.21.48.1
Found Yes
Hash 1da70e076b71a89daaafb7366028e2bfede242756d8692f5d15c42a2fc09b736
SimHash ec949d42c1d9

Groups

*

Rule Path
Disallow /global_inc/
Allow /global_inc/*.css
Allow /global_inc/*.js
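This group pairs a directory-wide Disallow with wildcard Allow exceptions. Under RFC 9309, the most specific matching rule (the one with the longest path) wins, and an Allow rule should win a tie with an equally long Disallow. The following is a minimal Python sketch of that precedence. It assumes plain ASCII paths, and `rule_to_regex`, `is_allowed`, and `GROUP_1` are illustrative names, not part of any library.

```python
import re

# Rules copied from the first "*" group above, as (kind, path) pairs.
GROUP_1 = [
    ("disallow", "/global_inc/"),
    ("allow", "/global_inc/*.css"),
    ("allow", "/global_inc/*.js"),
]


def rule_to_regex(path: str) -> re.Pattern:
    """Translate a robots.txt path: '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    core = path[:-1] if anchored else path
    pattern = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(rules, url_path: str) -> bool:
    """Longest matching rule wins; Allow beats Disallow on equal length; no match means allowed."""
    best = None  # (rule length, allow?)
    for kind, path in rules:
        if not path:
            continue  # an empty rule path matches nothing
        if rule_to_regex(path).match(url_path):  # re.match anchors at the start (prefix match)
            length, allow = len(path), kind == "allow"
            if best is None or length > best[0] or (length == best[0] and allow):
                best = (length, allow)
    return True if best is None else best[1]


print(is_allowed(GROUP_1, "/global_inc/site.css"))    # True: the 17-char Allow beats the 12-char Disallow
print(is_allowed(GROUP_1, "/global_inc/config.php"))  # False: only the Disallow matches
print(is_allowed(GROUP_1, "/about/"))                 # True: no rule matches
```

Because rules match as prefixes, `/global_inc/*.js` without a trailing `$` also allows paths such as `/global_inc/data.json`.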

*

Rule Path
Disallow /global_engine/ajax/
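The scan lists two separate `*` groups. RFC 9309 requires that when several groups match the same user agent, their rules be combined into one group. A crawler that follows the standard therefore applies `/global_engine/ajax/` alongside the `/global_inc/` rules. The following is a Python sketch of that merge. `SAMPLE` is a hypothetical reconstruction built from the rules listed in this report, not the site's actual file (which also contains the 18 invalid lines noted below). `parse_groups` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical reconstruction from the scanned rules; not the real penc.org file.
SAMPLE = """\
User-agent: *
Disallow: /global_inc/
Allow: /global_inc/*.css
Allow: /global_inc/*.js

User-agent: *
Disallow: /global_engine/ajax/

Sitemap: https://penc.org/autositemapindex.xml
"""


def parse_groups(text: str) -> dict:
    """Collect Allow/Disallow rules per user agent, merging groups that name the same agent."""
    groups = defaultdict(list)
    current_agents = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
        # Other records (e.g. Sitemap) do not belong to any group and are skipped here.
    return dict(groups)


groups = parse_groups(SAMPLE)
for rule in groups["*"]:
    print(rule)  # four rules: three from the first group, one from the second
```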

Other Records

Field Value
sitemap https://penc.org/autositemapindex.xml
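The Sitemap record is independent of any user-agent group. Python's standard `urllib.robotparser.RobotFileParser` exposes it through `site_maps()` (Python 3.8+). The sketch below uses that parser only to read the Sitemap record. Its path matching is plain prefix matching without `*` wildcard support, so it does not evaluate the `/global_inc/*.css` rules the way RFC 9309 describes. `LINES` is a hypothetical excerpt, not the site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt containing only what is needed to show the Sitemap record.
LINES = [
    "User-agent: *",
    "Disallow: /global_engine/ajax/",
    "",
    "Sitemap: https://penc.org/autositemapindex.xml",
]

parser = RobotFileParser()
parser.parse(LINES)  # parse already-fetched lines; no network access
print(parser.site_maps())  # ['https://penc.org/autositemapindex.xml']
```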

Comments

  • When crawlers access the engine directory, they sometimes publish confusing links to site content in their search results, so we exclude these specific engines from crawling it.
  • Note: Certain crawlers do need access to this directory, so we do not want a blanket exclude statement here.

Warnings

  • 18 invalid lines.