www.capitol.hawaii.gov
robots.txt

Robots Exclusion Standard data for www.capitol.hawaii.gov

Resource Scan

Scan Details

Site Domain www.capitol.hawaii.gov
Base Domain hawaii.gov
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-10-19T23:22:13+00:00
Next Scan 2025-01-17T23:22:13+00:00

Last Successful Scan

Scanned 2023-09-03T20:07:05+00:00
URL https://www.capitol.hawaii.gov/robots.txt
Domain IPs 104.18.40.178, 172.64.147.78, 2606:4700:4400::6812:28b2, 2606:4700:4400::ac40:934e
Response IP 172.64.147.78
Found Yes
Hash aaec24bf73c758684f8a1ea2e87280e2fb8f3f14838d475416ab5bf36774a9c6
SimHash 2c51871bedd5

Groups

bingbot
bingpreview
deusu
duckduckbot
duckduckgo-favicons-bot
facebot
feedly
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
*

Rule Path
Disallow /*.axd$
Disallow /*.axd
Disallow /ScriptResource.axd
Disallow /WebResource.axd
Disallow /scriptresource.axd
Disallow /webresource.axd

siteimprovebot
siteimprovebot-crawler

Rule Path
Disallow /*.htm$

Comments

  • Disallow for WebResource.axd caching issues. Several instances below to cover all search engines.
  • To specify matching the end of a URL, use $
  • However, WebResource.axd and ScriptResource.axd always include a query string parameter, so the URL does
  • not end with .axd; thus, the correct robots.txt record for Google would be:
  • Not all crawlers recognize the wildcard '*' syntax. To comply with the robots.txt draft RFC
  • Note that the records are case sensitive, and the error page shows the requests in lower case,
  • so let's include both cases below:
  • Disallow: /*.pdf$
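
The comments above describe how the `$` end-anchor and `*` wildcard interact with URLs like WebResource.axd that carry a query string. The following is a minimal sketch (not part of the scanned file; `rule_to_regex` and `is_disallowed` are illustrative helpers) showing why the group needs both the anchored and unanchored rules:

```python
import re

# The rules from the '*' group above, in file order.
RULES = ["/*.axd$", "/*.axd", "/ScriptResource.axd", "/WebResource.axd",
         "/scriptresource.axd", "/webresource.axd"]

def rule_to_regex(path):
    """Translate a robots.txt rule path into a compiled regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(url_path):
    # Matching is case sensitive, which is why the group lists both
    # /WebResource.axd and /webresource.axd.
    return any(rule_to_regex(r).match(url_path) for r in RULES)

# '/*.axd$' alone would miss this URL because of the query string
# (the URL does not *end* with .axd), which is exactly the caveat
# the comments describe; the unanchored '/*.axd' rule catches it.
print(is_disallowed("/WebResource.axd?d=abc123"))  # True
print(is_disallowed("/index.htm"))                 # False
```

The same translation explains the siteimprovebot group: its `Disallow /*.htm$` blocks only URLs that actually end in `.htm`, leaving `.htm` pages with query strings fetchable.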