www.capitol.hawaii.gov
robots.txt

Robots Exclusion Standard data for www.capitol.hawaii.gov

Resource Scan

Scan Details

Site Domain www.capitol.hawaii.gov
Base Domain hawaii.gov
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-10-19T23:22:13+00:00
Next Scan 2025-01-17T23:22:13+00:00

Last Successful Scan

Scanned 2023-09-03T20:07:05+00:00
URL https://www.capitol.hawaii.gov/robots.txt
Domain IPs 104.18.40.178, 172.64.147.78, 2606:4700:4400::6812:28b2, 2606:4700:4400::ac40:934e
Response IP 172.64.147.78
Found Yes
Hash aaec24bf73c758684f8a1ea2e87280e2fb8f3f14838d475416ab5bf36774a9c6
SimHash 2c51871bedd5

Groups

bingbot
bingpreview
deusu
duckduckbot
duckduckgo-favicons-bot
facebot
feedly
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
*

Rule Path
Disallow /*.axd$
Disallow /*.axd
Disallow /ScriptResource.axd
Disallow /WebResource.axd
Disallow /scriptresource.axd
Disallow /webresource.axd

siteimprovebot
siteimprovebot-crawler

Rule Path
Disallow /*.htm$

Comments

  • Disallow for WebResource.axd caching issues. Several instances below to cover all search engines.
  • To specify matching the end of a URL, use $
  • However, WebResource.axd and ScriptResource.axd always include a query string parameter, so the URL does
  • not end with .axd; thus, the correct robots.txt record for Google would be:
  • Not all crawlers recognize the wildcard '*' syntax. To comply with the robots.txt draft RFC
  • Note that the records are case sensitive, and the error page shows the requests in lower case,
  • so let's include both cases below:
  • Disallow: /*.pdf$
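
The comments above describe how the `$` end-anchor and `*` wildcard interact with URLs like WebResource.axd that carry a query string. The following is a minimal sketch (not part of the scanned file; `rule_to_regex` and `is_disallowed` are illustrative helpers) showing why the group needs both the anchored and unanchored rules:

```python
import re

# The rules from the '*' group above, in file order.
RULES = ["/*.axd$", "/*.axd", "/ScriptResource.axd", "/WebResource.axd",
         "/scriptresource.axd", "/webresource.axd"]

def rule_to_regex(path):
    """Translate a robots.txt rule path into a compiled regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(url_path):
    # Matching is case sensitive, which is why the group lists both
    # /WebResource.axd and /webresource.axd.
    return any(rule_to_regex(r).match(url_path) for r in RULES)

# '/*.axd$' alone would miss this URL because of the query string
# (the URL does not *end* with .axd), which is exactly the caveat
# the comments describe; the unanchored '/*.axd' rule catches it.
print(is_disallowed("/WebResource.axd?d=abc123"))  # True
print(is_disallowed("/index.htm"))                 # False
```

The same translation explains the siteimprovebot group: its `Disallow /*.htm$` blocks only URLs that actually end in `.htm`, leaving `.htm` pages with query strings fetchable.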