in.gov
robots.txt

Robots Exclusion Standard data for in.gov

Resource Scan

Scan Details

Site Domain in.gov
Base Domain in.gov
Scan Status Ok
Last Scan 2024-11-04T15:48:29+00:00
Next Scan 2024-12-04T15:48:29+00:00

Last Scan

Scanned 2024-11-04T15:48:29+00:00
URL https://in.gov/robots.txt
Redirect https://www.in.gov/robots.txt
Redirect Domain www.in.gov
Redirect Base in.gov
Domain IPs 208.40.244.65
Redirect IPs 208.40.244.65
Response IP 208.40.244.65
Found Yes
Hash 7b576f5e60a62d9d44e5fc532fb5c62ca1bfecd9c280b2f1391a8f74b3b22969
SimHash 2a9e757f6e92

Groups

*

Rule Path
Disallow /serv/
Disallow /cgi-bin/
Disallow /isdh/drafts_local/
Disallow /demand
Disallow /search
Disallow /ai/errors/
Disallow /dor/4572.htm
Disallow /dor/reference/legal/rulings/unused/
Disallow /dwd/files/swic/
Disallow /dwd/files/JWIB/
Disallow /dwd/files/CM_Files/
Disallow /dwd/files/policy/
Disallow /dwd/test/
Disallow /ActiveCalendar/mobile/mobilelist.aspx
Disallow *subscribetocalendar.aspx*
Disallow *RSSSyndicator.aspx*
Disallow *downloadtype.aspx*
Disallow /indot/3212.htm
Disallow /sos/online_corps/
Disallow /sos/clerical/
Disallow /sos/registration/
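
As a rough illustration of how a crawler might apply the rules above, here is a minimal sketch in Python. It assumes the common matching convention (as described in RFC 9309): a pattern matches as a case-sensitive prefix of the URL path, `*` stands for any run of characters, and a trailing `$` anchors the end. The names `DISALLOW`, `pattern_to_regex`, and `is_disallowed` are hypothetical, and only a subset of the listed rules is included for brevity. Python's standard `urllib.robotparser` is not used here because it does not interpret `*` wildcards inside paths.

```python
import re

# A subset of the Disallow patterns from the "*" group listed above.
DISALLOW = [
    "/serv/",
    "/cgi-bin/",
    "/demand",
    "/search",
    "/dor/4572.htm",
    "/dwd/files/JWIB/",
    "/dwd/test/",
    "*subscribetocalendar.aspx*",
    "*RSSSyndicator.aspx*",
    "*downloadtype.aspx*",
    "/sos/registration/",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumed convention: '*' matches any run of characters and a
    trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    # re.match anchors at the beginning, which gives prefix semantics.
    return any(pattern_to_regex(p).match(path) for p in DISALLOW)


if __name__ == "__main__":
    for path in [
        "/search",
        "/searchresults",
        "/dwd/files/JWIB/report.pdf",
        "/dwd/files/jwib/report.pdf",
        "/ActiveCalendar/subscribetocalendar.aspx?id=1",
        "/dor/",
    ]:
        print(path, "blocked" if is_disallowed(path) else "allowed")
```

Because matching is by prefix, a rule without a trailing slash such as `/search` also covers paths like `/searchresults`. Since every rule in this group is a Disallow, the sketch does not need the longest-match tie-breaking that a file mixing Allow and Disallow rules would require.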

Comments

  • robots.txt for http://www.IN.gov/