idc.com
robots.txt

Robots Exclusion Standard data for idc.com

Resource Scan

Scan Details

Site Domain idc.com
Base Domain idc.com
Scan Status Ok
Last Scan 2024-04-28T05:40:58+00:00
Next Scan 2024-05-28T05:40:58+00:00

Last Scan

Scanned 2024-04-28T05:40:58+00:00
URL https://idc.com/robots.txt
Redirect https://www.idc.com/robots.txt
Redirect Domain www.idc.com
Redirect Base idc.com
Domain IPs 18.155.68.111, 18.155.68.40, 18.155.68.77, 18.155.68.96
Redirect IPs 18.155.68.111, 18.155.68.40, 18.155.68.77, 18.155.68.96
Response IP 18.155.68.111
Found Yes
Hash 98163158f6e85ce4e7118c629e95d33460f3b92259bd6100726be9fbe6b64e7d
SimHash ea1dd8128f53

Groups

ahrefsbot

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

*

Rule Path
Disallow /getdoc.jsp?containerId=SEV
Disallow /search/
Disallow /action/login
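
The groups above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt text is reconstructed from the listing, so its exact layout is an assumption, and `examplebot` is a hypothetical crawler name used only to exercise the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the Groups listing; the original file's ordering and
# whitespace are assumptions.
ROBOTS_TXT = """\
User-agent: ahrefsbot
Disallow: /

User-agent: sitebot
Disallow: /

User-agent: *
Disallow: /getdoc.jsp?containerId=SEV
Disallow: /search/
Disallow: /action/login
"""

BASE_URL = "https://www.idc.com"  # redirect target reported by the scan


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, BASE_URL + path)


parser = build_parser(ROBOTS_TXT)

# ahrefsbot and sitebot are blocked from the whole site.
print(is_allowed(parser, "ahrefsbot", "/"))
# Any other crawler falls through to the "*" group.
print(is_allowed(parser, "examplebot", "/"))
print(is_allowed(parser, "examplebot", "/search/reports"))
print(is_allowed(parser, "examplebot", "/action/login"))
```

Matching here is by path prefix, so `/search/` blocks everything beneath it while leaving the site root open to crawlers that are not named in their own group.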

Other Records

Field Value
crawl-delay 10

Comments

  • This is a file retrieved by webwalkers a.k.a. spiders that
  • conform to a defacto standard.
  • See <URL:http://www.robotstxt.org/wc/exclusion.html#robotstxt>
  • Comments to the webmaster should be sent to webmaster@idc.com
  • Format is:
  • User-agent: <name of spider>
  • Disallow: <nothing> | <path>
  • -----------------------------------------------------------------------------