guruin.com
robots.txt

Robots Exclusion Standard data for guruin.com

Resource Scan

Scan Details

Site Domain guruin.com
Base Domain guruin.com
Scan Status Ok
Last Scan 2024-11-13T15:54:46+00:00
Next Scan 2024-11-20T15:54:46+00:00
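
The two timestamps above imply a rescan interval, which can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative, and the report itself does not say how the scanner schedules its scans.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details above (ISO 8601, UTC offset +00:00).
last_scan = datetime.fromisoformat("2024-11-13T15:54:46+00:00")
next_scan = datetime.fromisoformat("2024-11-20T15:54:46+00:00")

# Difference between the scheduled next scan and the last completed scan.
interval = next_scan - last_scan
print(interval.days)  # 7
```

For this record the interval works out to exactly seven days.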

Last Scan

Scanned 2024-11-13T15:54:46+00:00
URL https://guruin.com/robots.txt
Redirect https://www.guruin.com/robots.txt
Redirect Domain www.guruin.com
Redirect Base guruin.com
Domain IPs 104.22.32.218, 104.22.33.218, 172.67.23.98
Redirect IPs 104.22.32.218, 104.22.33.218, 172.67.23.98
Response IP 104.22.32.218
Found Yes
Hash 1b951e21f943013122579e55a07844949cd90a6b5d4bd88ec4170a47cddb045f
SimHash a204cf17f740
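
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. Assuming (the report does not confirm this) that it is a SHA-256 digest of the fetched robots.txt body, a change detector could be sketched as below. The function names and the sample bodies are hypothetical.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed hash scheme)."""
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_hash: str, body: bytes) -> bool:
    """Report whether a newly fetched body differs from the stored digest."""
    return content_fingerprint(body) != previous_hash

# Hypothetical bodies, not the real contents of guruin.com/robots.txt.
old_body = b"User-agent: *\nAllow: /\n"
new_body = b"User-agent: *\nDisallow: /api/\nAllow: /\n"

stored = content_fingerprint(old_body)
print(len(stored))                    # 64 hex characters
print(has_changed(stored, old_body))  # False
print(has_changed(stored, new_body))  # True
```

An exact digest like this only detects that something changed; the shorter SimHash value is a separate similarity fingerprint and is not reproduced here.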

Groups

baiduspider

Rule Path
Disallow /

googlebot-image

Rule Path
Allow /favicon.ico
Disallow /

*

Rule Path
Disallow /401.html
Disallow /403.html
Disallow /404.html
Disallow /422.html
Disallow /500.html
Disallow /online.html
Disallow /offline.html
Disallow /error.html
Disallow /*.png$
Disallow /*.jpg$
Disallow /*.gif$
Disallow /cdn-cgi/
Disallow /open/
Disallow /api/
Disallow /embed/
Disallow /qrcode
Disallow /bdmail
Disallow /csmail
Disallow /db/attachments/
Allow /
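
The three groups above can be evaluated with a small matcher. The sketch below assumes the longest-match semantics described in RFC 9309 (the most specific matching rule wins, ties go to Allow, and "*" and a trailing "$" are treated as wildcards). It selects a group by exact lowercase user-agent token, which is a simplification. The names GROUPS, pattern_to_regex and is_allowed, and the crawler name "examplebot", are hypothetical. Real crawlers may differ in details.

```python
import re

# Rules transcribed from the Groups section above, keyed by user-agent token.
GROUPS = {
    "baiduspider": [("disallow", "/")],
    "googlebot-image": [("allow", "/favicon.ico"), ("disallow", "/")],
    "*": [("disallow", p) for p in (
        "/401.html", "/403.html", "/404.html", "/422.html", "/500.html",
        "/online.html", "/offline.html", "/error.html",
        "/*.png$", "/*.jpg$", "/*.gif$",
        "/cdn-cgi/", "/open/", "/api/", "/embed/",
        "/qrcode", "/bdmail", "/csmail", "/db/attachments/",
    )] + [("allow", "/")],
}

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(agent, path):
    """Longest-match evaluation; falls back to the '*' group for unknown agents."""
    rules = GROUPS.get(agent.lower(), GROUPS["*"])
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

print(is_allowed("baiduspider", "/"))                     # False
print(is_allowed("googlebot-image", "/favicon.ico"))      # True
print(is_allowed("examplebot", "/api/users"))             # False
print(is_allowed("examplebot", "/images/logo.png"))       # False
print(is_allowed("examplebot", "/images/logo.png?v=2"))   # True
```

The last call shows the effect of the "$" anchor: a .png URL with a query string no longer ends in ".png", so only "Allow /" matches it.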

Other Records

Field Value
sitemap https://www.guruin.com/system/com/sitemap.xml.gz

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines: