son.web.id
robots.txt

Robots Exclusion Standard data for son.web.id

Resource Scan

Scan Details

Site Domain son.web.id
Base Domain son.web.id
Scan Status Ok
Last Scan 2026-01-22T18:38:55+00:00
Next Scan 2026-02-21T18:38:55+00:00

Last Scan

Scanned 2026-01-22T18:38:55+00:00
URL https://son.web.id/robots.txt
Domain IPs 72.167.65.42
Response IP 72.167.65.42
Found Yes
Hash bd5ac839d38f298ddbf3ae775e04d11519cf2dd6ae2cf84451af3b9af1a46eba
SimHash 61154d121610
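
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. As a rough sketch under that assumption, a locally fetched copy of the file could be compared against the reported value as follows. The function name `matches_reported_hash` is hypothetical, and hashing the raw response bytes (rather than some normalized form) is also an assumption.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "bd5ac839d38f298ddbf3ae775e04d11519cf2dd6ae2cf84451af3b9af1a46eba"


def matches_reported_hash(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Return True if the SHA-256 digest of `body` equals `reported`.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    The report does not confirm either detail.
    """
    digest = hashlib.sha256(body).hexdigest()
    return digest == reported.lower()
```

A mismatch would not necessarily mean the file changed; it could equally mean the scanner uses a different algorithm or normalizes the content before hashing.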

Groups

*

Rule Path
Disallow /cgi-bin
Disallow /wp-admin
Disallow /wp-includes

mediapartners-google

Rule Path
Allow /

adsbot-google

Rule Path
Allow /

googlebot-image

Rule Path
Allow /

googlebot-mobile

Rule Path
Allow /

ia_archiver-web.archive.org

Rule Path
Allow /

ia_archiver

Rule Path
Allow /

duggmirror

Rule Path Comment
Disallow / BEGIN XML-SITEMAP-PLUGIN
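
To illustrate how the groups above would be applied to a crawler, the following sketch rebuilds a robots.txt from the tables and evaluates it with Python's standard `urllib.robotparser`. The file text is a reconstruction: the original casing, ordering, and comments are assumptions, and `ExampleBot` and the helper `allowed` are hypothetical names used only for illustration. Other parsers may resolve rules differently from the standard-library one.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the Groups and Other Records tables in this report.
# The exact text of the real file (casing, order, comments) is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

User-agent: ia_archiver-web.archive.org
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: duggmirror
Disallow: /

Sitemap: http://www.son.web.id/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Check whether `agent` may fetch `path` on son.web.id."""
    return parser.can_fetch(agent, "https://son.web.id" + path)


# A crawler with no dedicated group falls back to the "*" rules.
print(allowed("ExampleBot", "/wp-admin/"))
print(allowed("ExampleBot", "/index.html"))
# Agents with their own group use that group instead of "*".
print(allowed("Googlebot-Image", "/wp-admin/logo.png"))
print(allowed("duggmirror", "/"))
# site_maps() is available in Python 3.8 and later.
print(parser.site_maps())
```

In this parser, a named group replaces the `*` group for the matching agent, which is why the `Allow /` groups lift the three `Disallow` rules for those crawlers.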

Other Records

Field Value
sitemap http://www.son.web.id/sitemap.xml.gz

Comments

  • disallow archiving site
  • disable duggmirror
  • END XML-SITEMAP-PLUGIN