aapkimedia.com
robots.txt

Robots Exclusion Standard data for aapkimedia.com

Resource Scan

Scan Details

Site Domain aapkimedia.com
Base Domain aapkimedia.com
Scan Status Ok
Last Scan 2024-11-12T02:04:29+00:00
Next Scan 2024-11-19T02:04:29+00:00

Last Scan

Scanned 2024-11-12T02:04:29+00:00
URL https://aapkimedia.com/robots.txt
Redirect https://www.aapkimedia.com/robots.txt
Redirect Domain www.aapkimedia.com
Redirect Base aapkimedia.com
Domain IPs 216.239.32.21, 216.239.34.21, 216.239.36.21, 216.239.38.21, 2404:6800:4003:c00::79
Redirect IPs 216.239.32.21, 216.239.34.21, 216.239.36.21, 216.239.38.21, 2404:6800:4003:c01::79
Response IP 216.239.34.21
Found Yes
Hash 8338c4301a09624276515edddca9af1bfe327dcc5d179ef2beb82e5bab36c4e7
SimHash 4f3c9ba036e0
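
The report does not say which algorithm produces the Hash value. Its 64 hexadecimal characters are consistent with a SHA-256 digest, so the following is only a minimal sketch, assuming SHA-256 over the raw response bytes, of how a content hash can show whether robots.txt changed between two scans. The helper names `fingerprint` and `has_changed` are hypothetical and do not come from the scanner.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return fingerprint(body) != previous_hash


if __name__ == "__main__":
    old_body = b"User-agent: *\nAllow: /\n"
    new_body = b"User-agent: *\nAllow: /\nDisallow: /search\n"
    stored = fingerprint(old_body)
    print(has_changed(stored, old_body))  # identical bytes give the same digest
    print(has_changed(stored, new_body))  # any byte difference gives a new digest
```

An exact hash only answers "same bytes or not". A similarity hash such as the SimHash field is typically used for the complementary question of how close two versions are.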

Groups

*

Rule Path
Allow /
Disallow /search
Disallow /page/
Disallow /archive/
Disallow /labels/
Disallow /private/
Disallow /*.pdf$
Disallow /*.doc$
Disallow /*.ppt$
Allow /images/
Allow /blog/
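
The rules above interact: a path can match both an Allow and a Disallow line. The sketch below evaluates them under the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie), with `*` as a wildcard and `$` as an end anchor. It is an illustration only. The function names are hypothetical, and individual crawlers may resolve conflicts differently.

```python
import re

# The "*" group rules, copied from the table above in their listed order.
RULES = [
    ("allow", "/"),
    ("disallow", "/search"),
    ("disallow", "/page/"),
    ("disallow", "/archive/"),
    ("disallow", "/labels/"),
    ("disallow", "/private/"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/*.doc$"),
    ("disallow", "/*.ppt$"),
    ("allow", "/images/"),
    ("allow", "/blog/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled prefix regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces and turn each "*" into "match anything".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the longest matching rule (Allow wins ties) permits path."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ["/", "/search?q=news", "/blog/post.html",
                 "/files/report.pdf", "/images/chart.pdf"]:
        print(path, is_allowed(path))
```

Under this reading, `/files/report.pdf` is blocked because `/*.pdf$` (7 characters) is longer than `/`, while `/images/chart.pdf` stays allowed because `/images/` (8 characters) is longer than `/*.pdf$`.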

Other Records

Field Value
sitemap https://www.aapkimedia.com/sitemap.xml

Comments

  • Robots.txt for https://www.aapkimedia.com/
  • Allow all user agents
  • Allow Googlebot to crawl all content
  • Block specific folders
  • Block specific file types
  • Allow specific folders for Googlebot
  • Sitemap location