epaper.pknewspapers.com
robots.txt

Robots Exclusion Standard data for epaper.pknewspapers.com

Resource Scan

Scan Details

Site Domain epaper.pknewspapers.com
Base Domain pknewspapers.com
Scan Status Ok
Last Scan 2025-04-26T10:35:52+00:00
Next Scan 2025-05-10T10:35:52+00:00

Last Scan

Scanned 2025-04-26T10:35:52+00:00
URL https://epaper.pknewspapers.com/robots.txt
Redirect https://newspaperspk.com/robots.txt
Redirect Domain newspaperspk.com
Redirect Base newspaperspk.com
Domain IPs 104.21.24.146, 172.67.219.64, 2606:4700:3036::ac43:db40, 2606:4700:3037::6815:1892
Redirect IPs 104.21.84.199, 172.67.196.149, 2606:4700:3031::6815:54c7, 2606:4700:3037::ac43:c495
Response IP 172.67.196.149
Found Yes
Hash 3b5f9a5ce12d3f8711217ab46aaa1f9fa14ba089f03c4deb1f0bc315cf1f32cf
SimHash 01d4556169bc
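
The report does not say how `Hash` is computed. Its 64 hex digits are consistent with SHA-256, so, assuming it is a SHA-256 digest of the raw response body, a change check between scans could be sketched as below. `body_hash` and `has_changed` are hypothetical helpers, and the `SimHash` field is not covered.

```python
import hashlib

# Value copied from the "Hash" field above; SHA-256 is an assumption,
# since the report does not name the algorithm.
RECORDED_HASH = "3b5f9a5ce12d3f8711217ab46aaa1f9fa14ba089f03c4deb1f0bc315cf1f32cf"


def body_hash(body: bytes) -> str:
    """Hex SHA-256 digest of the raw robots.txt bytes (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the fetched body no longer matches the recorded digest."""
    return body_hash(body) != recorded.strip().lower()
```

A mismatch only signals that the bytes differ, so whitespace-only edits to the file would also count as a change.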

Groups

*

Rule Path
Allow /*.js*
Allow /*.css*
Allow /*.png*
Allow /*.jpg*
Allow /*.gif*
Disallow /administrator/
Disallow /api/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /tmp/
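
The group above mixes wildcard Allow rules with directory Disallow rules, so the outcome for a given path depends on precedence. The sketch below assumes RFC 9309 semantics: `*` matches any run of characters, a trailing `$` anchors the end, and among matching rules the one whose pattern has the most characters wins, with Allow winning a tie. `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical names, and percent-encoding normalization is omitted.

```python
import re

# Rules copied from the "*" group above as (directive, pattern) pairs.
RULES = [
    ("allow", "/*.js*"),
    ("allow", "/*.css*"),
    ("allow", "/*.png*"),
    ("allow", "/*.jpg*"),
    ("allow", "/*.gif*"),
    ("disallow", "/administrator/"),
    ("disallow", "/api/"),
    ("disallow", "/bin/"),
    ("disallow", "/cache/"),
    ("disallow", "/cli/"),
    ("disallow", "/includes/"),
    ("disallow", "/installation/"),
    ("disallow", "/language/"),
    ("disallow", "/layouts/"),
    ("disallow", "/libraries/"),
    ("disallow", "/logs/"),
    ("disallow", "/tmp/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' becomes '.*'; a trailing '$' anchors the end of the path.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if `path` may be crawled under `rules`.

    The matching rule with the longest pattern wins; on a tie, allow wins.
    If no rule matches, the path is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed
```

Under this interpretation the trailing `*` in patterns such as `/*.js*` is redundant, because matching is already by prefix, and `/*.js*` also matches paths containing `.json`. As a result `/api/data.json` comes out allowed, since `/*.js*` (6 characters) outranks `/api/` (5 characters), and `/tmp/logo.png` is allowed for the same reason, while `/administrator/template.css` stays blocked because `/administrator/` is the longer match. A crawler with different precedence rules may reach other verdicts.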

Other Records

Field Value
sitemap https://newspaperspk.com/sitemap.xml
sitemap https://newspaperspk.com/sitemap_articles_pakistani_newspapers.xml
sitemap https://newspaperspk.com/sitemap_articles_indian_newspapers.xml
sitemap https://newspaperspk.com/sitemap_articles_world_newspapers.xml
sitemap https://newspaperspk.com/sitemap_images.xml
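
Sitemap records are not tied to a user-agent group, so a parser collects them from anywhere in the file. A minimal sketch, assuming the raw file uses `Sitemap: <url>` lines as the Sitemaps protocol describes; `SAMPLE` is a hypothetical excerpt built from the records above, not the live file's exact layout, and `sitemap_urls` is an illustrative name.

```python
# Hypothetical excerpt: the directives come from the scan above,
# but the ordering and whitespace of the live file are assumed.
SAMPLE = """\
# robots.txt for https://newspaperspk.com
User-agent: *
Allow: /*.css*
Disallow: /tmp/

# Sitemap entries to guide crawlers to the correct sitemap locations for better indexing
Sitemap: https://newspaperspk.com/sitemap.xml
Sitemap: https://newspaperspk.com/sitemap_images.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every Sitemap: URL in file order, regardless of group."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop comments; '#' is assumed not to appear inside sitemap URLs.
        line = raw_line.split("#", 1)[0].strip()
        # partition() splits on the first ':', so the URL's 'https:' survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


for url in sitemap_urls(SAMPLE):
    print(url)
```

The field name is compared case-insensitively, so `sitemap:` and `Sitemap:` are treated alike, matching the lowercase form shown in the records above.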

Comments

  • robots.txt for https://newspaperspk.com
  • This file is used to guide web crawlers on how to interact with the site.
  • It allows crawlers to access certain resources and disallows others to ensure optimal indexing.
  • Allowing crawlers to access common web resources like JavaScript, CSS, and image files
  • Disallowing crawlers from accessing sensitive or backend directories
  • Sitemap entries to guide crawlers to the correct sitemap locations for better indexing