vornews.com
robots.txt

Robots Exclusion Standard data for vornews.com

Resource Scan

Scan Details

Site Domain vornews.com
Base Domain vornews.com
Scan Status Ok
Last Scan 2026-02-20T00:28:46+00:00
Next Scan 2026-02-27T00:28:46+00:00

Last Scan

Scanned 2026-02-20T00:28:46+00:00
URL https://vornews.com/robots.txt
Domain IPs 104.26.12.138, 104.26.13.138, 172.67.72.198, 2606:4700:20::681a:c8a, 2606:4700:20::681a:d8a, 2606:4700:20::ac43:48c6
Response IP 172.67.72.198
Found Yes
Hash 6a1f343bc42f42e6630e4337b8148182d27656470c2694dec079268b1168fc2e
SimHash 1fec440085a5

Groups

*

Rule      Path                      Comment
Disallow  (empty)                   -
Disallow  /admin/                   -
Disallow  /wp-admin/                -
Allow     /wp-admin/admin-ajax.php  Common exception for WordPress functionality
Disallow  /private/                 -
Disallow  /login/                   -
Disallow  /logout/                  -
Disallow  /register/                -
Disallow  /*?s=*                    -
Disallow  /search/                  -
Disallow  /thank-you/               -
Disallow  /*/feed/$                 -
Disallow  /*/trackback/$            -
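
The rules in the "*" group can be checked against a path with a small matcher. The following is a minimal sketch, assuming RFC 9309 style evaluation: "*" matches any run of characters, a trailing "$" anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names RULES, pattern_to_regex, and is_allowed are hypothetical and not part of the scan output, and percent-encoding normalization is left out for brevity.

```python
import re

# Rules from the "*" group above, in file order: (directive, path pattern).
RULES = [
    ("disallow", ""),
    ("disallow", "/admin/"),
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/private/"),
    ("disallow", "/login/"),
    ("disallow", "/logout/"),
    ("disallow", "/register/"),
    ("disallow", "/*?s=*"),
    ("disallow", "/search/"),
    ("disallow", "/thank-you/"),
    ("disallow", "/*/feed/$"),
    ("disallow", "/*/trackback/$"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` (including any query string) may be crawled."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longest pattern wins; on a tie, Allow beats Disallow.
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, allowed = length, directive == "allow"
    return allowed
```

Under these assumptions, is_allowed("/wp-admin/admin-ajax.php") is True because the Allow pattern is longer than "/wp-admin/", while is_allowed("/wp-admin/options.php") and is_allowed("/?s=news") are both False. Python's standard urllib.robotparser does not interpret "*" or "$" in paths, which is why the sketch uses its own matcher.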

googlebot
googlebot-news

Rule Path
Allow /

bingbot
yandexbot

No rules defined. All paths allowed.
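
The practical effect of these separate groups can be sketched in code. This is an illustrative sketch, assuming the group-selection behaviour described in RFC 9309: a crawler obeys the group whose user-agent line matches its product token (case-insensitively) and falls back to "*" only when no group names it. GROUPS and select_rules are hypothetical names, and the "*" group is abbreviated to two of its rules.

```python
# Groups as listed in the scan; the "*" group is abbreviated for brevity.
GROUPS = [
    {"agents": ["*"],
     "rules": [("disallow", "/admin/"), ("disallow", "/wp-admin/")]},
    {"agents": ["googlebot", "googlebot-news"],
     "rules": [("allow", "/")]},
    {"agents": ["bingbot", "yandexbot"],
     "rules": []},
]


def select_rules(product_token, groups=GROUPS):
    """Return the rules a crawler with this product token would obey."""
    token = product_token.lower()
    # Prefer groups that name the crawler explicitly.
    matched = [g for g in groups
               if token in (agent.lower() for agent in g["agents"])]
    # Fall back to the "*" group only when nothing names the crawler.
    if not matched:
        matched = [g for g in groups if "*" in g["agents"]]
    rules = []
    for group in matched:  # rules of all matching groups are combined
        rules.extend(group["rules"])
    return rules
```

With this reading, select_rules("Bingbot") returns an empty list, which is consistent with the "No rules defined. All paths allowed." result above, and a crawler that is not named anywhere, such as a hypothetical "ExampleBot", receives the "*" rules. It also suggests that the Disallow rules in the "*" group would not apply to googlebot, bingbot, or yandexbot, although individual crawlers may implement selection differently.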

Other Records

Field Value
sitemap https://www.vornews.com/sitemap.xml
sitemap https://www.vornews.com/news-sitemap.xml
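
The sitemap records can also be read programmatically. The sketch below uses Python's standard urllib.robotparser (site_maps() is available from Python 3.8) on a few lines reconstructed from this report. The lines list is an assumption for illustration, not the verbatim file, and scanned_host is a hypothetical variable holding the Site Domain shown above.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Reconstructed from the report for illustration; not the verbatim robots.txt.
lines = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://www.vornews.com/sitemap.xml",
    "Sitemap: https://www.vornews.com/news-sitemap.xml",
]

parser = RobotFileParser()
parser.parse(lines)

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps() or []

# Compare the sitemap hosts with the scanned host from the report.
scanned_host = "vornews.com"
hosts = {urlsplit(url).hostname for url in sitemaps}
host_differs = hosts != {scanned_host}
```

Here host_differs is True: both sitemap URLs point at www.vornews.com, while the scan was made against vornews.com. Whether that matters depends on how the site redirects between the two hosts, which this report does not show.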

Comments

  • vornews.com Robots.txt
  • Last updated: [Current Date - e.g., 2025-11-24]
  • --- Sitemaps ---
  • Define the location of your main XML Sitemaps.
  • This is crucial for news sites to ensure search engines find all recent content.
  • If you have separate sitemaps for different categories, add them here.
  • Sitemap: https://www.vornews.com/video-sitemap.xml
  • Sitemap: https://www.vornews.com/images-sitemap.xml
  • --- Rules for All Crawlers (*) ---
  • Allow everything by default.
  • Disallow common administrative and non-public areas
  • Disallow internal search results (these pages often offer low value for external search)
  • Disallow pages/directories that are not meant for search (e.g., confirmation/thank you pages)
  • Disallow common unoptimized file types (optional, but can save crawl budget)
  • Note: Google generally needs CSS/JS for proper rendering, so be careful blocking those.
  • Disallow: /*.zip$
  • Disallow: /*.rar$
  • --- Rules for Google Specific Bots ---
  • Googlebot is the primary web crawler
  • (No Disallow here means it follows the general rules above)
  • Googlebot-News is critical for a news site; explicitly allow everything to ensure rapid indexing.
  • Google's AI model crawler (Optional: Block if you want to prevent content being used for AI training)
  • User-agent: Google-Extended
  • Disallow: /
  • --- Rules for other Major Search Engines ---
  • Bing's main web crawler
  • (No Disallow here means it follows the general rules above)
  • Yandex's main web crawler
  • (No Disallow here means it follows the general rules above)
  • Specific example for a less important or aggressive bot (if needed)
  • User-agent: SomeLesserBot
  • Disallow: /
  • Crawl-delay: 5