machineryplanet.io
robots.txt

Robots Exclusion Standard data for machineryplanet.io

Resource Scan

Scan Details

Site Domain machineryplanet.io
Base Domain machineryplanet.io
Scan Status Ok
Last Scan 2025-12-25T06:01:56+00:00
Next Scan 2026-01-01T06:01:56+00:00

Last Scan

Scanned 2025-12-25T06:01:56+00:00
URL https://machineryplanet.io/robots.txt
Domain IPs 104.21.59.176, 172.67.182.5, 2606:4700:3032::6815:3bb0, 2606:4700:3034::ac43:b605
Response IP 172.67.182.5
Found Yes
Hash 1596bed3ea3ad14de629e996876dd965d9d80da75d6a01ae98291d8e8fc4fdf4
SimHash 0708980374f4

Groups

*

Rule Path
Allow /
Disallow /admin
Disallow /admin/
Disallow /_next/
Disallow /api/
Allow /api/sitemap*.xml
Allow /api/health
Disallow /debug/
Disallow /test/
Disallow /staging/
Allow /search?categories=*
Allow /search?childCategories=*
Allow /search?make=*
Allow /search?model=*
Allow /search?productType=*
Allow /search$
Allow /search/$
Disallow /search?*sort=*
Disallow /search?*filter=*
Disallow /search?*page=*
Disallow /search?*limit=*
Disallow /search?*offset=*
Disallow /search?*view=*
Disallow /search?*&*&*&*
Allow /images/
Allow /icons/
Allow /fonts/
Allow /*.css
Allow /*.js
Allow /*.woff
Allow /*.woff2
Allow /*.jpg
Allow /*.jpeg
Allow /*.png
Allow /*.webp
Allow /*.avif
Allow /*.svg
Allow /*.gif
Allow /favicon.ico
Allow /robots.txt
Allow /sitemap*.xml
Disallow /private/
Disallow /temp/
Disallow /cache/
Disallow /.git/
Disallow /node_modules/
Disallow /.next/
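The `*` group above mixes Allow and Disallow patterns with `*` wildcards and `$` end anchors, and the robots.txt comments note that the Disallow rules "must come AFTER Allow rules". Under RFC 9309 (and Google's implementation), ordering actually matters less than specificity: the longest matching pattern wins, with Allow winning ties. A minimal sketch of that precedence logic, tested against a handful of the rules above (the `is_allowed` helper and its tie-break by raw pattern length are simplifications, not the scanner's own code):

```python
import re

def _pattern_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, '$' end anchor)
    into an anchored regular expression."""
    anchored_end = path_pattern.endswith("$")
    if anchored_end:
        path_pattern = path_pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(path_pattern).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored_end else ""))

def is_allowed(rules, path: str) -> bool:
    """RFC 9309-style precedence: the longest matching pattern wins;
    on a tie between Allow and Disallow, Allow wins.
    `rules` is a list of (directive, pattern) tuples in file order."""
    best_len = -1
    best_allow = True  # no matching rule => allowed
    for directive, pattern in rules:
        if _pattern_to_regex(pattern).match(path):
            plen = len(pattern)
            if plen > best_len or (plen == best_len and directive == "Allow"):
                best_len = plen
                best_allow = directive == "Allow"
    return best_allow

# A few of the rules from the '*' group above
rules = [
    ("Allow", "/"),
    ("Disallow", "/api/"),
    ("Allow", "/api/sitemap*.xml"),
    ("Allow", "/search?make=*"),
    ("Disallow", "/search?*page=*"),
    ("Allow", "/search$"),
]

print(is_allowed(rules, "/api/sitemap-products.xml"))  # True: Allow pattern is longer than /api/
print(is_allowed(rules, "/search?make=cat&page=2"))    # False: the page= Disallow is more specific
```

Note that Python's standard `urllib.robotparser` does not implement `*`/`$` wildcard matching, which is why a file like this one needs a longest-match evaluator to be interpreted the way Google would interpret it.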

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

semrushbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 5

ahrefsbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 5

dotbot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 5

screaming frog seo spider

Rule Path
Allow /

mj12bot

Rule Path
Disallow /

semrushbot
ahrefsbot
baiduspider

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10
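The groups above allow SEO crawlers but ask for a `crawl-delay` of 5 or 10 seconds. `crawl-delay` is not part of RFC 9309 (Google ignores it), but bots that do honor it throttle requests per host. A minimal sketch of such a throttle, assuming a hypothetical `PoliteFetcher` wrapper rather than any particular crawler's implementation:

```python
import time

class PoliteFetcher:
    """Throttle requests to one host using the crawl-delay from robots.txt."""

    def __init__(self, crawl_delay: float):
        self.crawl_delay = crawl_delay
        self._last_request = 0.0  # monotonic timestamp of the previous request

    def wait_turn(self) -> float:
        """Sleep until crawl_delay seconds have elapsed since the last
        request; return the number of seconds actually slept."""
        now = time.monotonic()
        remaining = self.crawl_delay - (now - self._last_request)
        slept = 0.0
        if remaining > 0:
            time.sleep(remaining)
            slept = remaining
        self._last_request = time.monotonic()
        return slept

# With crawl-delay 10, back-to-back requests are spaced ~10 s apart;
# a short delay is used here only to keep the example fast.
fetcher = PoliteFetcher(crawl_delay=0.1)
fetcher.wait_turn()  # first request goes through immediately
```
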

Other Records

Field Value
sitemap https://www.machineryplanet.ae/api/sitemap-index.xml
sitemap https://www.machineryplanet.ae/api/sitemap.xml
sitemap https://www.machineryplanet.ae/api/sitemap-products.xml
sitemap https://www.machineryplanet.ae/api/sitemap-categories.xml
sitemap https://www.machineryplanet.ae/api/sitemap-blogs.xml
sitemap https://www.machineryplanet.ae/api/sitemap-images.xml
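The first sitemap record points at `/api/sitemap-index.xml`, a sitemap index whose `<sitemap><loc>` entries list the child sitemaps. A minimal sketch of extracting those child URLs with the standard library, run here against a hypothetical inline snippet (the example XML is illustrative, not the live file's contents):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(index_xml: str) -> list[str]:
    """Extract child sitemap URLs from a sitemap-index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

# Hypothetical excerpt of what a sitemap index might return
example = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.machineryplanet.ae/api/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://www.machineryplanet.ae/api/sitemap-categories.xml</loc></sitemap>
</sitemapindex>"""

print(sitemap_urls(example))
# → ['https://www.machineryplanet.ae/api/sitemap-products.xml',
#    'https://www.machineryplanet.ae/api/sitemap-categories.xml']
```

The namespace prefix is required: sitemap documents declare the `sitemaps.org` XML namespace, so a bare `root.iter("loc")` would find nothing.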

Comments

  • ================================================================
  • Machinery Planet - Robots.txt (SEO Optimized)
  • Updated: 2025-11-26
  • Purpose: Allow search engines to crawl valuable content while
  • blocking duplicate/low-value pages
  • ================================================================
  • ====================
  • MAIN SEARCH ENGINES
  • ====================
  • Block admin and development paths
  • ====================
  • SEARCH & FILTERING
  • ====================
  • ✅ CRITICAL FIX: Allow category and brand pages but block filters/sorting
  • Allow valuable pages:
  • Block duplicate content parameters (must come AFTER Allow rules)
  • Block search with multiple filter combinations (low value)
  • ====================
  • STATIC ASSETS
  • ====================
  • Allow crawling of important assets for proper rendering
  • ====================
  • PRIVATE DIRECTORIES
  • ====================
  • ====================
  • SPECIAL USER AGENTS
  • ====================
  • Block AI crawlers (GPT, Claude, etc.)
  • ====================
  • SEO TOOL CRAWLERS
  • ====================
  • Allow but rate-limit aggressive SEO crawlers
  • ====================
  • BAD BOTS (Optional)
  • ====================
  • Block known bad bots/scrapers
  • ====================
  • SITEMAPS
  • ====================
  • ✅ FIXED: Only reference sitemaps for THIS domain
  • ====================
  • HOST PREFERENCE
  • ====================
  • Preferred domain (www version)
  • ====================
  • NOTES FOR DEVELOPERS
  • ====================
  • 1. This file allows Google to crawl 12,800+ product/category pages
  • 2. Blocks only duplicate/filtered versions to save crawl budget
  • 3. AI crawlers blocked to prevent content scraping
  • 4. SEO crawlers rate-limited but allowed for auditing
  • 5. All sitemaps reference THIS domain only (no cross-domain refs)
  • ================================================================

Warnings

  • `host` is not a known field — it is not defined in RFC 9309; it was a Yandex-specific directive (since deprecated) and most crawlers ignore it.