spacelist.co
robots.txt

Robots Exclusion Standard data for spacelist.co

Resource Scan

Scan Details

Site Domain spacelist.co
Base Domain spacelist.co
Scan Status Ok
Last Scan 2024-11-12T15:35:16+00:00
Next Scan 2024-11-19T15:35:16+00:00

Last Scan

Scanned 2024-11-12T15:35:16+00:00
URL https://www.spacelist.co/robots.txt
Domain IPs 15.197.246.237, 3.33.193.101, 52.223.46.195, 99.83.183.127
Response IP 99.83.183.127
Found Yes
Hash 7827fa60e2c68a9a3b943aea7aa44e1c3ff34820d29d4e75b4a2fca25e70683c
SimHash 2a141ca5ec40

Groups

semrushbot

Rule Path
Disallow /

ahrefssiteaudit

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

*

Rule Path
Disallow /advertising
Disallow /data
Disallow /listings/*/print
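
As a rough illustration of how the groups above apply to a crawler, the sketch below feeds a reconstruction of these rules to Python's standard-library `urllib.robotparser`. The layout and casing of the reconstructed file, the agent name `ExampleBot`, and the helper `allowed` are assumptions made for the example, not taken from the live file. As far as the standard-library parser goes, rule paths are matched as plain prefixes and `*` is not expanded as a wildcard, so the `/listings/*/print` rule is included but not exercised here.

```python
from urllib.robotparser import RobotFileParser

# Reconstruction of the groups listed in this report. The exact layout and
# casing of the live robots.txt are assumptions.
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsSiteAudit
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /advertising
Disallow: /data
Disallow: /listings/*/print
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` on www.spacelist.co?"""
    return parser.can_fetch(agent, "https://www.spacelist.co" + path)


if __name__ == "__main__":
    # "ExampleBot" is a made-up agent that only matches the "*" group.
    for agent, path in [
        ("SemrushBot", "/"),
        ("AhrefsBot", "/listings"),
        ("ExampleBot", "/"),
        ("ExampleBot", "/advertising"),
    ]:
        print(agent, path, allowed(agent, path))
```

The three named bots are refused everywhere, while any other agent falls through to the `*` group and is refused only under the listed path prefixes. A crawler that honours wildcard patterns would need a different matcher to evaluate the `/listings/*/print` rule.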

Other Records

Field Value
sitemap https://www.spacelist.ca/sitemap.xml.gz
sitemap https://www.spacelist.co/sitemap.xml.gz

Comments

  • See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site uncomment the next two lines:
  • User-agent: *
  • Disallow: /
  • Block SemRush bot
  • Block AhrefsSiteAudit bot
  • Block AhrefsBot bot
  • Block URLs with /listings/[listingid]/print