weblocal.ca
robots.txt

Robots Exclusion Standard data for weblocal.ca

Resource Scan

Scan Details

Site Domain weblocal.ca
Base Domain weblocal.ca
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a server error.
Last Scan 2025-10-06T17:30:18+00:00
Next Scan 2026-01-04T17:30:18+00:00

Last Successful Scan

Scanned 2024-09-11T18:58:16+00:00
URL https://www.weblocal.ca/robots.txt
Domain IPs 104.21.5.122, 172.67.133.101, 2606:4700:3033::6815:57a, 2606:4700:3034::ac43:8565
Response IP 172.67.133.101
Found Yes
Hash 9d706be6c0b5c20978897c37a1c7db9e7df8962192f0e0e73c03ce1e0f77eb68
SimHash 28077f36e7d1

Groups

*

Rule Path
Disallow /api/
Disallow /messages
Disallow /invite
Disallow /signout
Disallow /signin
Disallow /pictures/upload
Disallow /owner/
Disallow /map/
Disallow /submit/
Disallow /call/
Disallow /nfredirect/
Disallow /directions
Disallow /merchant-verification
Disallow /brands/
Disallow /css/
Disallow /user/
Disallow /*?inline=
Disallow /*?map=inline
Disallow /*?pictures=inline
Disallow /*?pictures=1
Disallow /*?videos=inline
Disallow /*?videos=1
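
The wildcard rules above can be checked with a small matcher. This is a minimal sketch, assuming the common convention that `*` matches any run of characters, a trailing `$` anchors the end, and every rule is otherwise a prefix match against the path plus query string. The helper names `rule_to_regex` and `is_disallowed` are hypothetical, and only a subset of the listed rules is included.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the end of the URL, everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

# A subset of the Disallow paths from the '*' group, for illustration.
DISALLOW = ["/api/", "/messages", "/*?inline=", "/*?pictures=1"]

def is_disallowed(path_and_query: str) -> bool:
    """Return True if any Disallow pattern matches the path plus query."""
    return any(rule_to_regex(p).match(path_and_query) for p in DISALLOW)
```

Under these assumptions, `/*?inline=` only matches when `inline` is the first query parameter, because the pattern contains the literal `?` immediately before it.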

fasterfox

Rule Path
Disallow /

bender

Rule Path
Disallow /
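
How the per-agent groups interact with the `*` group can be illustrated with Python's standard `urllib.robotparser`. This is a rough sketch, not the scanner's own logic: the robots.txt text below is a shortened reconstruction from the groups listed here, and `SomeBot` is a made-up user agent. As far as I know, the standard-library parser does plain prefix matching and does not expand `*` wildcards inside paths, so only a non-wildcard rule is used.

```python
from urllib.robotparser import RobotFileParser

# Shortened reconstruction of the groups shown above (an assumption,
# not the verbatim file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/

User-agent: fasterfox
Disallow: /

User-agent: bender
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.weblocal.ca"

# Agents with their own group are blocked from the whole site.
fasterfox_home = parser.can_fetch("fasterfox", BASE + "/")
bender_home = parser.can_fetch("bender", BASE + "/")

# Any other agent falls back to the '*' group.
other_home = parser.can_fetch("SomeBot", BASE + "/")
other_api = parser.can_fetch("SomeBot", BASE + "/api/x")
```

A crawler named in its own group uses only that group's rules, so `fasterfox` and `bender` are denied everything, while other agents are only denied the paths in the `*` group.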

Other Records

Field Value
sitemap https://www.weblocal.ca/sitemap-urls-index.xml.gz

Comments

  • These generally need logins, JavaScript, or at least humans to do something. It's pointless for the crawlers to crawl them...
  • http://www.google.com/support/webmasters/bin/answer.py?answer=35303
  • Googlebot supports wildcards - they have a tool in their "webmaster tools" to check the rules (these work)
  • http://www.edochan.com/programming/pf.htm
  • http://sites.google.com/site/bendercrawler