Robots Exclusion Standard data for newcom.ca
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | newcom.ca |
Base Domain | newcom.ca |
Scan Status | Ok |
Last Scan | 2024-10-19T20:45:06+00:00 |
Next Scan | 2024-11-18T20:45:06+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-10-19T20:45:06+00:00 |
URL | https://newcom.ca/robots.txt |
Redirect | https://www.newcom.ca/robots.txt |
Redirect Domain | www.newcom.ca |
Redirect Base | newcom.ca |
Domain IPs | 2600:1f11:793:c400:dbff:7a6a:43bb:43f2, 3.97.123.209 |
Redirect IPs | 2600:1f11:793:c400:dbff:7a6a:43bb:43f2, 3.97.123.209 |
Response IP | 3.97.123.209 |
Found | Yes |
Hash | f3423b9bf2b0aa522ebabbf05003e09528e23aa506a686afbad9699ca6ab8a36 |
SimHash | 5c2c59d0a4b1 |
Groups
*
Rule | Path |
---|---|
Disallow | (empty) |
amazonbot
anthropic-ai
applebot
awariorssbot
awariosmartbot
bytespider
ccbot
chatgpt-user
claudebot
claude-web
cohere-ai
dataforseobot
diffbot
facebookbot
friendlycrawler
googleother
img2dataset
imagesiftbot
magpie-crawler
meltwater
omgili
omgilibot
peer39_crawler
peer39_crawler/1.0
perplexitybot
piplbot
scoop.it
seekr
youbot
Rule | Path |
---|---|
Disallow | / |
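The two groups above can be read as follows: the `*` group has an empty Disallow, which permits everything, while the group of named crawlers is disallowed from the whole site. As a minimal sketch of how a client might evaluate these rules, the snippet below rebuilds an approximate robots.txt from the listed groups and checks it with Python's standard `urllib.robotparser`. The reconstructed file text (line order, blank lines, casing) and the name `BLOCKED_AGENTS` are assumptions made for illustration; the scan does not show the original file verbatim.

```python
from urllib.robotparser import RobotFileParser

# User agents from the second group of the scan (all receive "Disallow: /").
BLOCKED_AGENTS = [
    "amazonbot", "anthropic-ai", "applebot", "awariorssbot", "awariosmartbot",
    "bytespider", "ccbot", "chatgpt-user", "claudebot", "claude-web",
    "cohere-ai", "dataforseobot", "diffbot", "facebookbot", "friendlycrawler",
    "googleother", "img2dataset", "imagesiftbot", "magpie-crawler", "meltwater",
    "omgili", "omgilibot", "peer39_crawler", "peer39_crawler/1.0",
    "perplexitybot", "piplbot", "scoop.it", "seekr", "youbot",
]

# Approximate reconstruction of the file; the real layout is an assumption.
lines = ["User-agent: *", "Disallow:", ""]
lines += [f"User-agent: {agent}" for agent in BLOCKED_AGENTS]
lines += ["Disallow: /", "", "Sitemap: https://www.newcom.ca/sitemap_index.xml"]

parser = RobotFileParser()
parser.parse(lines)  # parse in memory; no network request is made

# A crawler named in the second group is refused everywhere.
print(parser.can_fetch("ClaudeBot", "https://www.newcom.ca/"))

# Any other crawler falls back to the "*" group, whose empty Disallow allows all.
print(parser.can_fetch("Googlebot", "https://www.newcom.ca/"))

# Sitemap records are exposed separately (Python 3.8+).
print(parser.site_maps())
```

Under these assumptions the first check should report that the named crawler is not allowed and the second that an unlisted crawler is. Note that `urllib.robotparser` matches user agents by case-insensitive substring, which is one parser's behaviour and may differ from how a given crawler interprets the file.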
Other Records
Field | Value |
---|---|
sitemap | https://www.newcom.ca/sitemap_index.xml |