headlinesnewyork.com
robots.txt
Robots Exclusion Standard data for headlinesnewyork.com
Resource Scan
Scan Details
| Field | Value |
|---|---|
| Site Domain | headlinesnewyork.com |
| Base Domain | headlinesnewyork.com |
| Scan Status | Failed |
| Failure Stage | Fetching resource. |
| Failure Reason | Couldn't connect to server. |
| Last Scan | 2025-11-03T04:05:39+00:00 |
| Next Scan | 2026-02-01T04:05:39+00:00 |
Last Successful Scan
| Field | Value |
|---|---|
| Scanned | 2023-09-22T03:32:05+00:00 |
| URL | https://headlinesnewyork.com/robots.txt |
| Domain IPs | 104.21.7.231, 172.67.188.20, 2606:4700:3033::6815:7e7, 2606:4700:3033::ac43:bc14 |
| Response IP | 172.67.188.20 |
| Found | Yes |
| Hash | 5b014c2964ac88c3155b796dd68659e16f80bd1eb46e7a7e3616703710da10f3 |
| SimHash | 6d0d8cd12f93 |
Groups
*
| Rule | Path |
|---|---|
| Disallow | /src/ |
| Disallow | /?page=* |
| Disallow | /*?page=* |
| Disallow | /*?page=*&s=* |
| Disallow | /*?page=*&feed=* |
| Disallow | /*/*?page=* |
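To illustrate how the Disallow patterns listed above might be evaluated against a request path, here is a minimal sketch. It assumes the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end, and patterns match from the start of the path plus query string). The helper names `pattern_to_regex` and `is_disallowed` are hypothetical and not part of the scan data. Because the group contains only Disallow rules, no Allow/Disallow precedence logic is needed here. Python's standard `urllib.robotparser` is not used because it does not expand `*` wildcards.

```python
import re

# Disallow patterns copied from the "*" group in the table above.
DISALLOW = [
    "/src/",
    "/?page=*",
    "/*?page=*",
    "/*?page=*&s=*",
    "/*?page=*&feed=*",
    "/*/*?page=*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    # re.match anchors at the beginning, mirroring prefix matching.
    return any(pattern_to_regex(p).match(path) for p in DISALLOW)


if __name__ == "__main__":
    for candidate in ["/src/app.js", "/?page=2", "/news/?page=3",
                      "/news/article", "/?s=x&page=2"]:
        print(candidate, is_disallowed(candidate))
```

Under these assumptions, `/*?page=*` already covers every path matched by the three longer `?page=` patterns, and a URL where `page` is not the first query parameter (for example `/?s=x&page=2`) is not matched by any rule.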
Other Records
| Field | Value |
|---|---|
| sitemap | https://headlinesnewyork.com/sitemap.xml |
| sitemap | https://headlinesnewyork.com/sitemap-news.xml |
| sitemap | https://headlinesnewyork.com/feed |