superpages.com
robots.txt

Robots Exclusion Standard data for superpages.com

Resource Scan

Scan Details

Site Domain superpages.com
Base Domain superpages.com
Scan Status Ok
Last Scan 2024-09-15T09:22:52+00:00
Next Scan 2024-09-22T09:22:52+00:00

Last Scan

Scanned 2024-09-15T09:22:52+00:00
URL https://superpages.com/robots.txt
Redirect https://www.superpages.com/robots.txt
Redirect Domain www.superpages.com
Redirect Base superpages.com
Domain IPs 151.138.15.26, 151.138.150.150
Redirect IPs 151.138.15.26
Response IP 151.138.15.26
Found Yes
Hash 60e0fe5f81abe30bca12ec97025eb488ba143eca525e8f6737f63ce3a03ac196
SimHash 590d9a82c1b0
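
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest of the fetched file. The report does not state the algorithm, so this is an assumption. Under that assumption, a change check against a saved copy of the file could be sketched as follows; `sha256_hex` is an illustrative helper name, not something the scanner exposes.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


# Assumption: the report's Hash field is a SHA-256 digest of the robots.txt bytes.
REPORTED_HASH = "60e0fe5f81abe30bca12ec97025eb488ba143eca525e8f6737f63ce3a03ac196"


def is_unchanged(body: bytes) -> bool:
    """True if a saved copy of the file hashes to the value in the report."""
    return sha256_hex(body) == REPORTED_HASH
```

If the scanner normalizes the file before hashing (line endings, trailing whitespace), the digests will differ even for an unchanged file.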

Groups

mediapartners-google

Rule Path
Disallow (empty)

*

Rule Path
Disallow /*images/li.gif
Disallow /*images/logging_requests.gif
Disallow /relevance_feedback
Disallow /listings/
Disallow /listing_feedback/
Disallow */report_abuse
Disallow /gallery/*/copyright
Disallow /gallery/*/flag
Disallow /contribute/
Disallow /reservations/
Disallow */print_ad?*
Disallow */audio_ad?*
Disallow */map_locations
Disallow /reviews/*/up
Disallow /reviews/*/down
Disallow /reviews/*/follow
Disallow /reviews/*/unfollow
Disallow /semp/*
Disallow /login
Disallow /register
Disallow /user/
Disallow /undefined/
Disallow /lwes/
Disallow /route?*

scrapy

Rule Path
Disallow /

twitterbot

Rule Path
Allow *
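
The groups above can be read with the matching rules of the Robots Exclusion Protocol (RFC 9309): `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, ties go to Allow, and an empty Disallow blocks nothing. The following is a minimal sketch of that evaluation, not the scanner's own code. The names `STAR_GROUP`, `pattern_to_regex`, and `is_allowed` are illustrative, only a subset of the `*` group is copied in, and percent-encoding normalization and user-agent group selection are left out.

```python
import re

# A subset of the "*" group from the report, as (rule, path pattern) pairs.
STAR_GROUP = [
    ("disallow", "/listings/"),
    ("disallow", "*/report_abuse"),
    ("disallow", "/reviews/*/up"),
    ("disallow", "/login"),
    ("disallow", "/route?*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    The longest matching pattern wins; on equal length, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


print(is_allowed(STAR_GROUP, "/listings/abc"))      # blocked by /listings/
print(is_allowed(STAR_GROUP, "/about"))             # no rule matches
print(is_allowed([("disallow", "")], "/listings/"))  # mediapartners-google group
print(is_allowed([("disallow", "/")], "/about"))     # scrapy group
print(is_allowed([("allow", "*")], "/user/"))        # twitterbot group
```

Python's standard `urllib.robotparser` does not interpret `*` wildcards inside paths, which is why this sketch translates patterns to regular expressions instead.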

Warnings

  • 2 invalid lines.
  • `host` is not a known field.