yellowpages.com
robots.txt

Robots Exclusion Standard data for yellowpages.com

Resource Scan

Scan Details

Site Domain yellowpages.com
Base Domain yellowpages.com
Scan Status Ok
Last Scan 2024-11-01T01:22:06+00:00
Next Scan 2024-11-08T01:22:06+00:00

Last Scan

Scanned 2024-11-01T01:22:06+00:00
URL https://yellowpages.com/robots.txt
Redirect https://www.yellowpages.com/robots.txt
Redirect Domain www.yellowpages.com
Redirect Base yellowpages.com
Domain IPs 151.138.15.18, 208.93.105.116
Redirect IPs 208.93.105.116
Response IP 208.93.105.116
Found Yes
Hash 1548b52d6c29e551df3128c8050a3926b3f3bfa6e679b2232865d357dcbc51ad
SimHash 1809fb0043b0
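
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. A minimal sketch of how such a fingerprint could be used to detect a changed robots.txt between scans, assuming SHA-256 over the raw response body (the function names are hypothetical and not part of the scanner):

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt body.

    Assumption: SHA-256 over the unmodified bytes. The report does not
    state which algorithm or normalization the scanner actually uses.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hex: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return content_fingerprint(body) != previous_hex


if __name__ == "__main__":
    old_body = b"User-agent: *\nDisallow: /login\n"
    new_body = b"User-agent: *\nDisallow: /login\nDisallow: /register\n"
    stored = content_fingerprint(old_body)
    print(has_changed(stored, old_body))  # same bytes, same digest
    print(has_changed(stored, new_body))  # one added rule changes the digest
```

An exact digest only says whether the file changed at all. The separate SimHash field suggests the scanner also tracks near-duplicate content, which this sketch does not attempt to reproduce.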

Groups

*

Rule Path
Disallow /*images/li.gif
Disallow /*images/logging_requests.gif
Disallow /relevance_feedback
Disallow /listings/
Disallow /listing_feedback/
Disallow */report_abuse
Disallow /gallery/*/copyright
Disallow /gallery/*/flag
Disallow /contribute/
Disallow /reservations/
Disallow */print_ad?*
Disallow */audio_ad?*
Disallow */map_locations
Disallow /reviews/*/up
Disallow /reviews/*/down
Disallow /reviews/*/follow
Disallow /reviews/*/unfollow
Disallow */no-internet-heading-assigned
Disallow */no-internet-heading-assisted
Disallow /login
Disallow /register
Disallow /user/
Disallow /ypu/js/compiled/tripadvisor*
Disallow /ypu/apps/ypm-core/ypm/javascripts/bundle_tripadvisor*
Disallow /undefined/
Disallow /improve_listing/*
Disallow /search*
Disallow /lwes/
Disallow /route?*
Disallow /listings/*/directions*

scrapy

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

twitterbot

Rule Path
Allow *
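
Several of the paths above use the `*` wildcard, which Python's standard `urllib.robotparser` does not interpret as a wildcard. The following is a minimal sketch of how a crawler might evaluate these groups, assuming longest-match precedence with Allow winning ties, and assuming a simplified exact, case-insensitive user-agent lookup. Only a subset of the listed rules is included, and `GROUPS`, `pattern_to_regex`, and `is_allowed` are illustrative names, not part of the report:

```python
import re

# Subset of the rules listed above, keyed by lowercase user-agent token.
GROUPS = {
    "*": [
        ("disallow", "/listings/"),
        ("disallow", "*/report_abuse"),
        ("disallow", "/reviews/*/up"),
        ("disallow", "/search*"),
        ("disallow", "/login"),
    ],
    "scrapy": [("disallow", "/")],
    "piplbot": [("disallow", "/")],
    "twitterbot": [("allow", "*")],
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    `*` matches any run of characters; a trailing `$` anchors the end.
    Without `$`, the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(user_agent, path):
    """Return True if `path` may be fetched by `user_agent`.

    Assumption: exact token lookup with fallback to the `*` group.
    Real crawlers typically match the product token more loosely.
    """
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    best_len, verdict = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # The longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, verdict = len(pattern), allow
    return verdict


if __name__ == "__main__":
    for agent, path in [
        ("scrapy", "/"),
        ("twitterbot", "/search?terms=pizza"),
        ("examplebot", "/search?terms=pizza"),
        ("examplebot", "/biz/report_abuse"),
        ("examplebot", "/about"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Because rules without a trailing `$` are prefix matches, a pattern such as `/reviews/*/up` also covers longer paths that begin the same way. `examplebot` is a made-up agent name used to exercise the `*` group.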

Warnings

  • 2 invalid lines.
  • `host` is not a known field.
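
Warnings of this kind can be produced by a simple line-level check. The sketch below is one possible approach, assuming a line is invalid when it has no `field: value` separator and that the known fields are `user-agent`, `allow`, `disallow`, and `sitemap`. The scanner's actual rules are not stated in the report, and `KNOWN_FIELDS` and `lint_robots` are hypothetical names:

```python
# Assumed set of recognised fields; the scanner's real list is not given.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def lint_robots(text):
    """Return (line_number, message) pairs for questionable lines."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank or comment-only lines are fine
        if ":" not in line:
            warnings.append((number, "invalid line"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            warnings.append((number, f"`{field}` is not a known field"))
    return warnings


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /login\n"
        "Host: www.example.com\n"
        "this line has no separator\n"
    )
    for number, message in lint_robots(sample):
        print(number, message)
```

The sample input is invented for illustration and is not the content of the scanned file.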