htyellowpages.com
robots.txt

Robots Exclusion Standard data for htyellowpages.com

Resource Scan

Scan Details

Site Domain htyellowpages.com
Base Domain htyellowpages.com
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-06-09T06:13:27+00:00
Next Scan 2024-08-08T06:13:27+00:00

Last Successful Scan

Scanned 2022-12-13T15:29:45+00:00
URL http://htyellowpages.com/robots.txt
Redirect http://www.yellowpages.com/robots.txt
Redirect Domain www.yellowpages.com
Redirect Base yellowpages.com
Domain IPs 216.21.224.199
Redirect IPs 151.138.15.18
Response IP 151.138.15.18
Found Yes
Hash 817a6fc4817fa422e488233b50713adc4a6f1d16b34a2e376a48fcb05755b763
SimHash 3809fb00c3b0
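
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not state which algorithm the scanner uses, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The function name `body_hash` is hypothetical.

```python
import hashlib

def body_hash(body: bytes) -> str:
    """Return a hex digest of a robots.txt body.

    Assumption: SHA-256 over the raw bytes. The report only shows a
    64-character hex string and does not name the algorithm.
    """
    return hashlib.sha256(body).hexdigest()

# Hypothetical body, not the file that was actually scanned.
digest = body_hash(b"User-agent: *\nDisallow: /\n")
print(len(digest))  # a SHA-256 hex digest is always 64 characters long
```

Comparing such a digest between two scans would show whether the file changed at all. The SimHash field presumably serves for near-duplicate comparison, which this sketch does not cover.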

Groups

*

Rule Path
Disallow /*images/li.gif
Disallow /*images/logging_requests.gif
Disallow /relevance_feedback
Disallow /listings/
Disallow /listing_feedback/
Disallow */report_abuse
Disallow /gallery/*/copyright
Disallow /gallery/*/flag
Disallow /contribute/
Disallow /reservations/
Disallow */print_ad?*
Disallow */audio_ad?*
Disallow */map_locations
Disallow /reviews/*/up
Disallow /reviews/*/down
Disallow /reviews/*/follow
Disallow /reviews/*/unfollow
Disallow /semp/*
Disallow */no-internet-heading-assigned
Disallow */no-internet-heading-assisted
Disallow /login
Disallow /register
Disallow /user/
Disallow /ypu/js/compiled/tripadvisor*
Disallow /ypu/apps/ypm-core/ypm/javascripts/bundle_tripadvisor*
Disallow /undefined/
Disallow /improve_listing/*
Disallow /search*
Disallow /lwes/
Disallow /route?*
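
Several of the paths above use `*` wildcards. A minimal sketch of how such patterns could be matched against a URL path, assuming RFC 9309 style semantics (`*` matches any run of characters, a trailing `$` anchors the end, matching starts at the beginning of the path). The names `pattern_to_regex` and `is_disallowed` are hypothetical, and only a handful of the rules are copied in.

```python
import re

# A few Disallow paths copied from the "*" group above.
DISALLOW = [
    "/listings/",
    "/search*",
    "*/report_abuse",
    "/gallery/*/flag",
    "/route?*",
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumes '*' matches any run of characters and a trailing '$'
    anchors the end; every other character is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    # re.match anchors at the start of the string, as prefix matching requires.
    return any(pattern_to_regex(p).match(path) for p in patterns)

print(is_disallowed("/search?q=pizza"))        # True
print(is_disallowed("/biz/foo/report_abuse"))  # True
print(is_disallowed("/about"))                 # False
```

The sketch ignores Allow rules and longest-match precedence, which is sufficient here only because the `*` group contains Disallow rules exclusively.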

scrapy

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

twitterbot

Rule Path
Allow *
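
The four groups above apply to different crawlers. A rough sketch of how a crawler might pick its group, assuming a case-insensitive substring match of the group name against the User-Agent string with a fallback to `*`. This is a simplification of the product-token matching described in RFC 9309. The `GROUPS` table is abbreviated and `select_group` is a hypothetical name.

```python
# Abbreviated view of the groups listed above; the "*" group is shortened.
GROUPS = {
    "scrapy": [("disallow", "/")],
    "piplbot": [("disallow", "/")],
    "twitterbot": [("allow", "*")],
    "*": [("disallow", "/login"), ("disallow", "/user/")],
}

def select_group(user_agent: str) -> str:
    """Return the name of the group that applies to a User-Agent string.

    Assumption: a named group applies when its name occurs anywhere in
    the User-Agent (case-insensitive); otherwise the "*" group applies.
    """
    ua = user_agent.lower()
    for name in GROUPS:
        if name != "*" and name in ua:
            return name
    return "*"

print(select_group("Scrapy/2.11 (+https://scrapy.org)"))  # scrapy
print(select_group("Twitterbot/1.0"))                     # twitterbot
print(select_group("Mozilla/5.0 (compatible; SomeBot)"))  # *
```

Under this reading, Scrapy and PiplBot are blocked from the whole site, Twitterbot is allowed everywhere, and every other crawler falls under the long `*` rule list.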

Warnings

  • 2 invalid lines.
  • `host` is not a known field.
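
Warnings of this kind could come from a simple line-level check. The sketch below is an assumption about how such a check might work, not the scanner's actual logic: the `KNOWN_FIELDS` set, the `lint_robots` name, and the sample input are all hypothetical.

```python
# Assumed set of recognised fields; the scanner's real list is not given.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def lint_robots(text: str):
    """Return (invalid_line_count, sorted_unknown_fields) for a robots.txt body."""
    invalid = 0
    unknown = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank and comment-only lines are fine
        if ":" not in line:
            invalid += 1  # no "field: value" shape at all
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            unknown.add(field)
    return invalid, sorted(unknown)

# Hypothetical sample, not the file that was actually scanned.
sample = "User-agent: *\nDisallow: /login\nHost: example.com\nnot a directive\n"
print(lint_robots(sample))  # (1, ['host'])
```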