h5-newsbreakapp-com-1793144517.us-west-2.elb.amazonaws.com
robots.txt

Resource Scan

Scan Details

Site Domain h5-newsbreakapp-com-1793144517.us-west-2.elb.amazonaws.com
Base Domain h5-newsbreakapp-com-1793144517.us-west-2.elb.amazonaws.com
Scan Status Ok
Last Scan 2024-09-28T18:29:02+00:00
Next Scan 2024-10-05T18:29:02+00:00

Last Scan

Scanned 2024-09-28T18:29:02+00:00
URL http://h5-newsbreakapp-com-1793144517.us-west-2.elb.amazonaws.com/robots.txt
Redirect https://www.newsbreakapp.com/robots.txt
Redirect Domain www.newsbreakapp.com
Redirect Base newsbreakapp.com
Domain IPs 34.210.33.35, 44.235.107.78
Redirect IPs 54.186.155.155, 54.71.140.86
Response IP 54.71.140.86
Found Yes
Hash abc0e48ededd548b05aa4cd41915dcae8d5e34a35d0b3e3f5a2f3c21954d5049
SimHash 004c529387c2

Groups

ccbot

Rule Path
Disallow /

ccbot/2.0

Rule Path
Disallow /

ccbot/2.0 (http://commoncrawl.org/faq/)

Rule Path
Disallow /

wikido

Rule Path
Disallow /

fr_crawler

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

baiduspider-image

Rule Path
Disallow /

baiduspider-video

Rule Path
Disallow /

baiduspider-favo

Rule Path
Disallow /

baiduspider-news

Rule Path
Disallow /

baiduspider-cpro

Rule Path
Disallow /

baiduspider-ads

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

bitvorebot

Rule Path
Disallow /

blp_bbot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

kraken

Rule Path
Disallow /

moatbot

Rule Path
Disallow /

bhcbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

synthesio

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

brandonbot

Rule Path
Disallow /

germcrawler

Rule Path
Disallow /

sogou

Rule Path
Disallow /

exabot

Rule Path
Disallow /

maxpointcrawler

Rule Path
Disallow /

admantx

Rule Path
Disallow /

*

Rule Path
Disallow /_api/
Disallow /n/
Disallow /v/
Disallow /s/

twitterbot

Rule Path
Allow /n/
Allow /v/
Allow /s/

facebookexternalhit

Rule Path
Allow /n/
Allow /v/
Allow /s/

Other Records

Field Value
sitemap https://www.newsbreak.com/sitemap.xml
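Example

The groups above combine as follows: a crawler named in its own group follows only that group, and every other crawler falls back to the `*` group. A minimal sketch with Python's standard `urllib.robotparser` is given below. The robots.txt text is an abridged reconstruction from this report, not the file as served: the original casing, ordering, and comments are assumptions, and only one fully blocked crawler (ccbot) is included. `somebot` is a hypothetical user agent used to exercise the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Abridged reconstruction of the scanned rules (assumption: the served
# file may differ in casing, ordering, and comments).
ROBOTS_TXT = """\
User-agent: ccbot
Disallow: /

User-agent: *
Disallow: /_api/
Disallow: /n/
Disallow: /v/
Disallow: /s/

User-agent: twitterbot
Allow: /n/
Allow: /v/
Allow: /s/

Sitemap: https://www.newsbreak.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ccbot has its own group with "Disallow: /", so every path is blocked.
ccbot_home = parser.can_fetch("ccbot", "/")

# twitterbot has its own group, so the "*" disallows do not apply to it.
twitter_article = parser.can_fetch("twitterbot", "/n/example")

# An unlisted crawler (hypothetical name) falls back to the "*" group.
other_article = parser.can_fetch("somebot", "/n/example")
other_about = parser.can_fetch("somebot", "/about")

# Sitemap records are global and not tied to any group (Python 3.8+).
sitemaps = parser.site_maps()

if __name__ == "__main__":
    print(ccbot_home, twitter_article, other_article, other_about, sitemaps)
```

Python's parser applies the first group whose name matches the user agent, while RFC 9309 asks for the most specific match. For the rules in this report both approaches give the same answers, but other parsers may differ on edge cases.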

Comments

  • New crawlers to block 2016