Robots Exclusion Standard data for newsbreakapp.com

Resource Scan

Scan Details

Site Domain newsbreakapp.com
Base Domain newsbreakapp.com
Scan Status Ok
Last Scan 2024-11-08T14:14:27+00:00
Next Scan 2024-11-15T14:14:27+00:00

Last Scan

Scanned 2024-11-08T14:14:27+00:00
URL https://newsbreakapp.com/robots.txt
Redirect https://www.newsbreakapp.com/robots.txt
Redirect Domain www.newsbreakapp.com
Redirect Base newsbreakapp.com
Domain IPs 52.39.89.221, 54.68.174.152
Redirect IPs 52.39.89.221, 54.68.174.152
Response IP 54.68.174.152
Found Yes
Hash abc0e48ededd548b05aa4cd41915dcae8d5e34a35d0b3e3f5a2f3c21954d5049
SimHash 004c529387c2
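The Hash field lets a scanner tell whether robots.txt changed between the Last Scan and the Next Scan. The sketch below assumes the field is a SHA-256 hex digest of the response body. Its 64 hex digits fit SHA-256, but the report does not name the algorithm. The helpers content_hash and has_changed and the sample body are hypothetical.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt response body.

    Assumption: the scanner's "Hash" field is SHA-256 (64 hex digits);
    the scan report itself does not say which algorithm it uses.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """True if a newly fetched body differs from the previously hashed one."""
    return content_hash(body) != previous_hash


if __name__ == "__main__":
    # Hash recorded at the 2024-11-08 scan, copied from the report.
    last_hash = "abc0e48ededd548b05aa4cd41915dcae8d5e34a35d0b3e3f5a2f3c21954d5049"
    # Hypothetical body from a later fetch; NOT the real file contents.
    new_body = b"User-agent: *\nDisallow: /_api/\n"
    print(len(content_hash(new_body)))  # 64
    print(has_changed(last_hash, new_body))
```

An exact digest only says "changed or not". A similarity hash such as the SimHash field is the usual complement when near-duplicate detection matters.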

Groups

ccbot

Rule Path
Disallow /

ccbot/2.0

Rule Path
Disallow /

ccbot/2.0 (http://commoncrawl.org/faq/)

Rule Path
Disallow /
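The three ccbot groups differ only in a version suffix and a comment URL. RFC 9309 defines a user-agent value as a product token made of letters, "-" and "_", and it is matched case-insensitively. The RFC also says that groups matching the same crawler are combined. The sketch below assumes a parser that reduces each user-agent line to its leading product token. Real parsers vary: some match substrings, and some ignore values that are not valid tokens. Under that assumption, the three groups merge into one. The product_token and merge_by_token helpers are hypothetical.

```python
import re

# RFC 9309 product-token characters: A-Z, a-z, "-" and "_".
_TOKEN = re.compile(r"[A-Za-z_-]+")


def product_token(user_agent_value: str) -> str:
    """Return the leading product token of a user-agent value, lowercased.

    Hypothetical helper: keeps the longest prefix of token characters and
    drops anything after it (e.g. "/2.0" or a "(http://...)" comment).
    """
    value = user_agent_value.strip()
    if value.startswith("*"):
        return "*"
    match = _TOKEN.match(value)
    return match.group(0).lower() if match else ""


def merge_by_token(groups):
    """Combine (user-agent, rules) groups that share a product token."""
    merged = {}
    for agent, rules in groups:
        bucket = merged.setdefault(product_token(agent), [])
        for rule in rules:
            if rule not in bucket:  # keep each distinct rule once
                bucket.append(rule)
    return merged


if __name__ == "__main__":
    ccbot_groups = [
        ("ccbot", [("disallow", "/")]),
        ("ccbot/2.0", [("disallow", "/")]),
        ("ccbot/2.0 (http://commoncrawl.org/faq/)", [("disallow", "/")]),
    ]
    for agent, _ in ccbot_groups:
        print(f"{agent!r} -> {product_token(agent)!r}")  # all map to 'ccbot'
    print(merge_by_token(ccbot_groups))
```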

wikido

Rule Path
Disallow /

fr_crawler

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

baiduspider-image

Rule Path
Disallow /

baiduspider-video

Rule Path
Disallow /

baiduspider-favo

Rule Path
Disallow /

baiduspider-news

Rule Path
Disallow /

baiduspider-cpro

Rule Path
Disallow /

baiduspider-ads

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

bitvorebot

Rule Path
Disallow /

blp_bbot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

kraken

Rule Path
Disallow /

moatbot

Rule Path
Disallow /

bhcbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

synthesio

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

brandonbot

Rule Path
Disallow /

germcrawler

Rule Path
Disallow /

sogou

Rule Path
Disallow /

exabot

Rule Path
Disallow /

maxpointcrawler

Rule Path
Disallow /

admantx

Rule Path
Disallow /
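Every group from ccbot through admantx carries the single rule Disallow /. The sketch below shows how a blocklist like this could be generated and then checked with Python's standard urllib.robotparser. It uses a subset of the agent names listed above, and the render_block_groups and is_blocked helpers are hypothetical.

```python
import urllib.robotparser

# Subset of the fully blocked agents listed above (lowercase, as scanned).
BLOCKED_AGENTS = ["ccbot", "yandex", "baiduspider", "semrushbot", "ahrefsbot", "sogou"]


def render_block_groups(agents, one_group=False):
    """Return robots.txt text that disallows "/" for every agent in `agents`.

    Hypothetical helper. one_group=False gives one group per agent (the
    layout the scan reports); one_group=True puts all agents in a single
    group with several User-agent lines.
    """
    if one_group:
        lines = [f"User-agent: {agent}" for agent in agents]
        lines.append("Disallow: /")
        return "\n".join(lines) + "\n"
    # Separate groups, joined by a blank line.
    return "\n".join(f"User-agent: {agent}\nDisallow: /\n" for agent in agents)


def is_blocked(robots_text, user_agent, url="/"):
    """Check the generated text with the standard-library parser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = render_block_groups(BLOCKED_AGENTS)
    print(text)
    print(is_blocked(text, "SemrushBot/7~bl"))  # True
    print(is_blocked(text, "ExampleBot/1.0"))   # False: no matching group, no "*" group
```

RFC 9309 allows several User-agent lines to share one group, so one_group=True yields an equivalent, shorter file.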

*

Rule Path
Disallow /_api/
Disallow /n/
Disallow /v/
Disallow /s/

twitterbot

Rule Path
Allow /n/
Allow /v/
Allow /s/

facebookexternalhit

Rule Path
Allow /n/
Allow /v/
Allow /s/
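The * group keeps generic crawlers out of /_api/, /n/, /v/ and /s/. Meanwhile, twitterbot and facebookexternalhit get Allow rules for the three content paths. The sketch below evaluates a partial reconstruction of these rules with Python's standard urllib.robotparser. The file text, its capitalization and group order, the sample paths and the ExampleBot agent are all assumptions, not the live file.

```python
import urllib.robotparser

# Partial, hypothetical reconstruction of the groups reported above.
ROBOTS_TXT = """\
User-agent: ccbot
Disallow: /

User-agent: semrushbot
Disallow: /

User-agent: *
Disallow: /_api/
Disallow: /n/
Disallow: /v/
Disallow: /s/

User-agent: twitterbot
Allow: /n/
Allow: /v/
Allow: /s/

User-agent: facebookexternalhit
Allow: /n/
Allow: /v/
Allow: /s/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.newsbreakapp.com"
checks = [
    ("ExampleBot", "/n/some-story"),            # False: falls back to the "*" group
    ("ExampleBot", "/about"),                   # True: no "*" rule matches
    ("Twitterbot", "/n/some-story"),            # True: Allow /n/
    ("Twitterbot", "/_api/"),                   # True: "*" rules do not apply
    ("facebookexternalhit/1.1", "/v/some-video"),  # True: Allow /v/
    ("CCBot/2.0", "/"),                         # False: Disallow /
]
for agent, path in checks:
    print(f"{agent:26} {path:16} {parser.can_fetch(agent, BASE + path)}")
```

Twitterbot also gets /_api/. Under RFC 9309 group selection, which urllib.robotparser also follows, a crawler obeys only the group that names it and falls back to * only when no group matches. As a result, the * disallows never reach twitterbot or facebookexternalhit. urllib.robotparser differs from RFC 9309 in two other ways: it applies the first matching rule in file order rather than the longest match, and it matches agent names by substring. Its results can therefore differ from other parsers on files with overlapping rules.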

Other Records

Field Value
sitemap https://www.newsbreak.com/sitemap.xml
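The sitemap record is independent of any user-agent group. The sketch below reads it back with urllib.robotparser's site_maps() method (Python 3.8+) from a hypothetical minimal file. The listed sitemap lives on www.newsbreak.com, a different host from www.newsbreakapp.com, which served this robots.txt. The sitemaps_with_hosts helper is hypothetical.

```python
import urllib.parse
import urllib.robotparser

# Hypothetical minimal file containing the sitemap record reported above.
ROBOTS_SNIPPET = """\
User-agent: *
Disallow: /_api/

Sitemap: https://www.newsbreak.com/sitemap.xml
"""


def sitemaps_with_hosts(robots_lines):
    """Return (sitemap URL, host) pairs declared in a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    urls = parser.site_maps() or []  # site_maps() returns None if none are declared
    return [(url, urllib.parse.urlsplit(url).hostname) for url in urls]


if __name__ == "__main__":
    for url, host in sitemaps_with_hosts(ROBOTS_SNIPPET.splitlines()):
        print(url, host)  # https://www.newsbreak.com/sitemap.xml www.newsbreak.com
```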

Comments

  • New crawlers to block 2016