Robots Exclusion Standard data for si.com

Resource Scan

Scan Details

Site Domain si.com
Base Domain si.com
Scan Status Ok
Last Scan 2024-05-01T19:05:26+00:00
Next Scan 2024-05-08T19:05:26+00:00
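
The two scan timestamps above are exactly seven days apart, which suggests (but does not confirm) a weekly rescan schedule. A minimal Python sketch of that arithmetic follows; the seven-day interval and the function name next_scan are assumptions inferred from this one report, not documented behaviour of the scanner.

```python
from datetime import datetime, timedelta

# Assumed interval: inferred from the gap between the two timestamps above.
RESCAN_INTERVAL = timedelta(days=7)

def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp one rescan interval after last_scan_iso."""
    last = datetime.fromisoformat(last_scan_iso)
    return (last + RESCAN_INTERVAL).isoformat()

print(next_scan("2024-05-01T19:05:26+00:00"))  # 2024-05-08T19:05:26+00:00
```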

Last Scan

Scanned 2024-05-01T19:05:26+00:00
URL https://si.com/robots.txt
Redirect https://www.si.com/robots.txt
Redirect Domain www.si.com
Redirect Base si.com
Domain IPs 35.83.190.225, 44.241.188.24
Redirect IPs 18.155.68.4, 18.155.68.51, 18.155.68.62, 18.155.68.86, 2600:9000:2200:1000:f:c1f3:880:93a1, 2600:9000:2200:2000:f:c1f3:880:93a1, 2600:9000:2200:3400:f:c1f3:880:93a1, 2600:9000:2200:3e00:f:c1f3:880:93a1, 2600:9000:2200:5000:f:c1f3:880:93a1, 2600:9000:2200:8a00:f:c1f3:880:93a1, 2600:9000:2200:c600:f:c1f3:880:93a1, 2600:9000:2200:d400:f:c1f3:880:93a1
Response IP 18.155.68.51
Found Yes
Hash a0204f367a10a1ed031490dc25bd29ea4d38a9ce37103eb01cd5604f2f429fad
SimHash 6a04420adde2
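
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not say which algorithm was used or which bytes were hashed, so the sketch below is only an assumption about how such a fingerprint could be computed to detect changes between scans; content_hash and the sample body are hypothetical.

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Fingerprint a robots.txt body; SHA-256 is an assumption, not confirmed."""
    return hashlib.sha256(body).hexdigest()

# Hypothetical file body, not the real si.com robots.txt.
sample = b"User-agent: *\nAllow: /\n"
digest = content_hash(sample)
print(len(digest))  # 64 hex characters, same length as the Hash field

# Any change to the body produces a different digest.
print(content_hash(sample + b"Disallow: /api/\n") != digest)  # True
```

The SimHash field is a separate, shorter fingerprint; its exact construction is not stated in the report and is not reproduced here.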

Groups

*

Rule Path
Allow /

*

Rule Path
Disallow */?*utm_source=*
Disallow */?*utm_campaign=*
Disallow */?*utm_medium=*
Disallow /*?source=*
Disallow */?*utm_newsbreak=*

*

Rule Path
Allow /ads.txt

*

Rule Path
Disallow *?embed*

*

Rule Path
Disallow */*a_aid%3D*
Disallow /*?partner=*

*

Rule Path
Disallow */*?mm-experiments=*

*

Rule Path
Disallow /*?*s=*

*

Rule Path
Disallow *?app=*

*

Rule Path
Disallow *?fbclid=*

*

Rule Path
Disallow /*?setLocale=*
Disallow *?georedirect=*

*

Rule Path
Disallow /*?term=*
Disallow *?ref=*
Disallow /*?view_source=*
Disallow /*?view_medium=*
Disallow /*?initialLeagueId=*

*

Rule Path
Disallow *_ga_*
Disallow */?_gl=*

*

Rule Path
Disallow */api/*

*

Rule Path
Disallow */videos/undefinedc_fill%2Cw_360%2Car_16%3A9%2Cf_auto%2Cq_auto%2Cg_auto/undefined
Disallow */teams/mainNavigationChevron_icon.svg?*
Disallow */leagues/mainNavigationChevron_icon.svg?*
Disallow */undefinedc_fill%2Cw_360%2Car_16%3A9%2Cf_auto%2Cq_auto%2Cg_auto/undefined

*

Rule Path
Disallow */xposts/
Disallow */reader/
Disallow */embed/*
Disallow */ads/*
Disallow */editor/*
Disallow */posts/*/edit$
Disallow */posts/*/publish$
Disallow */singlepage/pipe/*
Disallow */singlepage/uncached_pipe/*
Disallow */singlepage/epipe/*
Disallow */reads/*/read$
Disallow */embed_code
Disallow */matches/*
Disallow */es/partidos/*
Disallow */it/partite/*
Disallow */de/spiele/*
Disallow */zh-CN/*
Disallow */ping
Disallow */sessions/*
Disallow */admin/*
Disallow */management/*
Disallow */castr
Disallow */unfeature$
Disallow */unfeature/*
Disallow */videos/*
Disallow */channels/videos/*

*

Rule Path
Disallow /_partial/*
Disallow */amazingfactgenerator/*
Disallow /_modules/*
Disallow /subscribe/v3/*
Disallow /search/*
Disallow */file/*
Disallow */node/*
Disallow /sites/*
Disallow /longform/*
Disallow /blogs/*
Disallow */store/*
Disallow */us/*
Disallow */store/*
Disallow */us/*
Disallow */HDYK/*
Disallow /dist/*
Disallow */shopping/*
Disallow /trivia/*
Disallow /worksheets/*
Disallow /music/*
Disallow /magazine/*
Disallow /puzzle/*

*

Rule Path
Disallow */files/*
Disallow */wp-admin/*
Disallow */?*utm_newsbreak=*
Disallow */wp-content/*
Disallow */wp-includes/*
Disallow */app/*
Disallow */embed_code*
Disallow */%7B%7Burl/*
Disallow */v2/*

twitterbot

Rule Path
Allow *

facebookbot

Rule Path
Allow *
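
Many of the rules above use wildcards: "*" stands for any run of characters and a trailing "$" anchors the end of the URL. The Python sketch below shows one way to evaluate such rules, following the longest-match convention described in RFC 9309, where the longest matching pattern wins and Allow wins ties. It is an illustration only and not the scanner's or any crawler's actual code; pattern_to_regex, is_allowed, and the RULES subset are hypothetical names, and the sample paths are made up.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Apply (kind, pattern) rules to a path.

    The longest matching pattern wins, 'allow' wins ties,
    and a path that matches no rule is allowed.
    """
    best = None
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), kind == "allow")
            if best is None or key > best:
                best = key
    return True if best is None else best[1]

# A small subset of the "*" group rules listed above.
RULES = [
    ("allow", "/"),
    ("allow", "/ads.txt"),
    ("disallow", "*?fbclid=*"),
    ("disallow", "*/posts/*/edit$"),
]

print(is_allowed(RULES, "/nfl/story"))               # True
print(is_allowed(RULES, "/nfl/story?fbclid=abc"))    # False
print(is_allowed(RULES, "/posts/123/edit"))          # False
print(is_allowed(RULES, "/posts/123/edit/history"))  # True
```

The last case shows the effect of "$": the edit rule applies only when the URL ends in /edit, so the longer path falls back to the "Allow /" rule.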

Comments

  • Allow all search engines to crawl
  • Disallow Parameters
  • GA traffic source parameters
  • Allow crawling ads
  • Embedded widget parameters
  • Influencers/Affiliate Links parameters
  • Experiments testing team
  • Search box param
  • Apps param
  • FB campaigns
  • GEO targeting:
  • Unknown Parameters
  • GA parameters - Generated from Google caching
  • Generated from Voltax API url
  • Voltax HTML Unknown crawled urls
  • Generated from 90min and TBL Monolith
  • Generated from Mentalfloss WP and Drupal
  • Generated from Fansided WP
  • Social Media Robots

Warnings

  • 2 invalid lines.