extremeoverclocking.com
robots.txt

Robots Exclusion Standard data for extremeoverclocking.com

Resource Scan

Scan Details

Site Domain extremeoverclocking.com
Base Domain extremeoverclocking.com
Scan Status Ok
Last Scan 2024-11-14T21:01:43+00:00
Next Scan 2024-11-21T21:01:43+00:00

Last Scan

Scanned 2024-11-14T21:01:43+00:00
URL https://extremeoverclocking.com/robots.txt
Redirect https://www.extremeoverclocking.com/robots.txt
Redirect Domain www.extremeoverclocking.com
Redirect Base extremeoverclocking.com
Domain IPs 216.230.228.242
Redirect IPs 216.230.228.242
Response IP 216.230.228.242
Found Yes
Hash fb2239e7f4c2fc257b564e382220afd5e6e984b1696a306766a99160f6b9ee1e
SimHash a0800b4408b4
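
The Hash value above is 64 hexadecimal digits long, which is the length of a SHA-256 digest. The report does not name the algorithm or say which bytes were hashed, so the following is only a sketch of how a fingerprint of that shape could be computed, assuming SHA-256 over the raw response body. The function name `fingerprint` and the sample body are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a 64-hex-digit digest of a robots.txt body.

    Assumption: SHA-256 over the raw bytes. The scan report does not
    confirm which algorithm or input produced its Hash field.
    """
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, not the real file contents.
sample = b"User-agent: *\nDisallow: /lists/\n"
digest = fingerprint(sample)
changed = fingerprint(sample + b"Crawl-delay: 10\n") != digest
```

Comparing digests between scans is a cheap way to detect that the file changed at all. Any single-byte difference yields a different digest, which is presumably why the report also lists a SimHash for approximate similarity.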

Groups

voltron

Rule Path
Disallow /

adbeat_bot

Rule Path
Disallow /

adsbot

Rule Path
Disallow /

adsrvrbot

Rule Path
Disallow /

adstxt.com

Rule Path
Disallow /

adstxtcrawler

Rule Path
Disallow /

appnexusadstxtcrawler

Rule Path
Disallow /

gumgumadstxtcrawler

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

akka-http

Rule Path
Disallow /

alphaseobot

Rule Path
Disallow /

aspiegelbot

Rule Path
Disallow /

awariorssbot

Rule Path
Disallow /

awariosmartbot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

barkrowler

Rule Path
Disallow /

bubing

Rule Path
Disallow /

bidswitchbot

Rule Path
Disallow /

bidtellect

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

brandverity

Rule Path
Disallow /

bublupbot

Rule Path
Disallow /

builtwith

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

centro ads.txt crawler

Rule Path
Disallow /

checkmarknetwork

Rule Path
Disallow /

clickagy intelligence bot v2

Rule Path
Disallow /

cliqzbot

Rule Path
Disallow /

commonscan

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

dataprovider

Rule Path
Disallow /

daum

Rule Path
Disallow /

dcrawl

Rule Path
Disallow /

deepcrawl

Rule Path
Disallow /

domaincrawler

Rule Path
Disallow /

domainstatsbot

Rule Path
Disallow /

domcopbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

extlinksbot

Rule Path
Disallow /

femtosearchbot

Rule Path
Disallow /

garlik

Rule Path
Disallow /

getintent crawler

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

g-i-g-a-b-o-t

Rule Path
Disallow /

gluten free crawler

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

grammarly

Rule Path
Disallow /

hatena antenna

Rule Path
Disallow /

hexometer

Rule Path
Disallow /

http banner detection

Rule Path
Disallow /

hubpages

Rule Path
Disallow /

ias_crawler

Rule Path
Disallow /

infotigerbot

Rule Path
Disallow /

istellabot

Rule Path
Disallow /

jersey

Rule Path
Disallow /

keybot

Rule Path
Disallow /

linespider

Rule Path
Disallow /

linguee

Rule Path
Disallow /

linkdex

Rule Path
Disallow /

linkdexbot

Rule Path
Disallow /

ltx71

Rule Path
Disallow /

macocu

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

mail.ru

Rule Path
Disallow /

mail.ru_bot

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

mbcrawler

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

megaindex.com

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

moatbot

Rule Path
Disallow /

mtrobot

Rule Path
Disallow /

netcraftsurveyagent

Rule Path
Disallow /

okhttp

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

panscient

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

pinterestbot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

proximic

Rule Path
Disallow /

psbot

Rule Path
Disallow /

riddler

Rule Path
Disallow /

quick-crawler

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

scooperbot

Rule Path
Disallow /

seekport crawler

Rule Path
Disallow /

semanticscholarbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

seobility

Rule Path
Disallow /

seokicks

Rule Path
Disallow /

serpstatbot

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

siteexplorer

Rule Path
Disallow /

smtbot

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

spbot

Rule Path
Disallow /

spiderling

Rule Path
Disallow /

surdotlybot

Rule Path
Disallow /

the knowledge ai

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

ttd-content

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

uclassify

Rule Path
Disallow /

webdatastats

Rule Path
Disallow /

yandex

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

zombiebot

Rule Path
Disallow /

zoombot

Rule Path
Disallow /

linkbot

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

*

Rule Path
Disallow /wp-login.php
Disallow /xmlrpc.php
Disallow /lists/

Other Records

Field Value
crawl-delay 10
sitemap https://www.extremeoverclocking.com/sitemap.xml.gz
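
To see how a crawler might interpret the groups and records listed above, here is a minimal sketch using Python's standard `urllib.robotparser`. The excerpt reproduces only two of the named groups plus the `*` group. Placing `Crawl-delay` inside the `*` group is an assumption, since the report lists it separately without saying where it sits in the file. `ExampleBot` is a hypothetical user agent that no named group matches.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction of the scanned file. Only two of the
# blocked groups are shown, and the Crawl-delay position is assumed.
ROBOTS_EXCERPT = """\
User-agent: GPTBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Disallow: /lists/
Crawl-delay: 10

Sitemap: https://www.extremeoverclocking.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

base = "https://www.extremeoverclocking.com"

# A bot with its own group is blocked from the whole site.
gptbot_allowed = parser.can_fetch("GPTBot", base + "/")

# A bot without its own group falls back to the "*" rules.
generic_index = parser.can_fetch("ExampleBot", base + "/")
generic_lists = parser.can_fetch("ExampleBot", base + "/lists/foo")

delay = parser.crawl_delay("ExampleBot")
sitemaps = parser.site_maps()
```

Under these assumptions the named bots are denied everything, while any other crawler may fetch all paths except the three disallowed prefixes and is asked to wait 10 seconds between requests. Parsers differ in how they handle details such as record placement, so other implementations may not agree on every case.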

Comments

  • Category: Chinese Search Engine
  • Hits: Minimal - robots & random URLs
  • URL: https://www.so.com/
  • Category: Commercial Web Scrape / Web Crawl Company Crap
  • Hits: Unknown
  • URL: http://80legs.com/the-80legs-web-crawler/
  • Category: Commercial Advertising Crap
  • Hits: Moderate - robots
  • URL: https://www.adbeat.com/operation_policy
  • Category: Commercial Advertising Crap
  • Hits: Moderate - Random URLs
  • URL: https://www.admantx.com/
  • User-agent: admantx
  • Disallow: /
  • Category: Commercial SEO Crap
  • Hits: Minimal
  • URL: https://seostar.co/robot/
  • Category: Unknown
  • Hits: Unknown
  • URL: Unknown
  • Category: Commercial Company Aggregating ads.txt
  • Hits: Minimal - ads
  • URL: https://www.adstxt.com/
  • Category: Unknown Sites Scanning ads.txt
  • Hits: Minimal - ads
  • URL: https://github.com/InteractiveAdvertisingBureau/adstxtcrawler
  • Category: Commercial Marketing Link Crap
  • Hits: Minimal - robots
  • URL: https://ahrefs.com/
  • Category: Distributed JVM App
  • Hits: Minimal - ads
  • URL: https://akka.io/
  • Category: Commercial SEO Crap
  • Hits: Minimal - Index
  • URL: http://alphaseobot.com/bot.html
  • AKA: AlphaBot
  • Category: Huawei Web Crawler
  • Hits: High - robots
  • URL: https://aspiegel.com/
  • Category: Commercial Social Media monitoring
  • Hits: Minimal - Non-working RSS Links
  • URL: https://awario.com/bots.html
  • Category: Chinese Search Engine
  • Category: Commercial Data Mining
  • Hits: Unknown
  • URL: https://www.exensa.com/
  • Category: Commercial Advertising Crap
  • Hits: Excessive - robots & ads
  • URL: https://www.bidswitch.com/
  • Category: Commercial Advertising Crap
  • Hits: Minimal - ads
  • URL: https://bidtellect.com/
  • Category: Commercial SEO Backlink Crap
  • Hits: Moderate - robots & random URLs
  • URL: http://webmeup-crawler.com/
  • Category: Commercial Brand Protection Crap
  • Hits: Moderate - random URLs
  • URL: https://www.brandverity.com/why-is-brandverity-visiting-me
  • Category: Commercial Pinterest Wannabe
  • Hits: Minimal - Random URLs
  • URL: https://www.bublup.com/bublup-bot
  • Category: Lists what technologies it finds sites built with
  • Hits: Light - robots
  • URL: https://builtwith.com/
  • Category: Non-Profit Data Harvesting
  • Hits: Lots - robots & random URLs
  • URL: http://commoncrawl.org/big-picture/frequently-asked-questions/
  • Category: Commercial Advertising Crap
  • Hits: Minimal - ads
  • URL: https://www.centro.net/
  • Category: Commercial Brand monitoring
  • Hits: Minimal - robots & index
  • URL: https://www.checkmarknetwork.com/
  • Category: Commercial Data Mining Crap
  • Hits: Mild - robots
  • URL: https://www.clickagy.com/
  • Category: Commercial German Browser / Search Engine
  • Hits: Unknown
  • URL: https://cliqz.com/en/cliqzbot
  • Category: Shady Vulnerability Scanner
  • Hits: Minimal - index
  • URL: https://commonscan.org/
  • Category: SEO Crap
  • Hits: Excessive - robots & random URLs
  • URL: https://dataforseo.com/dataforseo-bot
  • Category: Commercial Analytics Company
  • Hits: Unknown
  • URL: https://www.dataprovider.com/
  • Category: Korean Search Engine
  • Hits: Minimal - robots & URLs it shouldn't index
  • URL: https://www.daum.net/
  • Category: Domain Harvester
  • Hits: Minimal - random URLs
  • URL: https://github.com/kgretzky/dcrawl
  • Category: Commercial SEO Marketing Crap
  • Hits: Unknown
  • URL: https://www.deepcrawl.com/bot/
  • Category: Commercial SEO Harvesting
  • Hits: Excessive - robots & index
  • URL: http://www.domaincrawler.com/
  • Category: Commercial Backlink, Metrics, Rankings, etc...
  • Hits: Moderate - robots & random URLs (some broken / shouldn't index)
  • URL: https://domainstats.com/
  • Category: Expired Domain Bot?
  • Hits: Minimal - robots & random URLs
  • URL: https://www.domcop.com/bot
  • Category: Commercial Backlink Crap
  • Hits: ABUSIVE - pounding robots
  • URL: https://moz.com/
  • Category: Commercial Marketing Crap
  • Hits: Minimal - random URLs
  • URL: https://www.exalead.com
  • Category: Unknown (Website Down)
  • Hits: Unknown
  • URL: https://extlinks.com/Bot.html
  • Category: "New" Search Engine for Maximum Privacy
  • Hits: Unknown
  • URL: http://femtosearch.com/
  • Category:
  • Hits: Minimal - robots & index
  • URL: https://garlik.com/
  • Category: Commercial Ad Network
  • Hits: Moderate - robots, ads & random URLs
  • URL: https://getintent.com/bot.html
  • Category: Gigablast Search Engine
  • Hits: Unknown
  • URL: https://www.gigablast.com/
  • Category: Crawling Project
  • Hits: Moderate - index & random URLs
  • URL: http://glutenfreepleasure.com/
  • ChatGPT
  • URL: https://openai.com/gptbot
  • Category: Spell Checker - Indexing?
  • Hits: Moderate - Random URLs
  • URL: https://www.grammarly.com/
  • Category: Commercial Contextual Intelligence Crap
  • Hits: Unknown
  • URL: https://www.grapeshot.com/crawler/
  • User-agent: grapeshot
  • Disallow: /
  • Category: Commercial Japanese Marketing Firm
  • Hits: Moderate - index and random - skips robots!
  • URL: http://hatenaantenna.g.hatena.ne.jp/
  • Category: Commercial Website Audit & Monitoring
  • Hits: Unknown
  • URL: https://hexometer.com/
  • Category: Chinese GeoIP Wannabe
  • Hits: Minimal - index
  • URL: https://en.ipip.net/
  • Category: Random Blogs
  • Hits: Moderate - Random URLs
  • URL: https://hubpages.com/
  • Category: Commercial Advertising Crap
  • Hits: ABUSIVE - robots
  • URL: https://integralads.com/site-indexing-policy/
  • Category: German Search Engine
  • Hits: Minimal
  • URL: https://infotiger.com/bot
  • Category: Italian ISP
  • Hits: Unknown
  • URL: https://www.tiscali.it/
  • Category: Java based HTTP client
  • Hits: Moderate - ads
  • URL: https://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html
  • Category: Chinese Translation Site
  • Hits: Unknown
  • URL: https://www.keybot.com/
  • Category: Crap
  • Hits: Unknown
  • URL: https://line.me/en/
  • Category: Translation Bot
  • Hits: Unknown
  • URL: https://www.linguee.com/
  • Category: Commercial Link Indexer
  • Hits: Unknown
  • URL: https://www.linkdex.com/en-us/about/bots/
  • Category: Unknown - "security research purposes"
  • Hits: Moderate - robots & random URLs
  • URL: http://ltx71.com/
  • MaCoCu - Some BS student project
  • URL: https://www.clarin.si/
  • Category: Commercial Social Media Monitoring
  • Hits: Minimal - random URLs
  • URL: https://www.brandwatch.com/legal/magpie-crawler/
  • Category: Russian Mail / Social / Other Crap
  • Hits: Moderate - robots
  • URL: http://go.mail.ru/help/robots
  • Category: Unknown
  • Hits: Minimal - robots
  • URL: Unknown
  • Category: Commercial Backlinks Crawler
  • Hits: Minimal - robots
  • URL: https://monitorbacklinks.com
  • Category: Commercial Russian SEO Crap
  • Hits: Minimal - index
  • URL: https://megaindex.com/
  • Category: Commercial SEO Marketing Crap
  • Hits: ABUSIVE - robots
  • URL: https://mj12bot.com/
  • Category: Commercial Analytics Crap
  • Hits: Minimal - robots & random URLs
  • URL: https://moat.com/
  • Category: UK Search Engine
  • Hits: Moderate - robots
  • URL: https://www.mojeek.com/bot.html
  • NOTE: See how they behave....
  • User-agent: MojeekBot
  • Disallow: /
  • Category: SEO Crap
  • Hits: Moderate - robots & random URLs
  • URL: https://metrics-tools.de/robot.html
  • Category: Commercial Metrics Crap
  • Hits: Unknown
  • URL: https://www.netcraft.com/
  • Category: A HTTP client for Android, Kotlin, and Java
  • Hits: Unknown
  • URL: https://square.github.io/okhttp/
  • Category: A vertical search engine
  • Hits: Minimal - Random URLs
  • URL: http://omgili.com/Crawler.html
  • Category: Commercial Data Mining Crap
  • Hits: Unknown
  • URL: https://panscient.com/faq.htm
  • Category: Huawei Search Engine
  • Hits: High - robots
  • URL: https://aspiegel.com/
  • Category: Commercial Site
  • Hits: Moderate - Random URLs & Robots
  • URL: http://www.pinterest.com/bot.html
  • Category: Commercial Data Mining Crap
  • Hits: Unknown
  • URL: https://pipl.com/bot/
  • Category: Commercial Advertising
  • Hits: Moderate - robots & ads
  • URL: https://www.comscore.com/
  • Category: Commercial Pic Search Indexer
  • Hits: Unknown
  • URL: https://www.picsearch.com/bot.html
  • Category: F-Secure Research Crap
  • Hits: Moderate - Random URLs
  • URL: http://riddler.io/about
  • Category: Commercial web scraper for hire.
  • Hits: Unknown
  • URL: https://scrapinghub.com/
  • Category: Commercial SEO Spider Software
  • Hits: Minimal - Random URLs
  • URL: https://www.screamingfrog.co.uk/
  • Category: Commercial Media Intelligence Crap
  • Hits: Minimal - robots & random URLs
  • URL: http://www.carma.com
  • Category: German Search Engine?
  • Hits: Excessive
  • URL: http://seekport.com/
  • Semantic Scholar - Looking for academic PDFs
  • Category: Commercial Marketing Crap
  • Hits: Minimal - robots
  • URL: https://www.semrush.com/bot/
  • Category: Commercial SEO Garbage
  • URL: https://www.seobility.net/en/bot/
  • Category: Commercial Backlink Checker
  • Hits: Minimal - robots
  • URL: https://en.seokicks.de/
  • Category: Commercial SEO Crap
  • Hits: Unknown
  • URL: https://serpstat.com/
  • Category: Czech Portal / Search Engine
  • Hits: Minimal - robots
  • URL: https://napoveda.seznam.cz/en/seznamcz-web-search/
  • Category: Unknown (Website Down) - Backlink Checker
  • Hits: Unknown
  • URL: https://siteexplorer.info
  • Category: Commercial Advertising Marketing Crap
  • Hits: Minimal - robots & index
  • URL: http://www.similartech.com/smtbot
  • Category: Chinese Search Engine
  • Category: Commercial SEO Solution Crap
  • Hits: Unknown
  • URL: https://www.seoprofiler.com/
  • Category: Commercial Language Processing
  • Hits: Moderate - robots
  • URL: https://nlp.fi.muni.cz/projects/biwec/
  • Category: Some Commercial Crap
  • Hits: Moderate
  • URL: http://sur.ly/bot.html
  • Category: Unknown
  • Hits: Moderate - robots & random pages
  • URL: Unknown
  • Category: Commercial Social media monitoring & analytics
  • Hits: ABUSIVE - robots
  • URL: http://www.trendiction.com/en/publisher/bot
  • Category: Commercial Advertising Crap
  • Hits: Moderate - Random URLs
  • URL: https://www.thetradedesk.com/us/ttd-content
  • Category: Helps edu prevent plagiarism
  • Hits: Minimal
  • URL: https://turnitin.com/robot/crawlerinfo.html
  • Category: Commercial Machine Learning Text Classifier
  • Hits: Unknown
  • URL: https://www.uclassify.com/
  • Category: Commercial Russian CMS Detector Crap
  • Hits: Unknown
  • URL: https://webdatastats.com/policy.html
  • Category: Russian Search Engine
  • Category: Backlink Checker
  • Hits: Minimal - random URLs
  • URL: http://www.zombiedomain.net/robot/
  • Category: Commercial Italian SEO Crap
  • Hits: Moderate - Random URLs
  • URL: https://suite.seozoom.it/
  • Category: Commercial Advertising Crap
  • Hits: Excessive - robots
  • URL: https://www.zoominfo.com/

Warnings

  • 4 invalid lines.