fourmilab.ch
robots.txt

Robots Exclusion Standard data for fourmilab.ch

Resource Scan

Scan Details

Site Domain fourmilab.ch
Base Domain fourmilab.ch
Scan Status Ok
Last Scan 2024-11-06T13:52:34+00:00
Next Scan 2024-12-06T13:52:34+00:00

Last Scan

Scanned 2024-11-06T13:52:34+00:00
URL https://fourmilab.ch/robots.txt
Domain IPs 2a05:d014:d43:3101:94aa:a276:e035:6a2a, 52.28.236.0
Response IP 52.28.236.0
Found Yes
Hash 3ecb409681e0ae0c25654adccb0c7048c845b2d94362b0791328debb3914a7c4
SimHash fa1e0944c8f5
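
The Hash value above is 64 hexadecimal digits, which is the length of a SHA-256 digest. Assuming (the report does not say so) that it is a SHA-256 digest of the raw robots.txt body, a change between two scans could be detected with a sketch like the following. The function names `body_digest` and `body_changed` are hypothetical, and the SimHash field is not covered here.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 digest of a robots.txt body as lowercase hex.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


def body_changed(body: bytes, previous_hash: str) -> bool:
    """Report whether the body differs from a previously recorded hash."""
    return body_digest(body) != previous_hash.lower()
```

To use it, pass the bytes fetched from the URL listed above together with the Hash value recorded by the previous scan.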

Groups

*

Rule Path Comment
Disallow /bullets/zounds Audio files for illustration only
Disallow /cgi-bin/ Dynamic services
Disallow /documents/DOS/ Denial of service countermeasures
Disallow /hotbits/figures Figures for illustration only
Disallow /hotbits/source Source code
Disallow /earthview/cache/ Ephemeral files
Disallow /earthview/satellite.html Satellite orbital elements
Disallow /entrenous/ Files for specific people
Disallow /etexts/www/gergel Mirror of Gergel PDF books
Disallow /goldberg/ Under construction
Disallow /netfone/ Renamed to /speakfree
Disallow /serverstats/ Changes every day, confusing
Disallow /sitemap.html Temporary link to home page
Disallow /speakfree/unix/prior-releases Obsolete releases
Disallow /speakfree/windows/prior-releases Obsolete releases
Disallow /uscode/8usc/www/ Bulk text of 8 USC
Disallow /uscode/26usc/www/ Bulk text of 26 USC
Disallow /ustax/ Renamed to /uscode/26usc
Disallow /yoursky/catalogues/ Yoursky object catalogues
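
A minimal sketch of how a crawler might evaluate a few of the `*` rules listed above, using Python's standard `urllib.robotparser`. Only three rules are reproduced. The crawler name `ExampleBot` and the path `/hotbits/source-old` are hypothetical, chosen to show that this parser matches by path prefix, so a rule without a trailing slash also covers longer names that start the same way.

```python
from urllib.robotparser import RobotFileParser

# A subset of the "*" group from the table above, in robots.txt syntax.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cgi-bin/        # Dynamic services",
    "Disallow: /hotbits/source  # Source code",
    "Disallow: /sitemap.html    # Temporary link to home page",
]

star_parser = RobotFileParser()
star_parser.parse(ROBOTS_LINES)


def may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Check a site-relative path against the parsed rules."""
    return star_parser.can_fetch(agent, "https://fourmilab.ch" + path)


for path in ["/", "/hotbits/", "/hotbits/source/", "/hotbits/source-old",
             "/cgi-bin/", "/sitemap.html"]:
    print(path, may_fetch(path))
```

Other robots.txt parsers may treat edge cases differently, so the result for a given path should be read as the behaviour of this one library.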

npbot

Rule Path Comment
Disallow / Nameprotect.com spybot blocking

twiceler

Product Comment
twiceler Moronic Twiceler cgi-bin chaser
Rule Path
Disallow /

twitterbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /
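
The per-agent groups above all take the same form: a named crawler followed by `Disallow /`. The sketch below rebuilds those groups as robots.txt lines and checks them with `urllib.robotparser`. The helper names `build_robots_lines` and `is_allowed` and the crawler name `ExampleBot` are hypothetical, and the `*` group is reduced to a single rule for brevity.

```python
from urllib.robotparser import RobotFileParser

# Agents that the groups above block from the whole site.
BLOCKED_AGENTS = [
    "npbot", "twiceler", "twitterbot", "ahrefsbot", "mj12bot",
    "dotbot", "mauibot", "semrushbot", "turnitinbot",
]


def build_robots_lines(blocked_agents):
    """Emit one 'Disallow: /' group per agent, then a shortened '*' group."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines += ["User-agent: *", "Disallow: /cgi-bin/"]
    return lines


agent_parser = RobotFileParser()
agent_parser.parse(build_robots_lines(BLOCKED_AGENTS))


def is_allowed(agent: str, path: str) -> bool:
    """Check whether the named agent may fetch a site-relative path."""
    return agent_parser.can_fetch(agent, "https://fourmilab.ch" + path)


print(is_allowed("MJ12bot", "/"))        # blocked by its own group
print(is_allowed("ExampleBot", "/"))     # falls through to the "*" group
```

The agent is passed as a bare product token such as `MJ12bot`. This parser compares only the part of the string before the first `/`, so a full browser-style User-Agent header would not match a named group.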

Comments

  • robots.txt for http://www.fourmilab.ch
  • Twitterbot blasts in dozens of requests in seconds,
  • AhrefsBot generates lots of traffic and provides no benefit to sites it scrapes. It is also known for ignoring robots.txt, so we also block it in .htaccess.
  • MJ12bot generates a lot of traffic to no benefit. It claims to respect robots.txt. Let's see.
  • DotBot (Moz.com) crawls indiscriminately to no benefit but their own.
  • MauiBot crawls indiscriminately and quickly, and nobody knows what it is. Comes from an AWS address range.
  • SEMrushBot is up to no good. In December 2019, it accounted for almost 4% of all hits on the site.
  • TurnitinBot is a "plagiarism" checker that hits the site indiscriminately.