Re: disks becoming slow but not explicitly failing anyone?

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Bill Davidsen <davidsen@tmr.com>
To: Nix <nix@esperi.org.uk>
Cc: Mark Hahn <hahn@physics.mcmaster.ca>, linux-raid@vger.kernel.org
Subject: Re: disks becoming slow but not explicitly failing anyone?
Date: Thu, 04 May 2006 20:45:23 -0400	[thread overview]
Message-ID: <445AA023.6070406@tmr.com> (raw)
In-Reply-To: <874q0irdah.fsf@hades.wkstn.nix>

Nix wrote:

>On 23 Apr 2006, Mark Hahn stipulated:
>  
>
>>>I've seen a lot of cheap disks say (generally deep in the data sheet
>>>that's only available online after much searching and that nobody ever
>>>reads) that they are only reliable if used for a maximum of twelve hours
>>>a day, or 90 hours a week, or something of that nature. Even server
>>>      
>>>
>>I haven't, and I read lots of specs.  they _will_ sometimes say that 
>>non-enterprise drives are "intended" or "designed" for a 8x5 desktop-like
>>usage pattern.
>>    
>>
>
>That's the phrasing, yes: foolish me assumed that meant `if you leave it
>on for much longer than that, things will go wrong'.
>
>  
>
>>                to the normal way of thinking about reliability, this would 
>>simply mean a factor of 4.2x lower reliability - say from 1M to 250K hours
>>MTBF.  that's still many times lower rate of failure than power supplies or 
>>fans.
>>    
>>
>
>Ah, right, it's not a drastic change.
>
>  
>
>>>It still stuns me that anyone would ever voluntarily buy drives that
>>>can't be left switched on (which is perhaps why the manufacturers hide
>>>      
>>>
>>I've definitely never seen any spec that stated that the drive had to be 
>>switched off.  the issue is really just "what is the designed duty-cycle?"
>>    
>>
>
>I see. So it's just `we didn't try to push the MTBF up as far as we would
>on other sorts of disks'.
>
>  
>
>>I run a number of servers which are used as compute clusters.  load is
>>definitely 24x7, since my users always keep the queues full.  but the servers
>>are not maxed out 24x7, and do work quite nicely with desktop drives
>>for years at a time.  it's certainly also significant that these are in a 
>>decent machineroom environment.
>>    
>>
>
>Yeah; i.e., cooled. I don't have a cleanroom in my house so the RAID
>array I run there is necessarily uncooled, and the alleged aircon in the
>room housing work's array is permanently on the verge of total collapse
>(I think it lowers the temperature, but not by much).
>
>  
>
>>it's unfortunate that disk vendors aren't more forthcoming with their drive
>>stats.  for instance, it's obvious that "wear" in MTBF terms would depend 
>>nonlinearly on the duty cycle.  it's important for a customer to know where 
>>that curve bends, and to try to stay in the low-wear zone.  similarly, disk
>>    
>>
>
>Agreed! I tend to assume that non-laptop disks hate being turned on and
>hate temperature changes, so just keep them running 24x7. This seems to be OK,
>with the only disks this has ever killed being Hitachi server-class disks in
>a very expensive Sun server which was itself meant for 24x7 operation; the
>cheaper disks in my home systems were quite happy. (Go figure...)
>
>  
>
>>specs often just give a max operating temperature (often 60C!), which is 
>>almost disingenuous, since temperature has a superlinear effect on reliability.
>>    
>>
>
>I'll say. I'm somewhat twitchy about the uncooled 37C disks in one of my
>machines: but one of the other disks ran at well above 60C for *years*
>without incident: it was an old one with no onboard temperature sensing,
>and it was perhaps five years after startup that I opened that machine
>for the first time in years and noticed that the disk housing nearly
>burned me when I touched it. The guy who installed it said that yes, it
>had always run that hot, and was that important? *gah*
>
>I got a cooler for that disk in short order.
>
>  
>
>>a system designer needs to evaluate the expected duty cycle when choosing
>>disks, as well as many other factors which are probably more important.
>>for instance, an earlier thread concerned a vast amount of read traffic 
>>to disks resulting from atime updates.
>>    
>>
>
>Oddly, I see a steady pulse of write traffic, ~100Kb/s, to one dm device
>(translating into read+write on the underlying disks) even when the
>system is quiescient, all daemons killed, and all fsen mounted with
>noatime. One of these days I must fish out blktrace and see what's
>causing it (but that machine is hard to quiesce like that: it's in heavy
>use).
>
>  
>
>>simply using more disks also decreases the load per disk, though this is 
>>clearly only a win if it's the difference in staying out of the disks 
>>"duty-cycle danger zone" (since more disks divide system MTBF).
>>    
>>
>
>Well, yes, but if you have enough more you can make some of them spares
>and push up the MTBF again (and the cooling requirements, and the power
>consumption: I wish there was a way to spin down spares until they were
>needed, but non-laptop controllers don't often seem to provide a way to
>spin anything down at all that I know of).
>
>  
>
hdparam will let you set the spindown time. I have all mine set that way 
for power and heat reasons, they tend to be in burst use. Dropped the CR 
temp by enough to notice, but I need some more local cooling for that 
room still.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

next prev parent reply	other threads:[~2006-05-05  0:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-22 20:05 disks becoming slow but not explicitly failing anyone? Carlos Carvalho
2006-04-23  0:45 ` Mark Hahn
2006-04-23 13:38   ` Nix
2006-04-23 18:04     ` Mark Hahn
2006-04-24 19:20       ` Nix
2006-05-05  0:45         ` Bill Davidsen [this message]
2006-04-27  3:31 ` Konstantin Olchanski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=445AA023.6070406@tmr.com \
    --to=davidsen@tmr.com \
    --cc=hahn@physics.mcmaster.ca \
    --cc=linux-raid@vger.kernel.org \
    --cc=nix@esperi.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).