Re: disks becoming slow but not explicitly failing anyone?

From: Bill Davidsen <davidsen@tmr.com>
To: Nix <nix@esperi.org.uk>
Cc: Mark Hahn <hahn@physics.mcmaster.ca>, linux-raid@vger.kernel.org
Subject: Re: disks becoming slow but not explicitly failing anyone?
Date: Thu, 04 May 2006 20:45:23 -0400
Message-ID: <445AA023.6070406@tmr.com>
In-Reply-To: <874q0irdah.fsf@hades.wkstn.nix>

Nix wrote:

>On 23 Apr 2006, Mark Hahn stipulated:
>
>>>I've seen a lot of cheap disks say (generally deep in the data sheet
>>>that's only available online after much searching and that nobody ever
>>>reads) that they are only reliable if used for a maximum of twelve hours
>>>a day, or 90 hours a week, or something of that nature. Even server
>>>
>>I haven't, and I read lots of specs.  they _will_ sometimes say that 
>>non-enterprise drives are "intended" or "designed" for an 8x5 desktop-like
>>usage pattern.
>
>That's the phrasing, yes: foolish me assumed that meant `if you leave it
>on for much longer than that, things will go wrong'.
>
>>to the normal way of thinking about reliability, this would
>>simply mean a factor of 4.2x lower reliability - say from 1M to 250K hours
>>MTBF.  that's still many times lower rate of failure than power supplies or
>>fans.
>
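The 4.2x figure is the ratio of powered-on hours in a 24x7 week to an 8x5 week (168 / 40 = 4.2). Under the naive model in the quote, the rated MTBF is read as calendar hours under the 8x5 pattern, and wear accrues in proportion to powered-on hours. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the model is the quote's simplifying assumption, not a vendor formula.

```python
def derated_mtbf(rated_mtbf_hours: float, rated_hours_per_week: float,
                 actual_hours_per_week: float) -> float:
    """Scale a rated MTBF linearly by duty cycle.

    Naive model (assumption): wear accrues in proportion to powered-on
    hours, so running N times more hours per week divides MTBF by N.
    """
    factor = actual_hours_per_week / rated_hours_per_week
    return rated_mtbf_hours / factor

HOURS_24X7 = 24 * 7   # 168 powered-on hours per week
HOURS_8X5 = 8 * 5     # 40 powered-on hours per week

factor = HOURS_24X7 / HOURS_8X5
mtbf = derated_mtbf(1_000_000, HOURS_8X5, HOURS_24X7)
print(f"duty-cycle factor: {factor:.1f}x")
print(f"derated MTBF: {mtbf:,.0f} hours")
```

The exact result is about 238K hours, which the quote rounds to "say ... 250K".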
>Ah, right, it's not a drastic change.
>
>>>It still stuns me that anyone would ever voluntarily buy drives that
>>>can't be left switched on (which is perhaps why the manufacturers hide
>>>
>>I've definitely never seen any spec that stated that the drive had to be 
>>switched off.  the issue is really just "what is the designed duty-cycle?"
>
>I see. So it's just `we didn't try to push the MTBF up as far as we would
>on other sorts of disks'.
>
>>I run a number of servers which are used as compute clusters.  load is
>>definitely 24x7, since my users always keep the queues full.  but the servers
>>are not maxed out 24x7, and do work quite nicely with desktop drives
>>for years at a time.  it's certainly also significant that these are in a
>>decent machine room environment.
>
>Yeah; i.e., cooled. I don't have a cleanroom in my house so the RAID
>array I run there is necessarily uncooled, and the alleged aircon in the
>room housing work's array is permanently on the verge of total collapse
>(I think it lowers the temperature, but not by much).
>
>>it's unfortunate that disk vendors aren't more forthcoming with their drive
>>stats.  for instance, it's obvious that "wear" in MTBF terms would depend 
>>nonlinearly on the duty cycle.  it's important for a customer to know where 
>>that curve bends, and to try to stay in the low-wear zone.  similarly, disk
>
>Agreed! I tend to assume that non-laptop disks hate being turned on and
>hate temperature changes, so just keep them running 24x7. This seems to be OK,
>with the only disks this has ever killed being Hitachi server-class disks in
>a very expensive Sun server which was itself meant for 24x7 operation; the
>cheaper disks in my home systems were quite happy. (Go figure...)
>
>>specs often just give a max operating temperature (often 60C!), which is 
>>almost disingenuous, since temperature has a superlinear effect on reliability.
>
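The quote does not name a model. One common way to express a superlinear temperature effect is the Arrhenius acceleration factor, AF = exp((Ea/k)(1/T_ref - 1/T)), with temperatures in kelvin. A rough sketch follows. The activation energy of 0.7 eV is purely an illustrative assumption, not a drive specification, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_factor(temp_c: float, ref_temp_c: float,
                     activation_ev: float = 0.7) -> float:
    """Relative failure-rate multiplier at temp_c versus ref_temp_c.

    activation_ev = 0.7 is an illustrative assumption, not a drive spec.
    """
    t = temp_c + 273.15          # convert to kelvin
    t_ref = ref_temp_c + 273.15
    return math.exp((activation_ev / BOLTZMANN_EV_PER_K) * (1 / t_ref - 1 / t))

# Compare a few operating temperatures against a 25 C baseline.
for temp in (25, 37, 45, 60):
    print(f"{temp:>3} C: {arrhenius_factor(temp, 25):5.2f}x the 25 C failure rate")
```

Under this model, each additional 15 C costs more than the previous 15 C did. That is the "superlinear" shape the quote describes, and it is why a 60C maximum operating temperature says little about lifetime at 60C.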
>I'll say. I'm somewhat twitchy about the uncooled 37C disks in one of my
>machines: but one of the other disks ran at well above 60C for *years*
>without incident: it was an old one with no onboard temperature sensing,
>and it was perhaps five years after startup that I opened that machine
>for the first time in years and noticed that the disk housing nearly
>burned me when I touched it. The guy who installed it said that yes, it
>had always run that hot, and was that important? *gah*
>
>I got a cooler for that disk in short order.
>
>>a system designer needs to evaluate the expected duty cycle when choosing
>>disks, as well as many other factors which are probably more important.
>>for instance, an earlier thread concerned a vast amount of write traffic
>>to disks resulting from atime updates.
>
>Oddly, I see a steady pulse of write traffic, ~100Kb/s, to one dm device
>(translating into read+write on the underlying disks) even when the
>system is quiescent, all daemons killed, and all fsen mounted with
>noatime. One of these days I must fish out blktrace and see what's
>causing it (but that machine is hard to quiesce like that: it's in heavy
>use).
>
>>simply using more disks also decreases the load per disk, though this is 
>>clearly only a win if it's the difference in staying out of the disks'
>>"duty-cycle danger zone" (since more disks divide system MTBF).
>
>Well, yes, but if you have enough more you can make some of them spares
>and push up the MTBF again (and the cooling requirements, and the power
>consumption: I wish there was a way to spin down spares until they were
>needed, but non-laptop controllers don't often seem to provide a way to
>spin anything down at all that I know of).
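Two points above can be sketched with the usual textbook approximations: that more disks divide system MTBF, and that spares help. Both approximations assume independent disks with constant failure rates.

* N disks see their first failure after roughly MTBF/N hours.
* A single-parity (RAID-5) array loses data only if a second disk fails during the repair window. This gives the classic mean-time-to-data-loss estimate MTBF^2 / (N(N-1) MTTR).

A hot spare mainly helps by shortening MTTR, because the rebuild can start at once. The helper names and the numbers below (8 disks, 72 h vs. 12 h repair) are illustrative assumptions.

```python
def array_mtbf(disk_mtbf_hours: float, n_disks: int) -> float:
    """Mean time until *some* disk fails, assuming independent disks
    with constant failure rates (rates add, so MTBF divides by N)."""
    return disk_mtbf_hours / n_disks

def raid5_mttdl(disk_mtbf_hours: float, n_disks: int, mttr_hours: float) -> float:
    """Classic single-parity estimate: data is lost if a second disk
    fails while the first is still being replaced and rebuilt."""
    return disk_mtbf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)

DISK_MTBF = 250_000  # hours; roughly the derated figure in Mark's example
N = 8                # disks in the array (assumed)

print(f"any-disk MTBF: {array_mtbf(DISK_MTBF, N):,.0f} h")
# Without a spare, repair waits for a human and a new disk (assumed 72 h).
print(f"MTTDL, no spare:  {raid5_mttdl(DISK_MTBF, N, 72):,.0f} h")
# With a hot spare, the rebuild starts at once (assumed 12 h to resync).
print(f"MTTDL, hot spare: {raid5_mttdl(DISK_MTBF, N, 12):,.0f} h")
```

In this model, cutting the repair window by 6x raises MTTDL by 6x. That effect is separate from the per-disk MTBF loss that comes from adding disks.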
hdparm will let you set the spindown time (the -S option). I have all
mine set that way for power and heat reasons, since they tend to be in
burst use. It dropped the computer-room temperature enough to notice,
but I still need some more local cooling for that room.
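The value passed to hdparm -S is not in seconds. Per the hdparm(8) man page:

* 0 disables the timer.
* 1-240 are multiples of 5 seconds (up to 20 minutes).
* 241-251 are 1 to 11 units of 30 minutes (up to 5.5 hours).

Below is a small helper that picks the -S value for a desired timeout, rounding up to the next representable step. The helper is hypothetical and not part of hdparm, and /dev/sdX is a placeholder device name.

```python
def hdparm_standby_value(timeout_seconds: int) -> int:
    """Map a desired spindown timeout to an hdparm -S value.

    Encoding per hdparm(8): 0 = disabled, 1..240 = n * 5 s,
    241..251 = (n - 240) * 30 min. Rounds up to the next step.
    """
    if timeout_seconds <= 0:
        return 0  # disable the standby timer
    if timeout_seconds <= 240 * 5:            # up to 20 minutes, 5 s steps
        return -(-timeout_seconds // 5)       # ceiling division
    half_hours = -(-timeout_seconds // 1800)  # ceiling, in 30 min units
    if half_hours > 11:
        raise ValueError("hdparm -S tops out at 5.5 hours (value 251)")
    return 240 + half_hours

for minutes in (10, 20, 60):
    value = hdparm_standby_value(minutes * 60)
    print(f"hdparm -S {value} /dev/sdX   # ~{minutes} min")
```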

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

Thread overview: 7+ messages
2006-04-22 20:05 disks becoming slow but not explicitly failing anyone? Carlos Carvalho
2006-04-23  0:45 ` Mark Hahn
2006-04-23 13:38   ` Nix
2006-04-23 18:04     ` Mark Hahn
2006-04-24 19:20       ` Nix
2006-05-05  0:45         ` Bill Davidsen [this message]
2006-04-27  3:31 ` Konstantin Olchanski
