public inbox for linux-kernel@vger.kernel.org
From: Bill Davidsen <davidsen@tmr.com>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: FYI: RAID5 unusably unstable through 2.6.14
Date: Thu, 02 Feb 2006 17:10:23 -0500
Message-ID: <43E2834F.8040009@tmr.com>
In-Reply-To: <43CD8A19.3010100@cfl.rr.com>

Phillip Susi wrote:
> Your understanding of statistics leaves something to be desired.  As you 
> add disks the probability of a single failure grows linearly, but the 
> probability of double failure grows much more slowly.  For example:
> 
> If 1 disk has a 1/1000 chance of failure, then
> 2 disks have a (1/1000)^2 chance of double failure, and
> 3 disks have a (1/1000)^2 * 3 chance of double failure
> 4 disks have a (1/1000)^2 * 7 chance of double failure

After the first drive fails you have no redundancy, and the chance of an 
additional failure is linear in the number of remaining drives.

Assume:
   p - probability of a drive failing in unit time
   n - number of drives
   F - probability of double failure

The chance of a single drive failure is approximately n*p (for small p). 
After that you have a new "independent trial" for the failure of any one 
of the n-1 remaining drives, so the chance of a double drive failure is 
actually:
   F = (n*p) * (n-1)*p
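
The formula above can be checked numerically. This is only a minimal 
sketch in Python: the function name double_failure is illustrative, 
p = 1/1000 is taken from the quoted post, and n*p is a small-p 
approximation that only makes sense while n*p is much less than 1.

```python
def double_failure(n: int, p: float) -> float:
    """Approximate double-failure probability F = (n*p) * ((n-1)*p).

    n*p approximates the chance that any one of n drives fails, so the
    result is only meaningful when n*p is much smaller than 1.
    """
    first = n * p          # any one of the n drives fails
    second = (n - 1) * p   # any one of the remaining n-1 drives fails
    return first * second


if __name__ == "__main__":
    p = 1 / 1000  # per-drive failure probability from the quoted post
    for n in range(2, 9):
        print(f"n={n}  F={double_failure(n, p):.2e}")
```

Under this model F grows roughly as n squared, so doubling the number 
of drives roughly quadruples the chance of a double failure.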

But wait, there's more:
   p - chance of a drive failing in unit time
   n - number of drives
   R - the time to rebuild to a hot spare in the same units as p
   F - probability of double failure

So:

   F = n*p * (n-1)*(R * p)
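
To see how the rebuild time enters, the formula above can be evaluated 
for a few values of R. This is a hedged sketch in Python: the function 
name and the sample rebuild times are assumptions chosen for 
illustration, and only p = 1/1000 comes from the quoted post.

```python
def double_failure_during_rebuild(n: int, p: float, r: float) -> float:
    """F = n*p * (n-1)*(r*p), with r in the same time units as p."""
    first = n * p                # first failure, per unit time
    second = (n - 1) * (r * p)   # second failure within the rebuild window
    return first * second


if __name__ == "__main__":
    p = 1 / 1000   # per-drive failure probability from the quoted post
    n = 4          # array size used in the quoted example
    # Hypothetical rebuild times, as fractions of the unit time for p.
    for r in (1.0, 0.5, 0.1, 0.01):
        f = double_failure_during_rebuild(n, p, r)
        print(f"R={r:<5} F={f:.2e}")
```

F is linear in R in this model, so halving the rebuild time halves the 
exposure to a second failure.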

If you rebuild a track at a time, each track takes the time to read the 
slowest drive plus the time to write the spare. If the array remains in 
use, the load increases those times.

And the ugly part is that p is changing all the time: there's infant 
mortality on new drives, a fairly constant probability of electronic 
failure, and an increasing probability of mechanical failure over time. 
If all of your drives are the same age, they are less reliable than 
mixed-age drives.

> 
> Thus the probability of double failure on this 4 drive array is ~142 
> times less than the odds of a single drive failing.  As the probability of 
> a single drive failing becomes more remote, then the ratio of that 
> probability to the probability of double fault in the array grows 
> exponentially.
> 
> ( I think I did that right in my head... will check on a real calculator 
>  later )
> 
> This is why raid-5 was created: because the array has a much lower 
> probability of double failure, and thus, data loss, than a single drive. 
>  Then of course, if you are really paranoid, you can go with raid-6 ;)

If you're paranoid you mirror over two RAID-5 arrays. The mirrors are on 
independent controllers. RAID-10.

> 
> 
> Michael Loftis wrote:
> 
>> Absolutely not.  The more spindles the more chances of a double 
>> failure. Simple statistics will mean that unless you have mirrors the 
>> more drives you add the more chance of two of them (really) failing at 
>> once and choking the whole system.
>>
>> That said, there very well could be (are?) cases where md needs to do 
>> a better job of handling the world unravelling.
>> -
A small graph of the effect of rebuild time on RAID-5 is attached. It 
assumes a probability of failure of 1/1000, per the original post, and 
shows how the probability of double failure drops as the rebuild time 
gets shorter.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me

[-- Attachment #2: 2dfail-2.png --]
[-- Type: image/png, Size: 3106 bytes --]
