linux-raid.vger.kernel.org archive mirror
* from 2x RAID1 to 1x RAID6 ?
@ 2011-06-07 18:12 Stefan G. Weichinger
  2011-06-07 20:07 ` Maurice Hilarius
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stefan G. Weichinger @ 2011-06-07 18:12 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org


Greetings, could you please advise me how to proceed?

On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:

md5 : active raid1 sde1[0] sdf1[1]
      976759936 blocks [2/2] [UU]

md6 : active raid1 sdh1[1] sdg1[0]
      976759936 blocks [2/2] [UU]


md5 and md6 are right now physical volumes (PVs) in an LVM-volume-group.
Nearly all the space is used right now (1.7 TB out of the ~2 TB).

Now I would like to move things to a more reliable RAID6 consisting of
all the four TB-drives ...

How to do that with minimum risk?

For sure it would be best to move all data aside, stop the arrays and
build a new one ... etc

Failing two drives and removing them from the RAID1s to build a new
degraded RAID6 seems dangerous to me.

Maybe I overlook a clever alternative?

Suggestions welcome, thanks in advance.

Stefan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 18:12 from 2x RAID1 to 1x RAID6 ? Stefan G. Weichinger
@ 2011-06-07 20:07 ` Maurice Hilarius
  2011-06-07 23:59   ` Thomas Harold
  2011-06-08  1:16 ` John Robinson
  2011-06-08  9:43 ` David Brown
  2 siblings, 1 reply; 15+ messages in thread
From: Maurice Hilarius @ 2011-06-07 20:07 UTC (permalink / raw)
  To: lists, linux-raid

On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
> Greetings, could you please advise me how to proceed?
>
> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>
> ..
>
> Now I would like to move things to a more reliable RAID6 consisting of
> all the four TB-drives ...
>
> How to do that with minimum risk?
>
> ..
> Maybe I overlook a clever alternative?

RAID 10 is as secure, and risk free, and much faster.
And will cause much less CPU load.

-- 
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 20:07 ` Maurice Hilarius
@ 2011-06-07 23:59   ` Thomas Harold
  2011-06-08  8:06     ` Stefan G. Weichinger
  2011-06-08  9:38     ` David Brown
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Harold @ 2011-06-07 23:59 UTC (permalink / raw)
  To: Maurice Hilarius; +Cc: lists, linux-raid

On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>> Greetings, could you please advise me how to proceed?
>>
>> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>>
>> ..
>>
>> Now I would like to move things to a more reliable RAID6 consisting of
>> all the four TB-drives ...
>>
>> How to do that with minimum risk?
>>
>> ..
>> Maybe I overlook a clever alternative?
>
> RAID 10 is as secure, and risk free, and much faster.
> And will cause much less CPU load.
>

Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you 
can lose 2 disks without losing data, but only if the right 2 disks fail.

With RAID6, any two of the four can fail without data loss.

(I still prefer RAID-10 over RAID-6 unless space is at an absolute 
premium.  But for a four-disk setup, net disk space is the same and it's 
just a question of whether you want the speed of RAID-10 or the 
reliability of RAID-6.)
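
The "only if the right 2 disks fail" odds are easy to check by brute force: enumerate every two-disk failure of a four-disk RAID10 (the pairing below, sde+sdf and sdg+sdh, is assumed for illustration) and count which combinations take out a whole mirror pair.

```python
from itertools import combinations

# Four disks arranged as two mirrored pairs (hypothetical layout);
# RAID10 survives a two-disk failure unless both failed disks
# belong to the same mirror pair.
pairs = [{"sde", "sdf"}, {"sdg", "sdh"}]
disks = sorted(set().union(*pairs))

fatal = sum(1 for combo in combinations(disks, 2)
            if set(combo) in pairs)
total = len(list(combinations(disks, 2)))  # 6 possible two-disk failures
print(f"survives {total - fatal} of {total} two-disk failures")  # -> 4 of 6
```

Four of six combinations survive, hence the roughly 66% figure discussed later in the thread; RAID6 survives all six.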


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 18:12 from 2x RAID1 to 1x RAID6 ? Stefan G. Weichinger
  2011-06-07 20:07 ` Maurice Hilarius
@ 2011-06-08  1:16 ` John Robinson
  2011-06-08  8:16   ` Stefan G. Weichinger
  2011-06-08  9:43 ` David Brown
  2 siblings, 1 reply; 15+ messages in thread
From: John Robinson @ 2011-06-08  1:16 UTC (permalink / raw)
  To: lists; +Cc: linux-raid@vger.kernel.org

On 07/06/2011 19:12, Stefan G. Weichinger wrote:
>
> Greetings, could you please advise me how to proceed?
>
> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>
> md5 : active raid1 sde1[0] sdf1[1]
>        976759936 blocks [2/2] [UU]
>
> md6 : active raid1 sdh1[1] sdg1[0]
>        976759936 blocks [2/2] [UU]
>
>
> md5 and md6 are right now physical volumes (PVs) in an LVM-volume-group.
> Nearly all the space is used right now (1.7 TB out of the ~2 TB).
>
> Now I would like to move things to a more reliable RAID6 consisting of
> all the four TB-drives ...
>
> How to do that with minimum risk?
>
> For sure it would be best to move all data aside, stop the arrays and
> build a new one ... etc
>
> Failing two drives and removing them from the RAID1s to build a new
> degraded RAID6 seems dangerous to me.
>
> Maybe I overlook a clever alternative?
>
> Suggestions welcome, thanks in advance.

There may be a clever alternative, retaining single redundancy, if you 
don't mind buying one more disc, which I'm guessing you might do soon 
anyway as you're already 85% full. Or if not, it won't do too much harm 
to have a spare drive sitting on a shelf.

You can convert a 2-drive RAID1 to a 2-drive RAID5, then add the new 
drive to double the size of the array, resize the PV, then move the PEs 
over from the other RAID1, then tear down that PV and RAID1, add one or 
both of those drives into the RAID5 and grow it to a RAID6. The only 
step at which you have a little less redundancy is while you're running 
the 3-drive RAID5 (well, it's still 1 drive but against 2 drives, 
instead of 1:1).
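
Spelled out with md/LVM commands, that sequence might look roughly like this (a sketch only: device names are taken from the original post, /dev/sdi1 for the new disk and the volume-group name "vg0" are assumed, and some reshape steps may require a --backup-file depending on the mdadm version):

```sh
# 1. Reshape one mirror into a 2-drive RAID5 (same data layout, still redundant)
mdadm --grow /dev/md5 --level=5

# 2. Add the new disk and grow to a 3-drive RAID5 (doubles the capacity)
mdadm --add /dev/md5 /dev/sdi1
mdadm --grow /dev/md5 --raid-devices=3

# 3. Let LVM see the extra space, then migrate all extents off the other PV
pvresize /dev/md5
pvmove /dev/md6
vgreduce vg0 /dev/md6        # volume-group name assumed

# 4. Tear down the now-empty RAID1 and reuse its drives
pvremove /dev/md6
mdadm --stop /dev/md6
mdadm --zero-superblock /dev/sdg1 /dev/sdh1
mdadm --add /dev/md5 /dev/sdg1 /dev/sdh1

# 5. Grow the 3-drive RAID5 into a 4-drive RAID6
mdadm --grow /dev/md5 --level=6 --raid-devices=4 \
      --backup-file=/root/md5-reshape.bak   # backup file on a separate device
```

Steps 2 and 5 each trigger a full reshape, so expect the whole migration to take a while on TB-sized drives.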

On the other hand it might be easier to take a backup, which you 
probably ought to do anyway!

Cheers,

John.



* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 23:59   ` Thomas Harold
@ 2011-06-08  8:06     ` Stefan G. Weichinger
  2011-06-08  9:38     ` David Brown
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan G. Weichinger @ 2011-06-08  8:06 UTC (permalink / raw)
  To: linux-raid

On 08.06.2011 01:59, Thomas Harold wrote:

> Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you
> can lose 2 disks without losing data, but only if the right 2 disks fail.
> 
> With RAID6, any two of the four can fail without data loss.

Yes, that was my initial reason to try that.

> (I still prefer RAID-10 over RAID-6 unless space is at an absolute
> premium.  But for a four-disk setup, net disk space is the same and it's
> just a question of whether you want the speed of RAID-10 or the
> reliability of RAID-6.)

Reliability. There are backups done to that array.


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08  1:16 ` John Robinson
@ 2011-06-08  8:16   ` Stefan G. Weichinger
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan G. Weichinger @ 2011-06-08  8:16 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

On 08.06.2011 03:16, John Robinson wrote:

> There may be a clever alternative, retaining single redundancy, if
> you don't mind buying one more disc, which I'm guessing you might do
> soon anyway as you're already 85% full. Or if not, it won't do too
> much harm to have a spare drive sitting on a shelf.

In fact I already have one ... but I can't use it as the 8 bays of that
server are already fully used. I could only attach that drive
temporarily with USB or so ...

> You can convert a 2-drive RAID1 to a 2-drive RAID5, then add the new 
> drive to double the size of the array, resize the PV, then move the
> PEs over from the other RAID1, then tear down that PV and RAID1, add
> one or both of those drives into the RAID5 and grow it to a RAID6.
> The only step at which you have a little less redundancy is while
> you're running the 3-drive RAID5 (well, it's still 1 drive but
> against 2 drives, instead of 1:1).

Clever idea, yes ... but a rather long way somehow ...

> On the other hand it might be easier to take a backup, which you 
> probably ought to do anyway!

Yep, I assume it will be that way: mv data aside, new array, data back
in ...

Thanks anyway, Stefan


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 23:59   ` Thomas Harold
  2011-06-08  8:06     ` Stefan G. Weichinger
@ 2011-06-08  9:38     ` David Brown
  2011-06-08 10:11       ` John Robinson
  1 sibling, 1 reply; 15+ messages in thread
From: David Brown @ 2011-06-08  9:38 UTC (permalink / raw)
  To: linux-raid

On 08/06/2011 01:59, Thomas Harold wrote:
> On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
>> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>>> Greetings, could you please advise me how to proceed?
>>>
>>> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>>>
>>> ..
>>>
>>> Now I would like to move things to a more reliable RAID6 consisting of
>>> all the four TB-drives ...
>>>
>>> How to do that with minimum risk?
>>>
>>> ..
>>> Maybe I overlook a clever alternative?
>>
>> RAID 10 is as secure, and risk free, and much faster.
>> And will cause much less CPU load.
>>
>
> Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you
> can lose 2 disks without losing data, but only if the right 2 disks fail.
>
> With RAID6, any two of the four can fail without data loss.
>

It /sounds/ like RAID6 is more reliable here because it can always 
survive a second disk failure, while with RAID10 you have only a 66% 
chance of surviving a second disk failure.

However, how often does a disk fail?  What is the chance of a random 
disk failure in a given space of time?  And how long will it go between 
one disk failing, and it being replaced and the array rebuilt?  If you 
figure out these numbers, you'll have the probability of losing your 
RAID10 array due to the second critical disk failing.

To pick some rough numbers - say you've got low reliability, cheap disks 
with a 500,000 hour MTBF.  If it takes you 3 days to replace a disk 
(over the weekend), and 8 hours to rebuild, you have a risk period of 80 
hours.  That gives you a 0.016% chance of having the second disk 
failing.  Even if you consider that a rebuild is quite stressful on the 
critical disk, it's not a big risk.
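
A quick sketch of that calculation, with the MTBF and repair window as assumed above and disk failures modelled as a Poisson process:

```python
import math

mtbf_hours = 500_000       # assumed MTBF of a cheap disk
risk_window = 3 * 24 + 8   # 3 days to replace + 8 hours to rebuild = 80 h

# Probability that the surviving mirror disk also fails inside the window:
p = 1 - math.exp(-risk_window / mtbf_hours)
print(f"{p:.3%}")  # -> 0.016%
```

At these rates the exponential model and the naive ratio 80/500000 agree to several digits.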

Compare that to the chance of losing data through other causes (fire, 
theft, user-error, motherboard failure, power supply problems, etc., 
etc.) and in reality the "higher risk" of RAID10 compared to RAID6 is a 
drop in the ocean.  RAID10 is /far/ from being the weak point in a 
typical server.

And you can also take into account that the disk usage patterns on RAID6 
are a lot more intensive and stressful on the disk than RAID10 - I would 
expect the lifetime of a RAID10 member disk to be much higher than that 
of a RAID6 member disk.

I don't have the statistics to prove it, but I am certainly happy to use 
RAID10 rather than RAID6 for our company servers.

Of course, I also have two backup servers on two different sites...

> (I still prefer RAID-10 over RAID-6 unless space is at an absolute
> premium. But for a four-disk setup, net disk space is the same and it's
> just a question of whether you want the speed of RAID-10 or the
> reliability of RAID-6.)




* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-07 18:12 from 2x RAID1 to 1x RAID6 ? Stefan G. Weichinger
  2011-06-07 20:07 ` Maurice Hilarius
  2011-06-08  1:16 ` John Robinson
@ 2011-06-08  9:43 ` David Brown
  2011-06-08 12:31   ` Stefan G. Weichinger
  2 siblings, 1 reply; 15+ messages in thread
From: David Brown @ 2011-06-08  9:43 UTC (permalink / raw)
  To: linux-raid

On 07/06/2011 20:12, Stefan G. Weichinger wrote:
>
> Greetings, could you please advise me how to proceed?
>
> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>
> md5 : active raid1 sde1[0] sdf1[1]
>        976759936 blocks [2/2] [UU]
>
> md6 : active raid1 sdh1[1] sdg1[0]
>        976759936 blocks [2/2] [UU]
>
>
> md5 and md6 are right now physical volumes (PVs) in an LVM-volume-group.
> Nearly all the space is used right now (1.7 TB out of the ~2 TB).
>
> Now I would like to move things to a more reliable RAID6 consisting of
> all the four TB-drives ...
>
> How to do that with minimum risk?
>
> For sure it would be best to move all data aside, stop the arrays and
> build a new one ... etc
>
> Failing two drives and removing them from the RAID1s to build a new
> degraded RAID6 seems dangerous to me.
>
> Maybe I overlook a clever alternative?
>
> Suggestions welcome, thanks in advance.
>

This may be stating the obvious, but you do realise that converting to a 
four-disk RAID6 will not give you any more space?

You might want to consider replacing the drives when you do your 
re-shaping or rebuilding, whether you go for RAID6 or RAID10.





* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08  9:38     ` David Brown
@ 2011-06-08 10:11       ` John Robinson
  2011-06-08 10:33         ` David Brown
  0 siblings, 1 reply; 15+ messages in thread
From: John Robinson @ 2011-06-08 10:11 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On 08/06/2011 10:38, David Brown wrote:
> On 08/06/2011 01:59, Thomas Harold wrote:
>> On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
>>> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>>>> Greetings, could you please advise me how to proceed?
>>>>
>>>> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>>>>
>>>> ..
>>>>
>>>> Now I would like to move things to a more reliable RAID6 consisting of
>>>> all the four TB-drives ...
>>>>
>>>> How to do that with minimum risk?
>>>>
>>>> ..
>>>> Maybe I overlook a clever alternative?
>>>
>>> RAID 10 is as secure, and risk free, and much faster.
>>> And will cause much less CPU load.
>>>
>>
>> Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you
>> can lose 2 disks without losing data, but only if the right 2 disks fail.
>>
>> With RAID6, any two of the four can fail without data loss.
>>
>
> It /sounds/ like RAID6 is more reliable here because it can always
> survive a second disk failure, while with RAID10 you have only a 66%
> chance of surviving a second disk failure.
>
> However, how often does a disk fail? What is the chance of a random disk
> failure in a given space of time? And how long will it go between one
> disk failing, and it being replaced and the array rebuilt? If you figure
> out these numbers, you'll have the probability of losing your RAID10
> array due to the second critical disk failing.
>
> To pick some rough numbers - say you've got low reliability, cheap disks
> with a 500,000 hour MTBF. If it takes you 3 days to replace a disk (over
> the weekend), and 8 hours to rebuild, you have a risk period of 80
> hours. That gives you a 0.016% chance of having the second disk failing.
> Even if you consider that a rebuild is quite stressful on the critical
> disk, it's not a big risk.

It's not so much that the mirror disc might fail that I'd be worried 
about, it's that you might find the odd sector failure during the 
rebuild - this is the reason why RAID5 is now so disliked, and the 
reasons apply similarly to RAID1 and RAID10 too, even if you're only 
relying on one disc ('s worth of data) being perfect rather than two or 
more.

Still, I don't have any stats to back this up...

Cheers,

John.



* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08 10:11       ` John Robinson
@ 2011-06-08 10:33         ` David Brown
  2011-06-08 14:20           ` Phil Turmel
  2011-06-09 13:18           ` Nikolay Kichukov
  0 siblings, 2 replies; 15+ messages in thread
From: David Brown @ 2011-06-08 10:33 UTC (permalink / raw)
  To: linux-raid

On 08/06/2011 12:11, John Robinson wrote:
> On 08/06/2011 10:38, David Brown wrote:
>> On 08/06/2011 01:59, Thomas Harold wrote:
>>> On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
>>>> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>>>>> Greetings, could you please advise me how to proceed?
>>>>>
>>>>> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>>>>>
>>>>> ..
>>>>>
>>>>> Now I would like to move things to a more reliable RAID6 consisting of
>>>>> all the four TB-drives ...
>>>>>
>>>>> How to do that with minimum risk?
>>>>>
>>>>> ..
>>>>> Maybe I overlook a clever alternative?
>>>>
>>>> RAID 10 is as secure, and risk free, and much faster.
>>>> And will cause much less CPU load.
>>>>
>>>
>>> Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you
>>> can lose 2 disks without losing data, but only if the right 2 disks
>>> fail.
>>>
>>> With RAID6, any two of the four can fail without data loss.
>>>
>>
>> It /sounds/ like RAID6 is more reliable here because it can always
>> survive a second disk failure, while with RAID10 you have only a 66%
>> chance of surviving a second disk failure.
>>
>> However, how often does a disk fail? What is the chance of a random disk
>> failure in a given space of time? And how long will it go between one
>> disk failing, and it being replaced and the array rebuilt? If you figure
>> out these numbers, you'll have the probability of losing your RAID10
>> array due to the second critical disk failing.
>>
>> To pick some rough numbers - say you've got low reliability, cheap disks
>> with a 500,000 hour MTBF. If it takes you 3 days to replace a disk (over
>> the weekend), and 8 hours to rebuild, you have a risk period of 80
>> hours. That gives you a 0.016% chance of having the second disk failing.
>> Even if you consider that a rebuild is quite stressful on the critical
>> disk, it's not a big risk.
>
> It's not so much that the mirror disc might fail that I'd be worried
> about, it's that you might find the odd sector failure during the
> rebuild - this is the reason why RAID5 is now so disliked, and the
> reasons apply similarly to RAID1 and RAID10 too, even if you're only
> relying on one disc ('s worth of data) being perfect rather than two or
> more.

I can see that problem, but it again boils down to probabilities.  The 
chances of seeing an unrecoverable read error are very low, just as with 
other disk errors.

The issue with RAID5 is that people often had large arrays with multiple 
disks, and on a rebuild /every/ sector had to be read.  So if you have a 
ten disk RAID5 and are rebuilding, you are reading from all other 9 
disks - you have 9 times as high a chance of having an unrecoverable 
read error ruin your day.
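
A rough sketch of the scale: taking the commonly assumed consumer-disk error rate of one unrecoverable bit per 1e14 bits read, the chance of hitting at least one URE grows quickly with the amount of data a rebuild has to read.

```python
ber = 1e-14              # assumed URE rate: one unreadable bit per 1e14 read
disk_bits = 1e12 * 8     # 1 TB drives, as in this thread

for data_disks in (1, 9):   # mirror rebuild vs. 10-disk RAID5 rebuild
    p = 1 - (1 - ber) ** (data_disks * disk_bits)
    print(f"reading {data_disks} disk(s): {p:.0%} chance of a URE")
```

With these assumed numbers a single-disk mirror rebuild has a URE chance in the single-digit percent range, while the nine-disk RAID5 rebuild is closer to a coin flip.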

I look forward to the day bad block lists and hot replace are ready in 
mdraid - it will give us close to another disk's worth of redundancy 
without the cost.  For example, if one half of your raid1 mirror fails 
but is not totally dead (such as by having too many bad blocks), during 
rebuild you can keep both the good and bad halves in place.  Then if 
there is a read failure on the "good" half, you can probably still get 
the data from the "bad" half.

>
> Still, I don't have any stats to back this up...
>

Statistics on these things are pretty much worthless unless you have 
hundreds of systems deployed - either your array dies, or it does not. 
It's like lottery tickets, but in reverse - no matter how many tickets 
you buy, you can be confident that you won't win, despite statistics 
that prove that /somebody/ wins each draw.

So you install your RAID10 (or RAID6, if you prefer) system, and make 
sure you keep backups.  And if you /do/ get hit by a double disk failure 
in the wrong place, you spend the day restoring everything from the 
backups.  When management complain that a 24 hour downtime doesn't fit 
with their 99.99% uptime expectations, you remind them that this is 
amortized over the next 27 years...



* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08  9:43 ` David Brown
@ 2011-06-08 12:31   ` Stefan G. Weichinger
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan G. Weichinger @ 2011-06-08 12:31 UTC (permalink / raw)
  To: linux-raid

On 08.06.2011 11:43, David Brown wrote:

> This may be stating the obvious, but you do realise that converting to a
> four-disk RAID6 will not give you any more space?

Yes, I know. It is the improved redundancy I aim for.

S



* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08 10:33         ` David Brown
@ 2011-06-08 14:20           ` Phil Turmel
  2011-06-08 14:42             ` Stefan G. Weichinger
  2011-06-09 13:18           ` Nikolay Kichukov
  1 sibling, 1 reply; 15+ messages in thread
From: Phil Turmel @ 2011-06-08 14:20 UTC (permalink / raw)
  To: David Brown
  Cc: John Robinson, linux-raid@vger.kernel.org, Stefan G. Weichinger,
	Maurice Hilarius, Thomas Harold

Hi All,

On 06/08/2011 06:33 AM, David Brown wrote:
> On 08/06/2011 12:11, John Robinson wrote:
>> On 08/06/2011 10:38, David Brown wrote:
>>> On 08/06/2011 01:59, Thomas Harold wrote:
>>>> On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
>>>>> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>>>>>> Greetings, could you please advise me how to proceed?
>>>>>>
>>>>>> On a server I have 2 RAID1-arrays, each consisting of 2 TB-drives:
>>>>>>
>>>>>> ..
>>>>>>
>>>>>> Now I would like to move things to a more reliable RAID6 consisting of
>>>>>> all the four TB-drives ...
>>>>>>
>>>>>> How to do that with minimum risk?
>>>>>>
>>>>>> ..
>>>>>> Maybe I overlook a clever alternative?
>>>>>
>>>>> RAID 10 is as secure, and risk free, and much faster.
>>>>> And will cause much less CPU load.
>>>>>
>>>>
>>>> Well, with both a pair of RAID1 arrays and a pair of RAID-10 arrays, you
>>>> can lose 2 disks without losing data, but only if the right 2 disks
>>>> fail.
>>>>
>>>> With RAID6, any two of the four can fail without data loss.
>>>>
>>>
>>> It /sounds/ like RAID6 is more reliable here because it can always
>>> survive a second disk failure, while with RAID10 you have only a 66%
>>> chance of surviving a second disk failure.
>>>
>>> However, how often does a disk fail? What is the chance of a random disk
>>> failure in a given space of time? And how long will it go between one
>>> disk failing, and it being replaced and the array rebuilt? If you figure
>>> out these numbers, you'll have the probability of losing your RAID10
>>> array due to the second critical disk failing.
>>>
>>> To pick some rough numbers - say you've got low reliability, cheap disks
>>> with a 500,000 hour MTBF. If it takes you 3 days to replace a disk (over
>>> the weekend), and 8 hours to rebuild, you have a risk period of 80
>>> hours. That gives you a 0.016% chance of having the second disk failing.
>>> Even if you consider that a rebuild is quite stressful on the critical
>>> disk, it's not a big risk.
>>
>> It's not so much that the mirror disc might fail that I'd be worried
>> about, it's that you might find the odd sector failure during the
>> rebuild - this is the reason why RAID5 is now so disliked, and the
>> reasons apply similarly to RAID1 and RAID10 too, even if you're only
>> relying on one disc ('s worth of data) being perfect rather than two or
>> more.
> 
> I can see that problem, but it again boils down to probabilities.  The chances of seeing an unrecoverable read error are very low, just as with other disk errors.

The chances of any given unrecoverable read error are low, but during the rebuild, you are going to read every sector of the remaining drive in a mirror pair, or every sector of every remaining drive in a degraded raid5.  On large drives, you suddenly have a probability of uncorrectable error during rebuild that is orders of magnitude larger than the risk of a generic drive failure (in the rebuild window).

Since Stefan reported that he does backups to this array, I suspect the performance is less important than the redundancy.  The difference in redundancy is *very* significant.

Here's some stats on disk failures themselves:
http://www.storagemojo.com/2007/02/19/googles-disk-failure-experience/

Here's some stats on read errors during rebuild:
http://storagemojo.com/2010/02/27/does-raid-6-stops-working-in-2019/

If I recall correctly, Google switched to exclusive use of triple-disk mirrors on its production servers for this very reason.  (I can't find a link at the moment....)

> The issue with RAID5 is that people often had large arrays with multiple disks, and on a rebuild /every/ sector had to be read.  So if you have a ten disk RAID5 and are rebuilding, you are reading from all other 9 disks - you have 9 times as high a chance of having an unrecoverable read error ruin your day.
> 
> I look forward to the day bad block lists and hot replace are ready in mdraid - it will give us close to another disk's worth of redundancy without the cost.  For example, if one half of your raid1 mirror fails but is not totally dead (such as by having too many bad blocks), during rebuild you can keep both the good and bad halves in place.  Then if there is a read failure on the "good" half, you can probably still get the data from the "bad" half.

I don't see where either of these actually help the "rebuild after disk failure" situation?

Phil


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08 14:20           ` Phil Turmel
@ 2011-06-08 14:42             ` Stefan G. Weichinger
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan G. Weichinger @ 2011-06-08 14:42 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org

On 08.06.2011 16:20, Phil Turmel wrote:
> Hi All,
> 
> On 06/08/2011 06:33 AM, David Brown wrote:
>> On 08/06/2011 12:11, John Robinson wrote:
>>> On 08/06/2011 10:38, David Brown wrote:
>>>> On 08/06/2011 01:59, Thomas Harold wrote:
>>>>> On 6/7/2011 4:07 PM, Maurice Hilarius wrote:
>>>>>> On 6/7/2011 12:12 PM, Stefan G. Weichinger wrote:
>>>>>>> Greetings, could you please advise me how to proceed?
>>>>>>> 
>>>>>>> On a server I have 2 RAID1-arrays, each consisting of 2
>>>>>>> TB-drives:
>>>>>>> 
>>>>>>> ..
>>>>>>> 
>>>>>>> Now I would like to move things to a more reliable RAID6
>>>>>>> consisting of all the four TB-drives ...
>>>>>>> 
>>>>>>> How to do that with minimum risk?
>>>>>>> 
>>>>>>> .. Maybe I overlook a clever alternative?
>>>>>> 
>>>>>> RAID 10 is as secure, and risk free, and much faster. And
>>>>>> will cause much less CPU load.
>>>>>> 
>>>>> 
>>>>> Well, with both a pair of RAID1 arrays and a pair of RAID-10
>>>>> arrays, you can lose 2 disks without losing data, but only if
>>>>> the right 2 disks fail.
>>>>> 
>>>>> With RAID6, any two of the four can fail without data loss.
>>>>> 
>>>> 
>>>> It /sounds/ like RAID6 is more reliable here because it can
>>>> always survive a second disk failure, while with RAID10 you
>>>> have only a 66% chance of surviving a second disk failure.
>>>> 
>>>> However, how often does a disk fail? What is the chance of a
>>>> random disk failure in a given space of time? And how long will
>>>> it go between one disk failing, and it being replaced and the
>>>> array rebuilt? If you figure out these numbers, you'll have the
>>>> probability of losing your RAID10 array due to the second
>>>> critical disk failing.
>>>> 
>>>> To pick some rough numbers - say you've got low reliability,
>>>> cheap disks with a 500,000 hour MTBF. If it takes you 3 days to
>>>> replace a disk (over the weekend), and 8 hours to rebuild, you
>>>> have a risk period of 80 hours. That gives you a 0.016% chance
>>>> of having the second disk failing. Even if you consider that a
>>>> rebuild is quite stressful on the critical disk, it's not a big
>>>> risk.
>>> 
>>> It's not so much that the mirror disc might fail that I'd be
>>> worried about, it's that you might find the odd sector failure
>>> during the rebuild - this is the reason why RAID5 is now so
>>> disliked, and the reasons apply similarly to RAID1 and RAID10
>>> too, even if you're only relying on one disc ('s worth of data)
>>> being perfect rather than two or more.
>> 
>> I can see that problem, but it again boils down to probabilities.
>> The chances of seeing an unrecoverable read error are very low,
>> just as with other disk errors.
> 
> The chances of any given unrecoverable read error are low, but during
> the rebuild, you are going to read every sector of the remaining
> drive in a mirror pair, or every sector of every remaining drive in a
> degraded raid5.  On large drives, you suddenly have a probability of
> uncorrectable error during rebuild that is orders of magnitude larger
> than the risk of a generic drive failure (in the rebuild window).
> 
> Since Stefan reported that he does backups to this array, I suspect
> the performance is less important than the redundancy.  The
> difference in redundancy is *very* significant.
> 
> Here's some stats on disk failures themselves: 
> http://www.storagemojo.com/2007/02/19/googles-disk-failure-experience/
>
>  Here's some stats on read errors during rebuild: 
> http://storagemojo.com/2010/02/27/does-raid-6-stops-working-in-2019/
> 
> If I recall correctly, Google switched to exclusive use of
> triple-disk mirrors on its production servers for this very reason.
> (I can't find a link at the moment....)
> 
>> The issue with RAID5 is that people often had large arrays with
>> multiple disks, and on a rebuild /every/ sector had to be read.  So
>> if you have a ten disk RAID5 and are rebuilding, you are reading
>> from all other 9 disks - you have 9 times as high a chance of
>> having an unrecoverable read error ruin your day.
>> 
>> I look forward to the day bad block lists and hot replace are ready
>> in mdraid - it will give us close to another disk's worth of
>> redundancy without the cost.  For example, if one half of your
>> raid1 mirror fails but is not totally dead (such as by having too
>> many bad blocks), during rebuild you can keep both the good and bad
>> halves in place.  Then if there is a read failure on the "good"
>> half, you can probably still get the data from the "bad" half.
> 
> I don't see where either of these actually help the "rebuild after
> disk failure" situation?

phew ... thanks to all of you for your statements ...
I have to read through all this at first ...

;-)

Thanks, Stefan


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-08 10:33         ` David Brown
  2011-06-08 14:20           ` Phil Turmel
@ 2011-06-09 13:18           ` Nikolay Kichukov
  2011-06-09 13:42             ` David Brown
  1 sibling, 1 reply; 15+ messages in thread
From: Nikolay Kichukov @ 2011-06-09 13:18 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 06/08/2011 01:33 PM, David Brown wrote:

> So you install your RAID10 (or RAID6, if you prefer) system, and make sure you keep backups.  And if you /do/ get hit by
> a double disk failure in the wrong place, you spend the day restoring everything from the backups.  When management
> complain that a 24 hour downtime doesn't fit with their 99.99% uptime expectations, you remind them that this is
> amortized over the next 27 years...

Hi David,

nice one ;-) Did you actually calculate 24 hours for those 99.99% within 27 years? ;-)

Cheers,
- -Nik
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN8MgdAAoJEDFLYVOGGjgXIiYH/0qMOkHCKTV5WeBhlPdGOpjr
RzniFUYxpVYLvHAna7DWmrUaYqGMgZWadljt2GZB90NLqhDQX0OgIKm5thGRwaLD
09x2h2zpT4XV8a78VRU63blS2jHBygCxqVkUnagCHlYVZ63Jm4qZZH0jeHJkWzPV
YjQXhGILzx8H02P1G8WDCnzg32+k8XNleatV2+441OUidnYV1019SyYDX6/5/UDh
88VMIiWOMA0RvJP4b9yGw9vV/pEx2LReAahfhRAZ3iu9sOc5kUtCjiHzghE8n2nW
oF9t4i5raS4q54tz2WGs/iDiV20gO8lsNtIjReIAAEnMFlpIZapelVyxj9HjNe0=
=Y5nu
-----END PGP SIGNATURE-----


* Re: from 2x RAID1 to 1x RAID6 ?
  2011-06-09 13:18           ` Nikolay Kichukov
@ 2011-06-09 13:42             ` David Brown
  0 siblings, 0 replies; 15+ messages in thread
From: David Brown @ 2011-06-09 13:42 UTC (permalink / raw)
  To: linux-raid

On 09/06/2011 15:18, Nikolay Kichukov wrote:
> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>
>
>
> On 06/08/2011 01:33 PM, David Brown wrote:
>
>> So you install your RAID10 (or RAID6, if you prefer) system, and
>> make sure you keep backups.  And if you /do/ get hit by a double
>> disk failure in the wrong place, you spend the day restoring
>> everything from the backups.  When management complain that a 24
>> hour downtime doesn't fit with their 99.99% uptime expectations,
>> you remind them that this is amortized over the next 27 years...
>
> Hi David,
>
> nice one ;-) Did you actually calculate 24 hours for those 99.99%
> within 27 years? ;-)
>

27.4 years is 10,000 days - so you can have 99.99% uptime with a 24-hour 
failure if you run for the rest of the 27.4 years without a hitch.  Of 
course, by the same logic you can claim 6 nine's uptime with a week's 
failure - as long as there are no more problems for the next 20,000 years...
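
The arithmetic behind the quip:

```python
# 99.99% uptime allows 0.01% downtime; over how long must a single
# 24-hour outage be amortised to stay inside that budget?
downtime_h = 24
total_h = downtime_h / 0.0001    # 240,000 hours of total operation
days = total_h / 24              # 10,000 days
print(days, days / 365.25)       # -> 10000.0 days, ~27.4 years
```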

:-)


> Cheers, - -Nik


