Use RAID-6!

Linux RAID subsystem development

* Use RAID-6!
@ 2013-04-16 16:44 Roy Sigurd Karlsbakk
  2013-04-16 17:09 ` Mikael Abrahamsson
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-16 16:44 UTC (permalink / raw)
  To: Linux RAID

Hi all

After reading this list for some time, I see a single failure mode repeated over and over: a RAID-5 array loses a drive and then finds bad data on another (or simply loses another). This is rather common, far more so than the disk vendors document. It also happens on "professional" systems with "enterprise" drives.

So, if you can afford another drive, please use RAID-6. Do *not* trust RAID-5 with something like 8 drives.

Also, maybe this should be on an FAQ/RAID tutorial somewhere?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Use RAID-6!
  2013-04-16 16:44 Use RAID-6! Roy Sigurd Karlsbakk
@ 2013-04-16 17:09 ` Mikael Abrahamsson
  2013-04-16 17:25   ` Roy Sigurd Karlsbakk
  2013-04-16 20:01   ` David Brown
  2013-04-16 19:52 ` Robert L Mathews
  2013-04-16 23:42 ` md dropping disks too early (was: Use RAID-6!) Ben Bucksch
  2 siblings, 2 replies; 28+ messages in thread
From: Mikael Abrahamsson @ 2013-04-16 17:09 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Linux RAID

On Tue, 16 Apr 2013, Roy Sigurd Karlsbakk wrote:

> Also, maybe this should be on an FAQ/RAID tutorial somewhere?

Question is, where should it be put so that people read it and actually 
understand it.

This article is from 2007:

<http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162>

I've had people argue with me that the above article is wrong, but I never 
understood their logic. To me it makes perfect sense and I always go 
RAID6.
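
The arithmetic behind that argument can be sketched roughly as follows. This is a simplified model, assuming the commonly quoted consumer-drive figure of one unrecoverable read error (URE) per 10^14 bits read and fully independent errors. The function name and the 2 TB drive size are illustrative only, and real drives may behave better or worse than the spec sheet suggests.

```python
import math

def rebuild_ure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Probability of at least one URE while reading every surviving drive in full."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# 8-drive RAID-5 of 2 TB disks with one drive dead:
# the 7 surviving drives must be read in full to rebuild
p = rebuild_ure_probability(7, 2 * 10**12)
print(f"chance of hitting a URE during rebuild: {p:.0%}")
```

Under these assumptions the chance comes out at roughly two in three, which is the kind of number that makes a second parity look cheap.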

I also think the work on having more than 2 parity drives was very promising. 
I'd rather have a 20 drive volume with 4 parity drives than LVM 
together two 10 drive RAID6 arrays (apart from obvious performance penalties).
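
To put a number on that preference, here is a small sketch that counts which multi-drive failures are fatal in each layout. Both layouts give 16 drives' worth of data. The model assumes every set of failed drives is equally likely, which is an assumption, and `fatal_fraction` is a hypothetical helper, not part of any tool.

```python
from itertools import combinations

def fatal_fraction(groups, parity_per_group, failures):
    """Fraction of equally likely `failures`-drive failure sets that lose data.

    `groups` lists the drive count of each independent array; an array is lost
    once more than `parity_per_group` of its drives have failed.
    """
    # Label every drive with the index of the array it belongs to
    drives = [g for g, size in enumerate(groups) for _ in range(size)]
    total = fatal = 0
    for failed in combinations(range(len(drives)), failures):
        total += 1
        per_group = [0] * len(groups)
        for d in failed:
            per_group[drives[d]] += 1
        if any(count > parity_per_group for count in per_group):
            fatal += 1
    return fatal / total

for k in (3, 4):
    two_raid6 = fatal_fraction([10, 10], 2, k)
    quad = fatal_fraction([20], 4, k)
    print(f"{k} failed drives: 2x RAID6 loses data in {two_raid6:.1%} of cases, "
          f"quad parity in {quad:.1%}")
```

With three failed drives, the two-RAID6 layout is lost whenever all three land in the same group; the quad-parity volume survives any three or four failures.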

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Use RAID-6!
  2013-04-16 17:09 ` Mikael Abrahamsson
@ 2013-04-16 17:25   ` Roy Sigurd Karlsbakk
  2013-04-16 20:01   ` David Brown
  1 sibling, 0 replies; 28+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-16 17:25 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Linux RAID

> > Also, maybe this should be on an FAQ/RAID tutorial somewhere?
> 
> Question is, where should it be put so that people read it and
> actually
> understand it.
> 
> This article is from 2007:
> 
> <http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162>
> 
> I've had people argue with me that the above article is wrong, but I
> never
> understood their logic. To me it makes perfect sense and I always go
> RAID6.
> 
> I also think the work having more than 2 parity drives was very
> promising.
> I'd rather have a 20 drive volume with 4 parity drives than to LVM
> together two 10 drive RAID6:es (apart from obvious performance
> penalties).

I've been running RAIDz3 for a backup machine (zfs receive from the main box), and the write performance was rather low. I'd rather use lvm or raid-something over different raid-6 volumes. Spread out the risk factor. With ~8 drives in each raid-6 set, the risk is low enough to allow for rather large volumes.

-- 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt


* Re: Use RAID-6!
  2013-04-16 17:09 ` Mikael Abrahamsson
  2013-04-16 17:25   ` Roy Sigurd Karlsbakk
@ 2013-04-16 20:01   ` David Brown
  2013-04-17  7:56     ` Mikael Abrahamsson
  1 sibling, 1 reply; 28+ messages in thread
From: David Brown @ 2013-04-16 20:01 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Roy Sigurd Karlsbakk, Linux RAID

On 16/04/13 19:09, Mikael Abrahamsson wrote:
> On Tue, 16 Apr 2013, Roy Sigurd Karlsbakk wrote:
>
>> Also, maybe this should be on an FAQ/RAID tutorial somewhere?
>
> Question is, where should it be put so that people read it and actually
> understand it.
>
> This article is from 2007:
>
> <http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162>
>
> I've had people argue with me that the above article is wrong, but I
> never understood their logic. To me it makes perfect sense and I always
> go RAID6.
>
> I also think the work having more than 2 parity drives was very
> promising. I'd rather have a 20 drive volume with 4 parity drives than
> to LVM together two 10 drive RAID6:es (apart from obvious performance
> penalties).
>

Raid calculations for a third parity are noticeably more time-consuming 
than for the second parity of Raid6.  And with a bigger array with lots 
of drives, you are going to have terrible RMW performance for small 
writes.  However, as the multi-threaded scaling of Raid5 and Raid6 
improves and makes its way into distros' standard kernels, it's going to 
be more realistic - especially for machines with plenty of cores and 
lots of RAM for stripe caches.

I hope triple parity raid will make it into the kernel at some point. 
I've done the main part of the maths involved, but not had the time to 
work it into anything resembling real code.  I don't know if I 
personally will ever make it into working code - but if anyone else is 
at all interested in doing so, then I will certainly help with the maths.
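
For readers who have not seen the maths, here is a minimal sketch of the existing two-parity case, following the usual description of RAID-6 (P is a plain XOR, Q is a weighted sum in GF(2^8) with generator 2 and polynomial 0x11d). It works on a single byte per disk and is written for clarity, not speed; the helper names are made up for this example and have nothing to do with the kernel's code. Further parities would add more syndromes of the same shape with other generators; which generators remain valid for how many disks is the hard part of the maths and is not shown here.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    # a^254 is the inverse of a, because a^255 == 1 for every non-zero a
    return gf_pow(a, 254)

def syndrome(data, generator):
    """Return the sum of generator^i * data[i] for one byte position of a stripe."""
    acc = 0
    for i, byte in enumerate(data):
        acc ^= gf_mul(gf_pow(generator, i), byte)
    return acc

def recover_two(data, x, y, p, q):
    """Rebuild data[x] and data[y] (both unknown) from P and Q."""
    known = [0 if i in (x, y) else b for i, b in enumerate(data)]
    pd = p ^ syndrome(known, 1)   # equals Dx + Dy
    qd = q ^ syndrome(known, 2)   # equals 2^x*Dx + 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # 2^y*pd + qd == (2^x + 2^y) * Dx, so divide by (2^x + 2^y)
    dx = gf_mul(gf_mul(gy, pd) ^ qd, gf_inv(gx ^ gy))
    return dx, pd ^ dx

stripe = [0x12, 0x34, 0x56, 0x78, 0x9a]      # one byte from each data disk
p, q = syndrome(stripe, 1), syndrome(stripe, 2)
damaged = list(stripe)
damaged[1] = damaged[3] = None               # two data disks lost
print(recover_two(damaged, 1, 3, p, q))
```

Each extra parity adds another row of multiplications per byte, which gives a feel for why a third parity costs noticeably more than the second.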

I am not sure there is much real-world need of triple parity raid for 
normal arrays - even with better cpu scaling, it would still be a lot 
slower than two raid6 arrays LVM'ed together.  I foresee its main use 
as a temporary measure during array maintenance.  For example, if you 
have a raid6 and you want to swap out the drives for bigger ones, then 
you could temporarily add an extra drive for a third parity using a 
non-symmetrical layout.  Once this extra drive is synced, then you can 
step through the other drives doing a replace-and-resync, knowing that 
you still have the double parity safety.  Then at the end of the process 
you drop the third parity again.

Quad parity has some limitations, especially if you want to keep the 
first 3 parities compatible with triple parity.  In particular, you are 
limited to 21 data disks.  There are, of course, ways to handle even 
greater parity counts - but the cost in complexity and speed is 
considerable.


* Re: Use RAID-6!
  2013-04-16 20:01   ` David Brown
@ 2013-04-17  7:56     ` Mikael Abrahamsson
  2013-04-17  9:26       ` David Brown
  0 siblings, 1 reply; 28+ messages in thread
From: Mikael Abrahamsson @ 2013-04-17  7:56 UTC (permalink / raw)
  To: David Brown; +Cc: Roy Sigurd Karlsbakk, Linux RAID

On Tue, 16 Apr 2013, David Brown wrote:

> you are going to have terrible RMW performance for small writes.  However, as

As I said, I don't have a problem with lower performance. My workload is 
write once and few, read many. If the performance is approximately the 
same as a 10 drive RAID-6, but with double the storage, 
I'm fine.

> I am not sure there is much real-world need of triple parity raid for 
> normal arrays - even with better cpu scaling, it would still be a lot 
> slower than two raid6 arrays LVM'ed together.  I foresee its main use 
> as a temporary measure during array maintenance.  For example, if you 
> have a raid6 and you want to swap out the drives for bigger ones, then 
> you could temporarily add an extra drive for a third parity using a 
> non-symmetrical layout.  Once this extra drive is synced, then you can 
> step through the other drives doing a replace-and-resync, knowing that 
> you still have the double parity safety. Then at the end of the process 
> you drop the third parity again.

Well, I run RAID6+spare. I'd rather run a triple parity drive unless the 
write performance penalty is huge.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Use RAID-6!
  2013-04-17  7:56     ` Mikael Abrahamsson
@ 2013-04-17  9:26       ` David Brown
  0 siblings, 0 replies; 28+ messages in thread
From: David Brown @ 2013-04-17  9:26 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Roy Sigurd Karlsbakk, Linux RAID

On 17/04/13 09:56, Mikael Abrahamsson wrote:
> On Tue, 16 Apr 2013, David Brown wrote:
> 
>> you are going to have terrible RMW performance for small writes. 
>> However, as
> 
> As I said, I don't have a problem with lower performance. My workload is
> write once and few, read many. If the performance is
> approximately the same as a 10 drive RAID-6, but with double the
> storage, I'm fine.

I would expect read performance for triple-parity raid to be similar to
Raid5 or Raid6 - i.e., you get good striped performance, especially for
large files as they are spread over many spindles.  Of course, since
triple-parity md raid does not yet exist, that's just theoretical...

> 
>> I am not sure there is much real-world need of triple parity raid for
>> normal arrays - even with better cpu scaling, it would still be a lot
>> slower than two raid6 arrays LVM'ed together.  I foresee its main use
>> as a temporary measure during array maintenance.  For example, if you
>> have a raid6 and you want to swap out the drives for bigger ones, then
>> you could temporarily add an extra drive for a third parity using a
>> non-symmetrical layout.  Once this extra drive is synced, then you can
>> step through the other drives doing a replace-and-resync, knowing that
>> you still have the double parity safety. Then at the end of the
>> process you drop the third parity again.
> 
> Well, I run RAID6+spare. I'd rather run a triple parity drive unless the
> write performance penalty is huge.
> 

It's encouraging to hear people are interested in this.  But before it
can be implemented, there has to be someone with an understanding of
Linux md raid who can implement it.  I know the maths involved, but I
have no experience with Linux kernel work (I work with embedded systems
- while I use the same programming language as the kernel, it's a very
different style of programming).



* Re: Use RAID-6!
  2013-04-16 16:44 Use RAID-6! Roy Sigurd Karlsbakk
  2013-04-16 17:09 ` Mikael Abrahamsson
@ 2013-04-16 19:52 ` Robert L Mathews
  2013-04-16 20:05   ` Carsten Aulbert
  2013-04-17 17:27   ` Roy Sigurd Karlsbakk
  2013-04-16 23:42 ` md dropping disks too early (was: Use RAID-6!) Ben Bucksch
  2 siblings, 2 replies; 28+ messages in thread
From: Robert L Mathews @ 2013-04-16 19:52 UTC (permalink / raw)
  To: Linux RAID

On 4/16/13 9:44 AM, Roy Sigurd Karlsbakk wrote:

> So, if you can afford another drive, please use RAID-6. Do *not* trust RAID-5 with something like 8 drives.

Yep. This has been true for many years, too:

 http://www.miracleas.com/BAARF/

I personally don't even trust RAID 6. All our servers use three-disk
RAID 1 setups, with disks from at least two different manufacturers to
protect against firmware bricking (although this is becoming more and
more difficult as the industry consolidates).

If (no, scratch that, "when") something goes horribly wrong, I can mount
any of the disks as normal, non-RAID volumes. All I need is one disk to
work. (If none of the disks are working, the RAID level is irrelevant...
 ;-)

To me, that level of confidence is worth sacrificing quite a bit of
performance and capacity. It has saved my bacon at least once.

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/


* Re: Use RAID-6!
  2013-04-16 19:52 ` Robert L Mathews
@ 2013-04-16 20:05   ` Carsten Aulbert
  2013-04-16 20:19     ` Roman Mamedov
  2013-04-16 22:44     ` Robert L Mathews
  2013-04-17 17:27   ` Roy Sigurd Karlsbakk
  1 sibling, 2 replies; 28+ messages in thread
From: Carsten Aulbert @ 2013-04-16 20:05 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

Hi

On 04/16/2013 09:52 PM, Robert L Mathews wrote:
> I personally don't even trust RAID 6. All our servers use three-disk
> RAID 1 setups, with disks from at least two different manufacturers to
> prevent against firmware bricking (although this is becoming more and
> more difficult as the industry consolidates).

The problem I find with RAID1 is that it won't protect you against
silent corruptions (same as RAID5). What do you do if you do a thorough
check and both drives claim a data block is valid and intact, but the data
differs? Do you trust disk1 or disk2?

In that respect I think RAID1 is a step in the wrong direction :(

Cheers

Carsten



* Re: Use RAID-6!
  2013-04-16 20:05   ` Carsten Aulbert
@ 2013-04-16 20:19     ` Roman Mamedov
  2013-04-16 22:44     ` Robert L Mathews
  1 sibling, 0 replies; 28+ messages in thread
From: Roman Mamedov @ 2013-04-16 20:19 UTC (permalink / raw)
  To: Carsten Aulbert; +Cc: Robert L Mathews, Linux RAID

On Tue, 16 Apr 2013 22:05:53 +0200
Carsten Aulbert <Carsten.Aulbert@aei.mpg.de> wrote:

> The problem I find with RAID1 is that it won't protect you against
> silent corruptions (same as RAID5). What do you do if you do a thorough
> check and both drives claim a data block is valid and intact, but data
> differs? Do you trust disk1 or disk2?
> 
> In that respect I think RAID1 is a step in the wrong direction :(

Then use btrfs RAID1 where every data and metadata block is checksummed and in
case some array member returns blocks with invalid checksums, this is healed
from others which still have the correct ones.

(Although currently btrfs "RAID1" stores data on *two disks*, no matter how
many you have in the array; so it's a bit unconventional).
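
The idea can be sketched in a few lines. This is only an illustration of checksum-based healing in general, not btrfs code: `zlib.crc32` stands in for whatever checksum the filesystem actually uses, and `read_with_heal` is a made-up name.

```python
import zlib

def read_with_heal(copies, stored_checksum):
    """Return good data and the indices of copies that need rewriting.

    `copies` holds the same block as read from each mirror member.
    """
    good = None
    bad = []
    for index, block in enumerate(copies):
        if zlib.crc32(block) == stored_checksum:
            if good is None:
                good = block
        else:
            bad.append(index)
    if good is None:
        raise IOError("no copy matches the stored checksum")
    return good, bad

original = b"important data"
checksum = zlib.crc32(original)
copies = [b"important dat4", original]       # first mirror silently corrupted
data, to_repair = read_with_heal(copies, checksum)
print(data, to_repair)
```

The stored checksum is what answers the "disk1 or disk2?" question: the copy that matches wins, and the others are rewritten from it.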

-- 
With respect,
Roman


* Re: Use RAID-6!
  2013-04-16 20:05   ` Carsten Aulbert
  2013-04-16 20:19     ` Roman Mamedov
@ 2013-04-16 22:44     ` Robert L Mathews
  2013-04-17  0:20       ` Ben Bucksch
  2013-04-17  4:20       ` Roman Mamedov
  1 sibling, 2 replies; 28+ messages in thread
From: Robert L Mathews @ 2013-04-16 22:44 UTC (permalink / raw)
  To: Linux RAID

On 4/16/13 1:05 PM, Carsten Aulbert wrote:

> The problem I find with RAID1 is that it won't protect you against
> silent corruptions (same as RAID5). What do you do if you do a thorough
> check and both drives claim a data block is valid and intact, but data
> differs? Do you trust disk1 or disk2?

That's partly why we use three-disk arrays instead of two-disk.

But as you say, this general issue is a problem with RAID 5 too. We plan
to switch to Btrfs as soon as doing so is wise.

In the meantime, I'd rather risk this problem than the endless reports
of complete array failures that appear on the list with RAID 5 and even
RAID 6 (a recent topic, I note, was "multiple disk failures in an md
raid6 array"). I almost never see anyone reporting complete loss of a
RAID 1 array.

The fundamental difference between RAID 1 and other levels seems to be
that the usefulness of an individual array member doesn't rely on the
state of any other member. This vastly reduces the impact of failures on
the overall system. After using mdadm with various RAID levels since
2002 (thanks, Neil), I'm convinced that RAID 1 is by its very nature far
less fragile than any other scheme. This belief is sadly reinforced
almost every week by a new tale of woe on the mailing list.

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/


* Re: Use RAID-6!
  2013-04-16 22:44     ` Robert L Mathews
@ 2013-04-17  0:20       ` Ben Bucksch
  2013-04-17  1:35         ` Adam Goryachev
  2013-04-17  3:32         ` Robert L Mathews
  2013-04-17  4:20       ` Roman Mamedov
  1 sibling, 2 replies; 28+ messages in thread
From: Ben Bucksch @ 2013-04-17  0:20 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

Robert L Mathews wrote, On 17.04.2013 00:44:
> the endless reports of complete array failures that appear on the list 
> with RAID 5 and even RAID 6 (a recent topic, I note, was "multiple 
> disk failures in an md raid6 array"). I almost never see anyone 
> reporting complete loss of a RAID 1 array.

Correct

> The fundamental difference between RAID 1 and other levels seems to be 
> that the usefulness of an individual array member doesn't rely on the 
> state of any other member. This vastly reduces the impact of failures 
> on the overall system. After using mdadm with various RAID levels 
> since 2002 (thanks, Neil), I'm convinced that RAID 1 is by its very 
> nature far less fragile than any other scheme. This belief is sadly 
> reinforced almost every week by a new tale of woe on the mailing list. 

Exactly.

However, I think the RAID5 problems are caused by bad design decisions 
in the md implementation, not by the inherent concept of RAID5. 
Many people seem to have problems getting to the data of their RAID5 
array: they have enough readable disks, but they can't 
convince md to read them. RAID1 doesn't have that problem, because you can 
ignore md when reading the disks. This is a self-inflicted problem of Linux md.

FWIW, my own 10 years of experience with Linux md RAID led to the same 
conclusion as you had.

See thread "md dropping disks too early"

Ben


* Re: Use RAID-6!
  2013-04-17  0:20       ` Ben Bucksch
@ 2013-04-17  1:35         ` Adam Goryachev
  2013-04-17  4:27           ` Robert L Mathews
  2013-04-17 11:13           ` Ben Bucksch
  2013-04-17  3:32         ` Robert L Mathews
  1 sibling, 2 replies; 28+ messages in thread
From: Adam Goryachev @ 2013-04-17  1:35 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Robert L Mathews, Linux RAID

On 17/04/13 10:20, Ben Bucksch wrote:
> Robert L Mathews wrote, On 17.04.2013 00:44:
>> the endless reports of complete array failures that appear on the
>> list with RAID 5 and even RAID 6 (a recent topic, I note, was
>> "multiple disk failures in an md raid6 array"). I almost never see
>> anyone reporting complete loss of a RAID 1 array.
> Correct
>
Obviously, if they suffered a two disk failure then they won't be here
asking for help will they :)

Although, you are right, there are fewer failure scenarios where they are
left with one or more working disks and no possibility to recover the data.
>> The fundamental difference between RAID 1 and other levels seems to
>> be that the usefulness of an individual array member doesn't rely on
>> the state of any other member. This vastly reduces the impact of
>> failures on the overall system. After using mdadm with various RAID
>> levels since 2002 (thanks, Neil), I'm convinced that RAID 1 is by its
>> very nature far less fragile than any other scheme. This belief is
>> sadly reinforced almost every week by a new tale of woe on the
>> mailing list. 
>
> Exactly.
>
> However, I think the RAID5 problems are caused by bad design decisions
> in the md implementation, not in the inherent concept of RAID5,
> though. Many people seem to have problems getting to the data of their
> RAID5 array, although they have enough disks that are readable, but
> they can't convince md to read it. RAID1 doesn't have that problem,
> because you can ignore md when reading them. This is a home-made
> problem of Linux md.
Well, you can ignore Linux md when reading from RAID5 member disks, you
just need to do some work to make the contents actually useful.
However, I totally disagree with your comment anyway. Linux md is simply
a part of the kernel, not the whole kernel. It takes a "block device"
and generates read/write commands to that block device. It can get back
one of a few possible results:
1) read error
2) write error
3) block device is no longer valid

1) A read error can be generated for a number of causes, but (AFAIK)
Linux md will simply read from another member, and try to write the data
back to the device that generated the read error. This would fix a URE
for example.

2) A write error is more of a problem. If the block device generates a
write error, then there are limited options: we can retry the write, or
we can discard the entire device. I think Linux md will discard the
entire device, possibly after retrying the write one or more times. I
don't know enough about Linux md to be sure, but in any case I think it is
rare to get a write error from an otherwise good block device.

3) This is the issue that seems to bite everyone. Using block devices
that are not configured correctly. Sooner or later, the drive has a URE,
the drive goes off to la-la land and Linux patiently waits, tries a
drive reset, SATA bus reset, etc, still no response, eventually deciding
the drive has gone. The Linux kernel advises Linux md that the block
device is gone, so Linux md discards the block device and stops trying
to use it. Personally, I don't see that Linux md has a lot of choice in
the matter, without trying to re-implement every SATA/SCSI/SAS
controller driver into md itself so that we can keep retrying longer. We
are told the device is gone, so it is gone, end of story.

Now, if you truly have this issue, and do NOT make any silly assumption,
and follow the correct advice, you will have no problem resolving the
issue (as long as the actual device is working properly). Generally,
this is just a matter of assembling the MD without the oldest/first
affected device, and/or using --force or similar. The SECOND problem is
caused by the user attempting some other recovery methods which cause
additional writes to the array.
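
As a rough illustration of "assemble without the oldest device", the sketch below ranks members by the event count that `mdadm --examine` reports and leaves out the stalest ones. The device names and event counts are invented, `pick_members` is a hypothetical helper, and the script only prints a command line; it runs nothing. Treat it as a way to think about the decision, not as a recovery procedure.

```python
def pick_members(event_counts, minimum_members):
    """Choose the freshest members for a forced assembly.

    `event_counts` maps device name -> event count as reported by
    `mdadm --examine`; `minimum_members` is the fewest devices the array can
    run with (n-1 for RAID5, n-2 for RAID6).
    """
    ranked = sorted(event_counts.items(), key=lambda item: item[1], reverse=True)
    chosen = [dev for dev, _ in ranked[:minimum_members]]
    left_out = [dev for dev, _ in ranked[minimum_members:]]
    return sorted(chosen), left_out

# Made-up example: a 4-disk RAID5 where sdd1 dropped out first
events = {"/dev/sda1": 1042, "/dev/sdb1": 1042, "/dev/sdc1": 1039, "/dev/sdd1": 987}
chosen, left_out = pick_members(events, minimum_members=3)
print("mdadm --assemble --force /dev/md0", " ".join(chosen))
print("leave out (stalest):", left_out)
```

The point of leaving out the stalest member is that its contents predate the most writes, so it is the one least likely to agree with the rest of the array.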

Certainly, a hardware raid controller doesn't have this issue, it
controls the disk, disk controller, and RAID, it knows everything about
all layers. However, if some strange issue happens such as two disks
dropping out of the array, one after the other, then I'm not sure what
your recovery options are, but I expect they are a lot more limited
compared to having the power of Linux md and tools like dd, GNU
ddrescue, etc to manipulate the data in well documented and understood
ways (as opposed to being stuck in a limited "BIOS" type tool with
limited GUI type options...)

Perhaps it is possible for Linux md to check whether the RAID members
support SCT ERC (scterc) and/or what their timeout is, along with the associated
interface timeout, possibly using user space mdadm rather than the
in-kernel md. At least this might catch more broken configurations
before they break, rather than waiting for them to break first.
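
A user-space check along those lines might look something like the sketch below. It assumes the drive's setting comes from the output of `smartctl -l scterc /dev/sdX` and the kernel's command timeout from `/sys/block/sdX/device/timeout` (30 seconds by default); the exact output layout matched by the regular expression is an assumption based on typical smartctl output, and both function names are made up. Only the parsing and comparison are shown, on a sample string rather than a real drive.

```python
import re

def parse_scterc(smartctl_output):
    """Return the SCT ERC read timeout in seconds, or None if disabled/unsupported."""
    # smartctl prints the value in tenths of a second, e.g. "Read:     70 (7.0 seconds)"
    match = re.search(r"Read:\s+(\d+)\s+\(", smartctl_output)
    return int(match.group(1)) / 10 if match else None

def timeout_problem(erc_seconds, kernel_timeout_seconds):
    """Describe a mismatch between drive error recovery and the kernel timeout."""
    if erc_seconds is None:
        return "drive has no bounded error recovery; it may outlast the kernel timeout"
    if erc_seconds >= kernel_timeout_seconds:
        return "drive may still be retrying when the kernel gives up on it"
    return None

sample = """SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""
print(timeout_problem(parse_scterc(sample), kernel_timeout_seconds=30))
print(timeout_problem(parse_scterc("SCT Error Recovery Control command not supported"), 30))
```

A drive that reports no bounded recovery time is the case described in 3) above: it can stay busy on a bad sector for longer than the kernel is willing to wait.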

> FWIW, my own 10 years of experience with Linux md RAID led to the same
> conclusion as you had.
>
> See thread "md dropping disks too early"

Personally, I'd like to see RAID10 get a lot more attention. We need to
be able to grow RAID10 arrays (and shrink), etc, not because this would
provide RAID1 type reliability. Of course, you can still get multiple
disk failures, and you can still mess up a RAID10 array by trying to
"fix" it, yet still have just enough idea that all your data might be
there, you just need to know the right magic spell to make it re-appear.

The best part of Linux md RAID is that the large majority of the time,
the people that come to the list with broken arrays are able to recover
all of their data *IF* they are patient enough, *AND* follow the advice
of the very knowledgeable people on this list, even in cases where that
user has broken their RAID array further in their attempts to "fix" it.

In summary, I'll say it again, most Linux md RAID issues seem to be
caused by:
1) mis-configured systems that are just waiting for a critical moment to
break (Murphy's Law)
2) people who don't know enough about Linux md RAID who try to fix the
broken array

PS, I really have no idea what I'm talking about, except lurking and
reading this list and the problems (and resolutions) here, if I've made
any errors in the above, feel free to fix it. I really think the above
(plus whatever corrections/more complete information) should be saved in
a FAQ somewhere so we can just point people at the same page all the
time instead of discussing it again each time (it invariably seems to be
discussed every month or so).

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au


* Re: Use RAID-6!
  2013-04-17  1:35         ` Adam Goryachev
@ 2013-04-17  4:27           ` Robert L Mathews
  2013-04-17  4:45             ` Adam Goryachev
  2013-04-17  6:06             ` Stan Hoeppner
  2013-04-17 11:13           ` Ben Bucksch
  1 sibling, 2 replies; 28+ messages in thread
From: Robert L Mathews @ 2013-04-17  4:27 UTC (permalink / raw)
  To: Linux RAID

On 4/16/13 6:35 PM, Adam Goryachev wrote:

> Obviously, if they suffered a two disk [RAID 1] failure then they won't
> be here asking for help will they :)

Heh. Well, no, they won't if the disks are completely and permanently dead.

(I know I'm starting to sound like a broken record, but "that's partly
why we use three disks instead of two and make sure they don't all use
the same company's firmware".)

But complete disk death doesn't seem to be the normal failure mode.
If the failure is spurious, as so many seem to be, and temporarily
affects an array so that each disk has a different event count, that
isn't a disaster under RAID 1. If worst comes to worst, you can pick one
disk to use and pretend RAID doesn't even exist. You don't need to get
the members to successfully sync into an array to read the data.

But if each disk in a RAID 5 or RAID 6 array gets a different event
count, or if the disks refuse to easily assemble into an active array
for any other reason, all your data is inaccessible until you fix the
RAID problem.

I avidly read the details of every RAID 5 [and 6] disaster on the list,
and almost every one would be trivially easy to fix under RAID 1, with
no risk of complete data loss. It's heartbreaking.

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/


* Re: Use RAID-6!
  2013-04-17  4:27           ` Robert L Mathews
@ 2013-04-17  4:45             ` Adam Goryachev
  2013-04-17  6:06             ` Stan Hoeppner
  1 sibling, 0 replies; 28+ messages in thread
From: Adam Goryachev @ 2013-04-17  4:45 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

On 17/04/13 14:27, Robert L Mathews wrote:
> But complete disk death doesn't seem to be the normal failure mode. If
> the failure is spurious, as so many seem to be, and temporarily
> affects an array so that each disk has a different event count, that
> isn't a disaster under RAID 1. If worst comes to worst, you can pick
> one disk to use and pretend RAID doesn't even exist. You don't need to
> get the members to successfully sync into an array to read the data.
> But if each disk in a RAID 5 or RAID 6 array gets a different event
> count, or if the disks refuse to easily assemble into an active array
> for any other reason, all your data is inaccessible until you fix the
> RAID problem. I avidly read the details of every RAID 5 [and 6]
> disaster on the list, and almost every one would be trivially easy to
> fix under RAID 1, with no risk of complete data loss. It's heartbreaking. 
RAID1 of course fails the requirement of a single filesystem that
requires more space than a single disk can provide.

Of course, you can then consider LVM2, multiple mount points, or RAID10
or RAID1 + linear etc.... but most people still prefer to see a single
block device. Dealing with multiple RAID1 and a linear could lead to
more complex issues as well.

In any case, as mentioned previously, the majority of issues are caused
by mis-configuration, if we could add some configuration verification to
mdadm or similar, then we might be able to warn more people prior to
things failing.

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au



* Re: Use RAID-6!
  2013-04-17  4:27           ` Robert L Mathews
  2013-04-17  4:45             ` Adam Goryachev
@ 2013-04-17  6:06             ` Stan Hoeppner
  1 sibling, 0 replies; 28+ messages in thread
From: Stan Hoeppner @ 2013-04-17  6:06 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

On 4/16/2013 11:27 PM, Robert L Mathews wrote:

> I avidly read the details of every RAID 5 [and 6] disaster on the list,
> and almost every one would be trivially easy to fix under RAID 1, with
> no risk of complete data loss. It's heartbreaking.

I do read most of them as well.  But mirrors simply don't scale in
either capacity or performance and thus aren't suitable.  If one needs a
4TB+ filesystem today or more than combined ~150MB/s streaming write
throughput one must use one of:

1.  RAID10
2.  RAID0 over RAID1 pairs/triples
3.  A linear concat over pairs/triples w/XFS
4.  RAID5 or RAID6

Each of these is most suitable for only a subset of workloads, but all of
them can scale to more than 4TB, whereas RAID1 cannot.  When
SATA4/SAS1200 arrive offering 1.2GB/s interface rate, and SSDs hit 2-4TB
capacity at reasonable prices, then I think you'll see more straight
RAID1 being used in more of the systems that don't need any more total
capacity.  But as many servers will always need more than this and will
still use rust, striped/concatenated arrays will be with us for quite
some time.
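
The capacity side of that comparison is simple arithmetic, sketched here for 4 TB disks. The layout names and the `usable_disks` helper are made up for the example, and the figures ignore metadata overhead and odd disk counts.

```python
def usable_disks(level, disks, copies=2):
    """Number of disks' worth of usable space for a few md layouts."""
    if level == "raid1":
        return 1                      # every member holds the same data
    if level in ("raid10", "raid0-over-raid1", "linear-over-raid1"):
        return disks // copies        # pairs (copies=2) or triples (copies=3)
    if level == "raid5":
        return disks - 1
    if level == "raid6":
        return disks - 2
    raise ValueError(f"unknown layout: {level}")

disk_tb = 4
for level, disks, copies in [("raid1", 3, 3), ("raid10", 6, 2),
                             ("raid0-over-raid1", 6, 3), ("raid6", 6, 2)]:
    usable = usable_disks(level, disks, copies) * disk_tb
    print(f"{level:18} {disks} disks -> {usable} TB usable")
```

A plain mirror never gets past one disk's worth, however many members it has, which is the scaling limit described above.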

And BTW, regarding your triplets setup, if you want to do that right
according to your philosophy, then you need a dedicated SAS/SATA
controller for each drive, each controller being of a different
make/model with different firmware.  The old UNIX/Netware "duplexing"
strategy but triplexing in this case.  But I doubt you're doing this.
All 3 are probably connected to the single motherboard down SATA
controller.

-- 
Stan


* Re: Use RAID-6!
  2013-04-17  1:35         ` Adam Goryachev
  2013-04-17  4:27           ` Robert L Mathews
@ 2013-04-17 11:13           ` Ben Bucksch
  2013-04-17 11:32             ` Adam Goryachev
  1 sibling, 1 reply; 28+ messages in thread
From: Ben Bucksch @ 2013-04-17 11:13 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: Robert L Mathews, Linux RAID

Adam Goryachev wrote, On 17.04.2013 03:35:
> Obviously, if they suffered a two disk failure then they won't be here
> asking for help will they:)

Wrong, sadly. I suffered a 1 disk failure, and I am here asking for 
help. And nobody can give it.

Again: I have a RAID5, and 1 (one) disk failed, so I should be fine, but 
I cannot read the data anymore, no way to get at it. That's because md 
ejected a good (!) drive to start with, and refuses to take it back (!). 
(And then another drive failed during resync.) If you have a way, please 
do show me, see thread 'Disk wrongly marked "spare", need to force 
re-add it'

The problem isn't double disk failure. The problem is bugs in md 
implementation.

> The Linux kernel advises Linux md that the block
> device is gone, so Linux md discards the block device and stops trying
> to use it. Personally, I don't see that Linux md has a lot of choice in
> the matter

True. But often, such errors are temporary. For example, a loose cable. 
I must be able to re-add the device as a good device with data. But I 
can't, md doesn't let me.

My case was even more unbelievable: md ejected perfectly good drives 
simply because I upgraded the OS. (This happened with 2 independent 
arrays, so not coincidence.)

Also, a single sector being unreadable/unwritable doesn't count as "disk 
failure" in my book, and shouldn't eject the whole disk. If I have 2 
sectors on 2 different disks that are unreadable, md currently trashes 
the whole array and doesn't let me read anything at all anymore. That's 
obviously broken, but unfortunately the sad reality.

See http://neil.brown.name/blog/20110216044002#1

(And, BTW, RAID6 doesn't really help with this problem, because it's 
quite possible that 3 disks have sectors unreadable/unwritable.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-17 11:13           ` Ben Bucksch
@ 2013-04-17 11:32             ` Adam Goryachev
  2013-04-17 11:51               ` Ben Bucksch
  0 siblings, 1 reply; 28+ messages in thread
From: Adam Goryachev @ 2013-04-17 11:32 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Robert L Mathews, Linux RAID

On 17/04/13 21:13, Ben Bucksch wrote:
> Adam Goryachev wrote, On 17.04.2013 03:35:
>> Obviously, if they suffered a two disk failure then they won't be here
>> asking for help will they:)
> 
> Wrong, sadly. I suffered a 1 disk failure, and I am here asking for
> help. And nobody can give it.
> 
> Again: I have a RAID5, and 1 (one) disk failed, so I should be fine, but
> I cannot read the data anymore, no way to get at it. That's because md
> ejected a good (!) drive to start with,

Actually, I think the real problem here is that you don't know why your
so called good drive was ejected from the array. You assume that the
drive is good, and that it was configured correctly, but obviously Linux
and/or MD has a different opinion.

> and refuses to take it back (!).

It probably would have taken it back, although requiring a resync.

> (And then another drive failed during resync.) If you have a way, please
> do show me, see thread 'Disk wrongly marked "spare", need to force
> re-add it'

Like I said, you need to be patient, and follow the expert advice
provided from the list. This discussion is just a diversion from your
problem, forget the diversion (at least until you get your problem fixed).

> The problem isn't double disk failure. The problem is bugs in md
> implementation.

Or users who expect things to work a certain way, without actually
bothering to find out in advance. Hence their expectation is considered
a bug when really it is just a lack of knowledge.

>> The Linux kernel advises Linux md that the block
>> device is gone, so Linux md discards the block device and stops trying
>> to use it. Personally, I don't see that Linux md has a lot of choice in
>> the matter
> 
> True. But often, such errors are temporary. For example, a loose cable.
> I must be able to re-add the device as a good device with data. But I
> can't, md doesn't let me.

It does actually. You can re-add it, with a resync, or if you ensure
that no writes occurred since the drive was ejected, you can re-add it
without a resync. In addition, even if some writes occurred, if you use
a bitmap, only the newly written blocks need to be resynced.
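
As a rough illustration of why a bitmap shortens the resync, here is a toy
Python model. It is not md's on-disk bitmap format; the class name, method
names and chunk size are hypothetical and exist only for this sketch. The
idea is simply that writes made while a member is missing mark their chunks
as dirty, and only the dirty chunks need copying after a re-add.

```python
class WriteIntentBitmap:
    """Toy model: one dirty flag per fixed-size chunk of the array."""

    def __init__(self, array_bytes, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        # Ceiling division: a partial final chunk still needs a flag.
        self.total_chunks = -(-array_bytes // chunk_bytes)
        self.dirty = set()

    def record_write(self, offset, length):
        """Mark every chunk touched by a write made while a member is absent."""
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        self.dirty.update(range(first, last + 1))

    def chunks_to_resync(self):
        """Chunks that must be copied to the returning member."""
        return sorted(self.dirty)


if __name__ == "__main__":
    bitmap = WriteIntentBitmap(array_bytes=1000, chunk_bytes=100)
    bitmap.record_write(offset=150, length=100)
    print(bitmap.chunks_to_resync(), "of", bitmap.total_chunks, "chunks")
```

Without such a record, every chunk has to be treated as possibly stale,
which is the full resync case.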

> My case was even more unbelievable: md ejected perfectly good drives
> simply because I upgraded the OS. (This happened with 2 independent
> arrays, so not coincidence.)

Like I said, the drives were ejected for a reason. You just don't know
what that reason is.

> Also, a single sector being unreadable/unwritable doesn't count as "disk
> failure" in my book, and shouldn't eject the whole disk. If I have 2
> sectors on 2 different disks that are unreadable, md currently trashes
> the whole array and doesn't let me read anything at all anymore. That's
> obviously broken, but unfortunately the sad reality.
> See http://neil.brown.name/blog/20110216044002#1

This is all true, however, I would hope that when this is implemented,
the distributions will properly alert the user that one or more drives
are faulty. One failed write is very frequently indicative of more
failed writes to come. Personally, I would want to replace that drive ASAP.

In addition, the one thing that appeared missing from the blog was the
ability for md to clear the bad blocks list when a drive is replaced,
and rebuild the content of the "bad blocks" from the other members.

> (And, BTW, RAID6 doesn't really help with this problem, because it's
> quite possible that 3 disks have sectors unreadable/unwritable.)

RAID6 simply improves your odds or chances. There is no RAID level that
can provide a 100% uptime, at some point you have lost too many disks or
too much data, etc. Use the appropriate level of RAID depending on your
risk profile.

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-17 11:32             ` Adam Goryachev
@ 2013-04-17 11:51               ` Ben Bucksch
  2013-04-17 17:50                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Bucksch @ 2013-04-17 11:51 UTC (permalink / raw)
  To: Adam Goryachev; +Cc: Ben Bucksch, Robert L Mathews, Linux RAID

Adam Goryachev wrote, On 17.04.2013 13:32:
> On 17/04/13 21:13, Ben Bucksch wrote:
>> Adam Goryachev wrote, On 17.04.2013 03:35:
>>> Obviously, if they suffered a two disk failure then they won't be here
>>> asking for help will they:)
>> Wrong, sadly. I suffered a 1 disk failure, and I am here asking for
>> help. And nobody can give it.
>>
>> Again: I have a RAID5, and 1 (one) disk failed, so I should be fine, but
>> I cannot read the data anymore, no way to get at it. That's because md
>> ejected a good (!) drive to start with,
> Actually, I think the real problem here is that you don't know why your
> so called good drive was ejected from the array.

I know it doesn't have a fatal hardware failure. See my quote above.

> obviously Linux and/or MD has a different opinion.

See my first post. You see that they have almost the same event count,
yet I can't re-add it (considering the fact that another drive failed
entirely).

>
>> and refuses to take it back (!).
> It probably would have taken it back, although requiring a resync.

It did. And that resync uncovered the failure of the other disk. The 
combination trashed my array. The problem is that the first drive should 
never have been ejected, so that the failing drive would not be fatal.

> Like I said, you need to be patient, and follow the expert advice
> provided from the list.

Well, I'm listening. All the info is in my thread:
md RAID5: Disk wrongly marked "spare", need to force re-add it

(And, FYI, being "patient" is difficult when you can't work until the 
array is back online.)

> This discussion is just a diversion from your
> problem, forget the diversion (at least until you get your problem fixed).

I am interested in both: My immediate problem fixed, and that this 
problem never happens again: not for me, and not for anybody else who 
isn't aware of it yet.

>
>> The problem isn't double disk failure. The problem is bugs in md
>> implementation.
> Or users who expect things to work a certain way, without actually
> bothering to find out in advance. Hence their expectation is considered
> a bug when really it is just a lack of knowledge.

FWIW, I read a lot about RAID before using it, and I have been using it
for 10 years. RAID5 is supposed to protect against 1 total harddrive
failure. It doesn't. That's a bug, no matter how you look at it.

>
>>> The Linux kernel advises Linux md that the block
>>> device is gone, so Linux md discards the block device and stops trying
>>> to use it. Personally, I don't see that Linux md has a lot of choice in
>>> the matter
>> True. But often, such errors are temporary. For example, a loose cable.
>> I must be able to re-add the device as a good device with data. But I
>> can't, md doesn't let me.
> It does actually. You can re-add it, with a resync, or if you ensure
> that no writes occurred since the drive was ejected, you can re-add it
> without a resync. In addition, even if some writes occurred, if you use
> a bitmap, only the newly written blocks need to be resynced.


>> My case was even more unbelievable: md ejected perfectly good drives
>> simply because I upgraded the OS. (This happened with 2 independent
>> arrays, so not coincidence.)
> Like I said, the drives were ejected for a reason. You just don't know
> what that reason is.
>
>> Also, a single sector being unreadable/unwritable doesn't count as "disk
>> failure" in my book, and shouldn't eject the whole disk. If I have 2
>> sectors on 2 different disks that are unreadable, md currently trashes
>> the whole array and doesn't let me read anything at all anymore. That's
>> obviously broken, but unfortunately the sad reality.
>> See http://neil.brown.name/blog/20110216044002#1
> This is all true, however, I would hope that when this is implemented,
> the distributions will properly alert the user that one or more drives
> are faulty. One failed write is very frequently indicative of more
> failed writes to come. Personally, I would want to replace that drive ASAP.
>
> In addition, the one thing that appeared missing from the blog was the
> ability for md to clear the bad blocks list when a drive is replaced,
> and rebuild the content of the "bad blocks" from the other members.
>
>> (And, BTW, RAID6 doesn't really help with this problem, because it's
>> quite possible that 3 disks have sectors unreadable/unwritable.)
> RAID6 simply improves your odds or chances. There is no RAID level that
> can provide a 100% uptime, at some point you have lost too many disks or
> too much data, etc. Use the appropriate level of RAID depending on your
> risk profile.
>
> Regards,
> Adam
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-17 11:51               ` Ben Bucksch
@ 2013-04-17 17:50                 ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 28+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-17 17:50 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Robert L Mathews, Linux RAID, Adam Goryachev

> FWIW, I read a lot about RAID before using it, and I have been using it
> for 10 years. RAID5 is supposed to protect against 1 total harddrive
> failure. It doesn't. That's a bug, no matter how you look at it.

Usually, the problem is someone using desktop drives without scterc enabled. If the drive goes into deep recovery, it'll time out from Linux's point of view and be flagged as bad. See my post in your original thread for more info.
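
The timing mismatch can be sketched in a few lines of Python. The numbers are assumptions added for illustration, not values from this thread: SCT ERC limits are given in tenths of a second, the Linux SCSI command timeout is commonly 30 seconds, and the 120-second deep-recovery figure for a desktop drive is purely hypothetical. The function names are made up for this sketch.

```python
DEFAULT_KERNEL_TIMEOUT_S = 30.0  # assumed common Linux SCSI command timeout


def scterc_to_seconds(deciseconds):
    """SCT ERC limits are expressed in tenths of a second."""
    return deciseconds / 10.0


def drive_outlasts_kernel(drive_recovery_s,
                          kernel_timeout_s=DEFAULT_KERNEL_TIMEOUT_S):
    """True if the drive may still be retrying when the kernel gives up."""
    return drive_recovery_s >= kernel_timeout_s


if __name__ == "__main__":
    # A drive with error recovery capped at 70 deciseconds.
    print(drive_outlasts_kernel(scterc_to_seconds(70)))
    # Hypothetical desktop drive that retries internally for two minutes.
    print(drive_outlasts_kernel(120.0))
```

When the second case applies, the kernel stops waiting before the drive reports anything, and md sees a failed device rather than a single unreadable sector.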

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-17  0:20       ` Ben Bucksch
  2013-04-17  1:35         ` Adam Goryachev
@ 2013-04-17  3:32         ` Robert L Mathews
  1 sibling, 0 replies; 28+ messages in thread
From: Robert L Mathews @ 2013-04-17  3:32 UTC (permalink / raw)
  To: Linux RAID

On 4/16/13 5:20 PM, Ben Bucksch wrote:

> However, I think the RAID5 problems are caused by bad design decisions
> in the md implementation, not in the inherent concept of RAID5, though.

I'm not so sure this is true. I once lost (backup) data on a proprietary
non-mdadm RAID 5 system, too, because some spurious event caused
problems for multiple drives at once.

With mdadm, at least there's the opportunity to fix something with the
raw disks, which proprietary systems don't allow. Knowing the right
recovery steps to take is complex and easy to screw up, but there are
many different things that could have gone wrong in the first place.

As Adam Goryachev said, it's amazing how many of the "my RAID died
*and* I did something foolish" stories do end with getting the data
back. This speaks well of the flexibility and power of mdadm in the true
Unix sense, I think.

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-16 22:44     ` Robert L Mathews
  2013-04-17  0:20       ` Ben Bucksch
@ 2013-04-17  4:20       ` Roman Mamedov
  2013-04-17  5:22         ` Robert L Mathews
  1 sibling, 1 reply; 28+ messages in thread
From: Roman Mamedov @ 2013-04-17  4:20 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

On Tue, 16 Apr 2013 15:44:03 -0700
Robert L Mathews <lists@tigertech.com> wrote:

> On 4/16/13 1:05 PM, Carsten Aulbert wrote:
> 
> > The problem I find with RAID1 is that it won't protect you against
> > silent corruptions (same as RAID5). What do you do if you do a thorough
> > check and both drives claim a data block is valid and intact, but data
> > differs? Do you trust disk1 or disk2?
> 
> That's partly why we use three-disk arrays instead of two-disk.

You do know there is no "voting" system in md, right?

If you imagine that all three disks are being read in parallel, and if one
returns bad data, it is automatically "overruled" by a majority vote from the
two other ones with correct data, that's not how it works at all.

The data is read randomly from all three disks (I think it's load-balanced by
process ID); if one disk happened to silently return corrupt data, that's it,
your app just got corrupt data passed to it, and if it happens to write it back
to disk (maybe after some processing), then the incorrect data will be
faithfully replicated by md to all three disks. So in the future you have not
even a _chance_ to read back the correct data that was previously there.

> In the meantime, I'd rather risk this problem than the endless reports
> of complete array failures that appear on the list with RAID 5 and even
> RAID 6 (a recent topic, I note, was "multiple disk failures in an md
> raid6 array"). I almost never see anyone reporting complete loss of a
> RAID 1 array.

In general, you seem to be WAY too concerned about losing your RAID array;
this sounds like you are someone who doesn't make backups and tries to use
RAID as a replacement for them. Don't forget if for example a rogue program
gets 'root' on your machine and overwrites the md device with zeroes, it will
be instantly replicated to all three disks as well.

As for me, if I lose my primary RAID6, it's at most a day's worth of
changes, and some data transfer from here and there to get it all copied from
backups and be up and running again. (I could reduce even that risk and easily
back up 4 times a day, but do not see the need at the moment.)

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-17  4:20       ` Roman Mamedov
@ 2013-04-17  5:22         ` Robert L Mathews
  0 siblings, 0 replies; 28+ messages in thread
From: Robert L Mathews @ 2013-04-17  5:22 UTC (permalink / raw)
  To: Linux RAID

On 4/16/13 9:20 PM, Roman Mamedov wrote:

> You do know there is no "voting" system in md, right?

Yes, but the question was "What do you do if you do a thorough check and
both drives claim a data block is valid and intact, but data differs?"
The implication was that the array has failed and you need to manually
reconstruct data, perhaps sector-by-sector.

Having three sources for a manual reconstruction outside of md reduces
the "someone with two clocks never knows what time it is" problem. With
three, you can make an informed guess about which one is wrong.
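
A minimal Python sketch of that "informed guess" for a manual, outside-of-md
comparison. The function name is hypothetical, and it assumes you have
already read the same sector from each mirror member into a bytes object;
md itself does nothing like this.

```python
from collections import Counter


def majority_copy(copies):
    """Return the content a strict majority of copies agree on, else None."""
    if not copies:
        return None
    content, votes = Counter(copies).most_common(1)[0]
    # Require more than half of the copies to agree before trusting any.
    return content if votes * 2 > len(copies) else None


if __name__ == "__main__":
    # Three members: one disagrees, so the other two win.
    print(majority_copy([b"good", b"good", b"bad!"]))
    # Two members that disagree: no way to tell which one is wrong.
    print(majority_copy([b"good", b"bad!"]))
```

With two copies that differ, the function returns None, which is the
"two clocks" problem in code form.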

I'm not saying that this is the primary reason to use three disks in
RAID 1, because it's not. I've never needed to do sector-level recovery
of an array. The primary reason is so that you can withstand two
simultaneous disk failures, just as with RAID 6 vs. RAID 5.

> In general, you seem to be WAY too concerned about losing your RAID array;
> this sounds like you are someone who doesn't make backups and tries to use
> RAID as a replacement for them.

No, that's definitely not the case. We have backup systems in multiple
data centers, and our disaster recovery planning includes plane crashes
that destroy live servers and so on.

Many businesses require 100% availability. Losing an array on a server
means downtime and telling paying customers "we lost the new data you
stored since the last backup". Neither is acceptable if it's in any way
avoidable, even if the last backup was minutes ago. Like you, I can
easily recover from losing a couple of hours work, but my customers who
run online stores are less sanguine about such things.

By the way, I think I'm going to pin "you seem to be WAY too concerned
about losing your RAID array" up on my wall. That's wonderful, because
it's exactly how concerned I want to be.  ;-)

-- 
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Use RAID-6!
  2013-04-16 19:52 ` Robert L Mathews
  2013-04-16 20:05   ` Carsten Aulbert
@ 2013-04-17 17:27   ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 28+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-17 17:27 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: Linux RAID

> I personally don't even trust RAID 6. All our servers use three-disk
> RAID 1 setups, with disks from at least two different manufacturers to
> prevent against firmware bricking (although this is becoming more and
> more difficult as the industry consolidates).

You can *never* trust RAID alone. Even with ZFS, you can have problems taking down a whole pool, even with RAIDz3 and ZFS' checksumming. A power surge can take down half (or even all) the drives in the array, and even with three-way mirrors, the chances are good your pool will die.

So, choose something decent, like RAID-6 (RAIDz2) or mirrors, three-way if you're paranoid, and keep a good backup, preferably offsite and on tape. Tapes in a tape library can't be damaged much by a power surge (except perhaps those in the reader).

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* md dropping disks too early (was: Use RAID-6!)
  2013-04-16 16:44 Use RAID-6! Roy Sigurd Karlsbakk
  2013-04-16 17:09 ` Mikael Abrahamsson
  2013-04-16 19:52 ` Robert L Mathews
@ 2013-04-16 23:42 ` Ben Bucksch
  2013-04-17  8:00   ` Mikael Abrahamsson
  2 siblings, 1 reply; 28+ messages in thread
From: Ben Bucksch @ 2013-04-16 23:42 UTC (permalink / raw)
  To: Linux RAID

The purpose of my RAID system is 1) to protect against hardware disk
failures, where a harddrive is entirely broken and won't read at all
anymore. I know that this *will* happen at some point, but it's still a
fairly rare event. The chance that 2 out of 8 drives go bad *in the same
week* (!) is very small.

I am also concerned about 2) bit errors and silently broken sectors, and 
want my RAID to detect and fix those. I am not sure that Linux md does that.

There is a good chance that a controller or some wiring is bad, and many 
disks fail at the same time. Neither RAID5 nor RAID6 will protect 
against that, but a re-cabling should fix it without data loss, as the 
data on the disks is not affected.

Given that this RAID array is for my personal use, and the amount of 
disk slots in a machine is limited, and drives need 24/7 power, too, a 
RAID5 is the right choice for me, given the above situation.

---

BUT - and this is the main purpose of my post - Linux md causes problems 
by itself:

In my case, and from what I read in other posts in forums and on this
mailing list, many people have the problem that Linux md simply drops a
disk from the RAID5, even though there was NOT an unrecoverable hardware
failure. There are many situations where this happens:

 1. Upgrade (my case)
 2. Disk temporarily not accessible
 3. Disk has bad sectors (but the other content can still be read)

None of these should be fatal. But it seems that md marks the disk as 
faulty and requires a resync. There does not seem to be any way to get a 
disk that was once marked spare or faulty back into the array, unless I 
do a resync. (If somebody knows a way, please show me, see thread 'Disk 
wrongly marked "spare", need to force re-add it'.) Now, the resync needs 
to read all data from all disks and can be the event that uncovers a 
problem with one of the other disks. That disk is then dropped as well, 
again with no way to re-add, and the array is entirely lost. However, 
that is completely unnecessary, given that there are often only a few 
bad sectors, and these - while bad - are no reason to say goodbye to 
several TB of data.
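
For a rough sense of how likely a full resync is to hit at least one
unreadable sector, here is a back-of-the-envelope Python sketch. It rests
on assumptions that are not from this thread: read errors are independent,
the rate is the often-quoted vendor figure of 1 unrecoverable error per
10^14 bits read, and the array has 8 drives of 2 TB each, so 7 surviving
drives must be read in full. The function name is made up for the sketch.

```python
import math


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error in bytes_read."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    surviving_drives = 7          # 8-drive RAID5 with one member missing
    drive_bytes = 2e12            # assumed 2 TB per drive
    p = p_read_error(surviving_drives * drive_bytes)
    print(f"chance of at least one read error during resync: {p:.2f}")
```

Under these assumptions the result is about 0.67, which is why one bad
sector turning into a dropped disk matters so much during a resync.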

Essentially, by being overly cautious with the data, dropping disks
too early and being too hasty about it, md actually achieves the
opposite of what it was made for. It was intended to protect my data
against disk problems, but md actually turns minor or even temporary
problems into total data loss.

I'm not overstating, because that's the exact situation I am in right 
now. I have only 1 disk that's actually failing, and a RAID5, so in 
theory I am fine. But I see no way to safely get at my data anymore. My 
array is offline and I have no idea how to get it online again without
risking the loss of all data.

And worst of all: the whole situation was triggered by md dropping a disk
from the array that wasn't even failing, just because I upgraded. :-(

Ben

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: md dropping disks too early (was: Use RAID-6!)
  2013-04-16 23:42 ` md dropping disks too early (was: Use RAID-6!) Ben Bucksch
@ 2013-04-17  8:00   ` Mikael Abrahamsson
  2013-04-17 10:57     ` md dropping disks too early Ben Bucksch
  0 siblings, 1 reply; 28+ messages in thread
From: Mikael Abrahamsson @ 2013-04-17  8:00 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Linux RAID

On Wed, 17 Apr 2013, Ben Bucksch wrote:

> I am also concerned about 2) bit errors and silently broken sectors, and 
> want my RAID to detect and fix those. I am not sure that Linux md does 
> that.

Yes it does, but you need to do frequent scrubbing to reduce the risk of 
hitting this when you actually need it, ie after complete drive failure.
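
A minimal Python sketch of kicking off a scrub by hand, assuming the md
sysfs files sync_action and mismatch_cnt under /sys/block/<array>/md/.
The function names and the array name md0 are placeholders for this
example; writing to sysfs needs root, and distro cron jobs normally do
this for you.

```python
from pathlib import Path


def start_check(md_name, sysfs_root="/sys/block"):
    """Ask md to start a "check" pass that reads every stripe of the array."""
    action = Path(sysfs_root) / md_name / "md" / "sync_action"
    action.write_text("check\n")


def mismatch_count(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter reported by the most recent check."""
    counter = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())


if __name__ == "__main__":
    start_check("md0")            # placeholder array name
    print("mismatches so far:", mismatch_count("md0"))
```

The sysfs_root parameter is only there so the functions can be pointed at
a scratch directory for testing instead of the real /sys tree.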

> Given that this RAID array is for my personal use, and the amount of 
> disk slots in a machine is limited, and drives need 24/7 power, too, a 
> RAID5 is the right choice for me, given the above situation.

It's the combination of drive failure and other drive having read errors 
that RAID6 protects against. At least that's my primary use for it.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: md dropping disks too early
  2013-04-17  8:00   ` Mikael Abrahamsson
@ 2013-04-17 10:57     ` Ben Bucksch
  2013-04-17 15:03       ` Keith Keller
  2013-04-17 18:09       ` Roy Sigurd Karlsbakk
  0 siblings, 2 replies; 28+ messages in thread
From: Ben Bucksch @ 2013-04-17 10:57 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Linux RAID

Mikael Abrahamsson wrote, On 17.04.2013 10:00:
> Yes it does, but you need to do frequent scrubbing to reduce the risk 
> of hitting this when you actually need it, ie after complete drive 
> failure.

No, it's not "me" who needs to do that. The software needs to be set up 
by default to do that, be it the kernel or some userland cron job from 
the distro (advantage of latter: configurable). Apparently, Ubuntu 10.04 
didn't do that.
Please stop blaming users, start blaming the software, and fix it.

> It's the combination of drive failure and other drive having read 
> errors that RAID6 protects against. At least that's my primary use for 
> it. 

But a single read error is no reason to send the whole array to the 
trash. RAID6 is merely a workaround here.

With joy, I read that this problem was described, recognized and 
intended to be fixed by the developers:
http://neil.brown.name/blog/20110216044002#1 "Bad Block Log"
Unfortunately, that doesn't seem to be done, as I was running into 
exactly that problem he describes. I hope somebody will fix that, 
because he eloquently describes how the RAID achieves the opposite of 
what it's intended to do.

Ben

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: md dropping disks too early
  2013-04-17 10:57     ` md dropping disks too early Ben Bucksch
@ 2013-04-17 15:03       ` Keith Keller
  2013-04-17 18:09       ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 28+ messages in thread
From: Keith Keller @ 2013-04-17 15:03 UTC (permalink / raw)
  To: linux-raid

On 2013-04-17, Ben Bucksch <linux.news@bucksch.org> wrote:
>
> No, it's not "me" who needs to do that. The software needs to be set up 
> by default to do that, be it the kernel or some userland cron job from 
> the distro (advantage of latter: configurable). Apparently, Ubuntu 10.04 
> didn't do that.

CentOS (and, by implication, RHEL) has had this check since version 5
(not sure which minor version).

--keith


-- 
kkeller@wombat.san-francisco.ca.us



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: md dropping disks too early
  2013-04-17 10:57     ` md dropping disks too early Ben Bucksch
  2013-04-17 15:03       ` Keith Keller
@ 2013-04-17 18:09       ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 28+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-17 18:09 UTC (permalink / raw)
  To: Ben Bucksch; +Cc: Linux RAID, Mikael Abrahamsson

> No, it's not "me" who needs to do that. The software needs to be set
> up
> by default to do that, be it the kernel or some userland cron job from
> the distro (advantage of latter: configurable). Apparently, Ubuntu
> 10.04
> didn't do that.
> Please stop blaming users, start blaming the software, and fix it.

Not sure about Ubuntu 10.04, but 12.04 and later have this cron'ed for the first Sunday of the month.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2013-04-16 16:44 Use RAID-6! Roy Sigurd Karlsbakk
2013-04-16 17:09 ` Mikael Abrahamsson
2013-04-16 17:25   ` Roy Sigurd Karlsbakk
2013-04-16 20:01   ` David Brown
2013-04-17  7:56     ` Mikael Abrahamsson
2013-04-17  9:26       ` David Brown
2013-04-16 19:52 ` Robert L Mathews
2013-04-16 20:05   ` Carsten Aulbert
2013-04-16 20:19     ` Roman Mamedov
2013-04-16 22:44     ` Robert L Mathews
2013-04-17  0:20       ` Ben Bucksch
2013-04-17  1:35         ` Adam Goryachev
2013-04-17  4:27           ` Robert L Mathews
2013-04-17  4:45             ` Adam Goryachev
2013-04-17  6:06             ` Stan Hoeppner
2013-04-17 11:13           ` Ben Bucksch
2013-04-17 11:32             ` Adam Goryachev
2013-04-17 11:51               ` Ben Bucksch
2013-04-17 17:50                 ` Roy Sigurd Karlsbakk
2013-04-17  3:32         ` Robert L Mathews
2013-04-17  4:20       ` Roman Mamedov
2013-04-17  5:22         ` Robert L Mathews
2013-04-17 17:27   ` Roy Sigurd Karlsbakk
2013-04-16 23:42 ` md dropping disks too early (was: Use RAID-6!) Ben Bucksch
2013-04-17  8:00   ` Mikael Abrahamsson
2013-04-17 10:57     ` md dropping disks too early Ben Bucksch
2013-04-17 15:03       ` Keith Keller
2013-04-17 18:09       ` Roy Sigurd Karlsbakk
