future of raid 6

Linux RAID subsystem development

* future of raid 6
@ 2017-09-03 18:28 Markus
  2017-09-03 21:03 ` Wols Lists
  2017-09-04  9:53 ` Andreas Klauer
  0 siblings, 2 replies; 6+ messages in thread
From: Markus @ 2017-09-03 18:28 UTC (permalink / raw)
  To: linux-raid

Hello!

I have a question (maybe more) about RAID development, especially 
RAID 6 and beyond.
I read in some articles[1] that RAID 6 will become less safe as disk sizes 
grow while the (unrecoverable) error rate and data rate do not improve as 
much. (Rebuilds are likely to hit an unrecoverable error, not to mention the 
long time it will take to rebuild raids with +10TB per drive.)
They estimated 2019 (that was written ten years ago).

Is anything already in progress for md-raid in the Linux kernel?
	If so, what and where?
	If not, why? Is it not needed yet? Is md-raid itself deprecated?
	What is the future for redundant mass storage?

Best regards,
Markus

[1] http://queue.acm.org/detail.cfm?id=1670144

* Re: future of raid 6
  2017-09-03 18:28 future of raid 6 Markus
@ 2017-09-03 21:03 ` Wols Lists
  2017-09-03 21:53   ` Gandalf Corvotempesta
  2017-09-04  9:53 ` Andreas Klauer
  1 sibling, 1 reply; 6+ messages in thread
From: Wols Lists @ 2017-09-03 21:03 UTC (permalink / raw)
  To: Markus, linux-raid

On 03/09/17 19:28, Markus wrote:
> Hello!
> 
> I have a question (maybe more) about RAID development, especially 
> RAID 6 and beyond.
> I read in some articles[1] that RAID 6 will become less safe as disk sizes 
> grow while the (unrecoverable) error rate and data rate do not improve as 
> much. (Rebuilds are likely to hit an unrecoverable error, not to mention the 
> long time it will take to rebuild raids with +10TB per drive.)
> They estimated 2019 (that was written ten years ago).
> 
> Is anything already in progress for md-raid in the Linux kernel?
> 	If so, what and where?
> 	If not, why? Is it not needed yet? Is md-raid itself deprecated?
> 	What is the future for redundant mass storage?
> 
Hard errors aren't that common (at least until a drive fails :-)

A hard error is a sector that cannot be read. A soft error is one that
works fine after a reset (however you define said reset). A consumer
10TB disk can return one soft error per complete pass and still be
passed "it's good, it's within spec".

(And I think a lot of hard errors are down to not looking after the
data. Just as DRAM decays over milliseconds, so do hard drives decay
over time, exacerbated by nearby writes. A hard error could simply be
data that's been degaussed by nearby writes. Fail to deal with that with
e.g. scrubs, and a perfectly functional drive can trash your data :-(

IFF someone thinks it's worth it, we may add raid-6+ functionality (ie
more than two parity drives), but I suspect enterprise drives will
simply become more reliable as they get bigger (manufacturers will move
more and more error correction technology into the firmware).

Bear in mind enterprise drives are roughly four binary orders of
magnitude (16x) more reliable, so a 160TB drive should just about read
completely without error.
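
A rough sketch of the arithmetic behind those two figures. It assumes
the commonly quoted consumer spec of one unrecoverable read error per
1e14 bits read (that number is my assumption, it is not stated above),
and takes "four binary orders of magnitude" to mean a factor of 2**4.
The function name is made up for illustration.

```python
# Expected number of read errors for one complete pass over a drive,
# taking the spec-sheet error rate at face value.

CONSUMER_BITS_PER_ERROR = 1e14                              # assumed consumer spec
ENTERPRISE_BITS_PER_ERROR = CONSUMER_BITS_PER_ERROR * 2**4  # 16x better


def expected_errors_per_pass(capacity_tb, bits_per_error):
    """Expected error count when reading capacity_tb (decimal TB) once."""
    bits_read = capacity_tb * 1e12 * 8
    return bits_read / bits_per_error


consumer_10tb = expected_errors_per_pass(10, CONSUMER_BITS_PER_ERROR)
enterprise_160tb = expected_errors_per_pass(160, ENTERPRISE_BITS_PER_ERROR)

print(f"consumer 10TB:    {consumer_10tb:.2f} expected errors per pass")
print(f"enterprise 160TB: {enterprise_160tb:.2f} expected errors per pass")
```

Under these assumptions both cases come out just under one expected
error per full read, which is where the 10TB and 160TB figures meet.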

As for "is md-raid deprecated", well a lot of the filesystem guys would
like it to be, and I sort of agree with them - the more layers of
abstraction, the worse things are. But the opposite also holds true -
put raid into the filesystem itself and the complexity there can explode
- just witness the trouble higher raid levels have caused the btrfs folks.

In *MY* humble (very) opinion, unrecoverable errors are far less common
than people think, technology is improving, and "2019" is a lot further
off than two years. That said, we will probably need something more than
plain raid-6 once we start using disks in the 50-80TB range.
Unless, of course, technology inside the drive improves, which it
probably will.

Cheers,
Wol

* Re: future of raid 6
  2017-09-03 21:03 ` Wols Lists
@ 2017-09-03 21:53   ` Gandalf Corvotempesta
  0 siblings, 0 replies; 6+ messages in thread
From: Gandalf Corvotempesta @ 2017-09-03 21:53 UTC (permalink / raw)
  To: Wols Lists; +Cc: Markus, Linux RAID Mailing List

2017-09-03 23:03 GMT+02:00 Wols Lists <antlists@youngman.org.uk>:
> IFF someone thinks it's worth it, we may add raid-6+ functionality (ie
> more than two parity drives),

This would be a great idea. Even ZFS supports triple parity (RAID-Z3, 3 parity disks).


> but I suspect enterprise drives will
> simply become more reliable as they get bigger (manufacturers will move
> more and more error correction technology into the firmware).

The URE rate is the same even for enterprise disks.

* Re: future of raid 6
  2017-09-03 18:28 future of raid 6 Markus
  2017-09-03 21:03 ` Wols Lists
@ 2017-09-04  9:53 ` Andreas Klauer
  2017-09-04 12:28   ` Mikael Abrahamsson
  2017-09-05  3:44   ` Phil Turmel
  1 sibling, 2 replies; 6+ messages in thread
From: Andreas Klauer @ 2017-09-04  9:53 UTC (permalink / raw)
  To: Markus; +Cc: linux-raid

On Sun, Sep 03, 2017 at 08:28:15PM +0200, Markus wrote:
> (Rebuilds are likely to hit an unrecoverable error, not to mention the 
> long time it will take to rebuild raids with +10TB per drive.)

Rebuilds are not likely to hit an unrecoverable error at all.
Rebuilds do not take a long time, and how long they take tends to be irrelevant.

A lot of articles make rebuilds out to be some kind of mythological beast 
that breathes fire, spits poison, and devours your drives. It's not true.

The probability of failure during a rebuild is not higher than normal. 
A rebuild is as boring as can be, just a linear read (n-1 drives), 
linear write (1 drive). Reshapes are more interesting but it's still 
just normal reading and writing: there is no magic involved, there is 
absolutely nothing that could possibly cause undue drive failures.

Drives fail randomly, silently. The only way to verify that a drive hasn't 
failed yet is to read everything from start to end. If you don't run these 
read tests regularly, the error will go unnoticed for weeks, months, years.

That's the timeframe we're really looking at: not the hours or days it 
actually takes for the rebuild itself to finish, but weeks (for you to 
order a new drive and for it to actually be shipped), and months (for you 
to detect the error in the first place and bring yourself to order a 
replacement rather than waiting to see how it goes because you feel the 
pinch of replacement costs).
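
For a sense of scale on the "hours, days" part, a back-of-the-envelope
estimate. The 150 MB/s sustained rate is an assumed figure, not a
measurement, and a rebuild on a busy array will be slower. The function
name is made up for illustration.

```python
# Rough rebuild time: the new drive has to be written once, end to end,
# so the time is about capacity divided by sustained write speed.

def rebuild_hours(drive_tb, mb_per_s):
    """Hours to write drive_tb (decimal TB) at mb_per_s (decimal MB/s)."""
    seconds = (drive_tb * 1e12) / (mb_per_s * 1e6)
    return seconds / 3600


hours_10tb = rebuild_hours(10, 150)   # assumed 150 MB/s sustained
print(f"10TB at 150 MB/s: about {hours_10tb:.1f} hours")
```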

If you do not monitor and regularly test your drives to cut down this time, 
no amount of redundancy will ever be enough. If you never run any tests, 
the errors will pop up during rebuild. If you never tested and thus allowed 
the rebuild itself to be the first ever test in years, it's no surprise. 
You can't blame the rebuild for that.
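
One way to run such a read test on md is a scrub, requested by writing
"check" to the array's sync_action file in sysfs. A minimal sketch,
assuming the array is called md0 and the script runs as root; the
function names and the sysfs_root parameter are made up for
illustration (the latter only exists so the code can be pointed at a
test directory).

```python
from pathlib import Path


def start_scrub(md_name, sysfs_root="/sys"):
    """Ask md to read-check the whole array. Returns False if it is busy."""
    action = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    if action.read_text().strip() != "idle":
        return False          # a resync, rebuild or check is already running
    action.write_text("check\n")
    return True


def mismatch_count(md_name, sysfs_root="/sys"):
    """Number of mismatched sectors found by the last check."""
    counter = Path(sysfs_root) / "block" / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text())


if __name__ == "__main__" and Path("/sys/block/md0/md/sync_action").exists():
    print("scrub started" if start_scrub("md0") else "array is busy")
```

Run from cron or a timer, this turns "the first read in years" into a
routine event. Many distributions already ship a periodic job that does
the same thing.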

It has nothing to do with bit error rates, timeouts, same batch of drives, 
and all the other shit people come up with to make excuses for their lack 
of monitoring drives properly.

Genuine simultaneous drive failures are *very* rare.

> What is the future for redundant mass storage?

It's still RAID-5 or something similar on a filesystem level.
Simply because it works, and it's cost effective. It doesn't matter 
if the drive has 1000MB, 100GB, 10TB - nothing changed.

Most machines (desktops, NAS, rented servers) only have 2-4 drives anyway. 
You need twice as many for RAID-6 to start making any kind of sense, 
and triple parity is even further off, like 20+ drives.
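
The "twice as many" point can be read as parity overhead: the fraction
of raw capacity spent on redundancy. A small sketch of that reading
(my interpretation, with a made-up function name):

```python
# Fraction of raw capacity given up to parity in an n-drive array.

def parity_overhead(n_drives, parity_drives):
    if not 0 < parity_drives < n_drives:
        raise ValueError("need at least one data drive and one parity drive")
    return parity_drives / n_drives


for label, n, p in [("RAID-5,  4 drives", 4, 1),
                    ("RAID-6,  4 drives", 4, 2),
                    ("RAID-6,  8 drives", 8, 2),
                    ("triple, 20 drives", 20, 3)]:
    print(f"{label}: {parity_overhead(n, p):.0%} of raw capacity is parity")
```

On four drives RAID-6 costs half the raw capacity. It takes eight
drives before it is as cheap, proportionally, as RAID-5 on four.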

If you're currently running a three drive RAID-6 and need more redundancy, 
you can do it with a four drive RAID-1. Knock yourself out. Nobody does it. 
It's barking mad. The only problem it solves does not really exist.

And regardless of redundancy, you still need backups. If you have backups, 
the zero point something chance of simultaneous failure matters even less.

Regards
Andreas Klauer

* Re: future of raid 6
  2017-09-04  9:53 ` Andreas Klauer
@ 2017-09-04 12:28   ` Mikael Abrahamsson
  2017-09-05  3:44   ` Phil Turmel
  1 sibling, 0 replies; 6+ messages in thread
From: Mikael Abrahamsson @ 2017-09-04 12:28 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Markus, linux-raid

On Mon, 4 Sep 2017, Andreas Klauer wrote:

> Genuine simultaneous drive failures are *very* rare.

No, the "drive failed and I now have a URE on another drive" scenario is 
not "*very rare*". Yes, it can be somewhat mitigated by frequent scrubbing.

That's why I run RAID6 and not RAID5. I have been hit by the above problem 
several times when running RAID5. I haven't yet had a data loss event 
(that I know of) because of this since I moved to RAID6.

I think "RAID6 with triple parity" would make sense for some deployment 
scenarios. It's not uncommon for people to have several RAID6 arrays and 
then have a hot spare in the chassis. This hot spare could instead 
actually do work by being triple redundancy in one array, instead of 
sitting there powered on but doing nothing.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: future of raid 6
  2017-09-04  9:53 ` Andreas Klauer
  2017-09-04 12:28   ` Mikael Abrahamsson
@ 2017-09-05  3:44   ` Phil Turmel
  1 sibling, 0 replies; 6+ messages in thread
From: Phil Turmel @ 2017-09-05  3:44 UTC (permalink / raw)
  To: Markus; +Cc: Andreas Klauer, linux-raid

Hi Markus,

On 09/04/2017 05:53 AM, Andreas Klauer wrote:
> On Sun, Sep 03, 2017 at 08:28:15PM +0200, Markus wrote:
>> (Rebuilds are likely to hit an unrecoverable error, not to mention the 
>> long time it will take to rebuild raids with +10TB per drive.)
> 
> Rebuilds are not likely to hit an unrecoverable error at all.
> Rebuilds do not take a long time, and how long they take tends to be irrelevant.
> 
> A lot of articles make rebuilds out to be some kind of mythological beast 
> that breathes fire, spits poison, and devours your drives. It's not true.

You should be aware that some helpful people nevertheless have a magical
view of technology and don't actually provide accurate information, like
the above (commonly repeated) screed.

The failure event possibilities of the various raid levels can be easily
analyzed if one is unafraid of math (statistics) and has a modest view
of the possible events involved.

The possible events are basically true, persistent hardware failure of
the drive, and potentially rewritable/fixable transient magnetic
failures in data retrieval.  They can combine in various ways, like so:

https://marc.info/?l=linux-raid&m=139050322510249&w=2

The odds of total hardware failure can be inferred from warranty
lengths, but are generally much lower than the odds of transient flaws,
based on manufacturer's specification claims for unrecoverable read
errors.  The math for the latter follows the Poisson distribution, like so:

https://marc.info/?l=linux-raid&m=135863964624202&w=2
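
A minimal sketch of that Poisson calculation, not a copy of the linked
post. It assumes a spec-sheet rate of one unrecoverable read error per
1e14 bits (an assumed value) and that a degraded rebuild reads every
surviving drive in full. The function names are made up for
illustration.

```python
import math


def p_at_least_one_ure(bytes_read, ure_per_bit):
    """Poisson: P(k >= 1) = 1 - exp(-lambda), lambda = bits * rate."""
    expected_errors = bytes_read * 8 * ure_per_bit
    return -math.expm1(-expected_errors)


def p_rebuild_hits_ure(n_drives, drive_tb, ure_per_bit=1e-14):
    """Chance that rebuilding one failed drive meets at least one URE."""
    bytes_read = (n_drives - 1) * drive_tb * 1e12   # all surviving drives
    return p_at_least_one_ure(bytes_read, ure_per_bit)


p = p_rebuild_hits_ure(n_drives=4, drive_tb=10)
print(f"4 x 10TB, one drive lost: P(at least one URE) = {p:.3f}")
```

On a degraded RAID-5 a URE at that point leaves a stripe that cannot be
reconstructed. A RAID-6 that has lost one drive still has a second
parity to recover from it. Whether real drives actually perform at the
spec-sheet rate is a separate question.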

>> What is the future for redundant mass storage?

Pretty bright, if you really take a look at the math.  And are willing
to purchase hardware that cooperates with automated maintenance instead
of blowing up (the dreaded and very real timeout mismatch problem that
Andreas dismisses as fantasy).  Simply put, if smartctl -l scterc yields
a message that Error Recovery Control is not supported, you *must*
adjust the kernel's driver timeouts to avoid catastrophic array failure.
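
A sketch of that check, working on the text printed by
smartctl -l scterc /dev/sdX. The wording it matches ("not supported",
"Read: Disabled") is what I would expect smartctl to print, so treat
the patterns as assumptions. The 7.0 second ERC setting and the 180
second driver timeout are commonly suggested values, not something
stated in this mail. The function names are made up for illustration.

```python
import re


def erc_status(scterc_output):
    """Classify smartctl's SCT Error Recovery Control report."""
    text = scterc_output.lower()
    if "not supported" in text:
        return "unsupported"
    if re.search(r"read:\s*disabled", text):
        return "disabled"
    if re.search(r"read:\s*\d+", text):
        return "enabled"
    return "unknown"


def suggested_command(status, disk):
    """Shell command to run as root, or '' if nothing needs changing."""
    if status == "enabled":
        return ""
    if status == "disabled":
        # ERC exists but is off: enable it at 7.0 seconds (units of 100 ms).
        return f"smartctl -l scterc,70,70 /dev/{disk}"
    # Unsupported (or unrecognised output): raise the kernel driver
    # timeout so the drive's own long retries can finish first.
    return f"echo 180 > /sys/block/{disk}/device/timeout"


sample = "SCT Error Recovery Control command not supported"
print(suggested_command(erc_status(sample), "sda"))
```

Neither setting survives a reboot or a power cycle, so whichever
command applies belongs in a boot script or udev rule.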

Phil
