Re: Status of RAID5/6 - Zygo Blaxell

From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Menion <menion@gmail.com>
Cc: Christoph Anton Mitterer <calestyo@scientia.net>,
	linux-btrfs@vger.kernel.org
Subject: Re: Status of RAID5/6
Date: Sat, 31 Mar 2018 00:53:18 -0400
Message-ID: <20180331045317.GD2446@hungrycats.org>
In-Reply-To: <CAJVZm6dY9vg15Qxnv8uLAwwPvoiiMZ0gJWARLS_9Ab6LX47ajA@mail.gmail.com>

On Fri, Mar 30, 2018 at 09:21:00AM +0200, Menion wrote:
> Thanks for the detailed explanation. I think that a summary of this
> should go in the btrfs raid56 wiki status page, because now it is
> completely inconsistent and if a user comes there, he may get the
> impression that the raid56 is just broken.
> Still I have the 1 billion dollar question: from your words I understand
> that even in RAID56 the metadata are spread on the devices in a complex
> way, but shall I assume that the array can survive the sudden death
> of one (two for raid6) HDD in the array?

I wouldn't assume that.  There is still the write hole, and while there
is a small probability of having a write hole failure, it's a probability
that applies on *every* write in degraded mode, and since disks can fail
at any time, the array can enter degraded mode at any time.

It's similar to lottery tickets--buy one ticket, you probably won't win,
but if you buy millions of tickets, you'll claim the prize eventually.
The "prize" in this case is a severely damaged, possibly unrecoverable
filesystem.
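
The lottery analogy can be made concrete with a small sketch. It assumes
each degraded-mode write is an independent trial with the same small
failure probability; the function name and the numbers used are purely
illustrative and are not measurements of any real array.

```python
def cumulative_failure_probability(p_per_write: float, writes: int) -> float:
    """Probability of at least one failure in `writes` independent trials.

    Assumption (not from the original message): every write fails
    independently with the same probability `p_per_write`.
    """
    if not 0.0 <= p_per_write <= 1.0:
        raise ValueError("p_per_write must be between 0 and 1")
    if writes < 0:
        raise ValueError("writes must be non-negative")
    # P(at least one failure) = 1 - P(no failure in any write)
    return 1.0 - (1.0 - p_per_write) ** writes


if __name__ == "__main__":
    # Hypothetical per-write probability of one in a million.
    for n in (1, 1_000, 1_000_000, 10_000_000):
        print(n, cumulative_failure_probability(1e-6, n))
```

Under these assumed numbers a single write is almost certainly safe,
while ten million writes make at least one failure nearly certain.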

If the data is raid5 and the metadata is raid1, the filesystem can
survive a single disk failure easily; however, some of the data may be
lost if writes to the remaining disks are interrupted by a system crash
or power failure and the write hole issue occurs.  Note that the damage
is not necessarily limited to recently written data--it's any random
data that is merely located adjacent to written data on the filesystem.
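
Why unwritten data can be damaged is easier to see in a toy model. The
sketch below is a simplified single-parity (XOR) stripe with two data
blocks and one parity block; it is not the btrfs implementation, and
all names and block contents are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def simulate_write_hole() -> tuple[bytes, bytes]:
    """Return (original d1, d1 as reconstructed after an interrupted write)."""
    # One stripe: two data blocks plus parity = d0 XOR d1.
    d0 = b"old-data-A"
    d1 = b"untouched!"
    parity = xor_bytes(d0, d1)

    # The device holding d1 has failed, so d1 exists only implicitly:
    # a degraded read rebuilds it from d0 and parity.
    assert xor_bytes(d0, parity) == d1

    # An update to d0 must rewrite both d0 and parity. Simulate a crash
    # after the new d0 reaches disk but before parity is updated.
    d0_on_disk = b"new-data-A"
    stale_parity_on_disk = parity

    # After reboot, rebuilding d1 mixes new data with stale parity.
    reconstructed_d1 = xor_bytes(d0_on_disk, stale_parity_on_disk)
    return d1, reconstructed_d1


if __name__ == "__main__":
    original, rebuilt = simulate_write_hole()
    print("original d1:", original)
    print("rebuilt  d1:", rebuilt)
    print("d1 intact:  ", original == rebuilt)
```

In this model d1 was never written, yet it is the block that comes back
wrong, because it shares a stripe with the interrupted write.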

I wouldn't use raid6 until the write hole issue is resolved.  There is
no configuration where two disks can fail and metadata can still be
updated reliably.

Some users use the 'ssd_spread' mount option to reduce the probability
of write hole failure, which happens to be helpful by accident on some
array configurations, but it has a fairly high cost when the array is
not degraded due to all the extra balancing required.

> Bye
