From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Menion <menion@gmail.com>
Cc: Christoph Anton Mitterer <calestyo@scientia.net>,
	linux-btrfs@vger.kernel.org
Subject: Re: Status of RAID5/6
Date: Sat, 31 Mar 2018 00:53:18 -0400
Message-ID: <20180331045317.GD2446@hungrycats.org>
In-Reply-To: <CAJVZm6dY9vg15Qxnv8uLAwwPvoiiMZ0gJWARLS_9Ab6LX47ajA@mail.gmail.com>

On Fri, Mar 30, 2018 at 09:21:00AM +0200, Menion wrote:
>  Thanks for the detailed explanation. I think that a summary of this
> should go in the btrfs raid56 wiki status page, because now it is
> completely inconsistent and if a user comes there, he may get the
> impression that the raid56 is just broken.
> Still I have the 1 billion dollar question: from your words I understand
> that even in RAID56 the metadata are spread on the devices in a complex
> way, but shall I assume that the array can survive the sudden death
> of one (two for raid6) HDD in the array?

I wouldn't assume that.  There is still the write hole, and while there
is a small probability of having a write hole failure, it's a probability
that applies on *every* write in degraded mode, and since disks can fail
at any time, the array can enter degraded mode at any time.

It's similar to lottery tickets--buy one ticket, you probably won't win,
but if you buy millions of tickets, you'll claim the prize eventually.
The "prize" in this case is a severely damaged, possibly unrecoverable
filesystem.
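
To put rough numbers on the lottery analogy, here is a minimal sketch.
It assumes each degraded-mode write is an independent trial with the same
small failure probability; the value of p below is made up purely for
illustration and is not a measured rate for any real array.

```python
def prob_at_least_one(p_per_write: float, n_writes: int) -> float:
    """Chance that at least one of n independent writes hits the write hole."""
    return 1.0 - (1.0 - p_per_write) ** n_writes


# Hypothetical per-write failure probability, chosen only for illustration.
p = 1e-6

for n in (1, 1_000, 1_000_000, 10_000_000):
    print(f"{n:>10} writes: {prob_at_least_one(p, n):.4f}")
```

Under these assumptions the per-write risk stays tiny, but the cumulative
risk climbs toward certainty as the number of writes grows.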

If the data is raid5 and the metadata is raid1, the filesystem can
survive a single disk failure easily; however, some of the data may be
lost if writes to the remaining disks are interrupted by a system crash
or power failure and the write hole issue occurs.  Note that the damage
is not necessarily limited to recently written data--it's any random
data that is merely located adjacent to written data on the filesystem.
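
A toy model of why the damage can land on data that was never written.
This is only a sketch of generic XOR parity on one stripe with two data
blocks, not btrfs code; the block contents and variable names are
invented for the example.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe: two data blocks on two disks, parity on a third.
d0 = b"old file"
d1 = b"bystandr"          # unrelated data sharing the stripe
parity = xor(d0, d1)

# The disk holding d1 dies. In degraded mode d1 is rebuilt from d0 + parity.
reconstructed = xor(d0, parity)
assert reconstructed == d1

# Now d0 is overwritten, but a crash hits before parity is updated.
d0_on_disk = b"new file"  # new data reached the disk
stale_parity = parity     # parity update was lost in the crash

# After reboot, rebuilding d1 combines new d0 with stale parity.
after_crash = xor(d0_on_disk, stale_parity)
print(after_crash == d1)  # d1 was never written, yet it no longer reads back
```

The block that gets corrupted here is d1, which nobody touched; it only
shared a stripe with the block being written.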

I wouldn't use raid6 until the write hole issue is resolved.  There is
no configuration where two disks can fail and metadata can still be
updated reliably.

Some users use the 'ssd_spread' mount option to reduce the probability
of write hole failure, which happens to be helpful by accident on some
array configurations, but it has a fairly high cost when the array is
not degraded due to all the extra balancing required.



> Bye

