From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Roman Mamedov <rm@romanrm.net>
Cc: Chris Murphy <lists@colorremedies.com>,
	Hugo Mills <hugo@carfax.org.uk>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	Austin Hemmelgarn <ahferroin7@gmail.com>
Subject: Re: RAID system with adaption to changed number of disks
Date: Wed, 12 Oct 2016 16:33:23 -0400
Message-ID: <20161012203323.GI26140@hungrycats.org>
In-Reply-To: <20161013003331.5e33c006@natsu>

On Thu, Oct 13, 2016 at 12:33:31AM +0500, Roman Mamedov wrote:
> On Wed, 12 Oct 2016 15:19:16 -0400
> Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:
> 
> > I'm not even sure btrfs does this--I haven't checked precisely what
> > it does in dup mode.  It could send both copies of metadata to the
> > disks with a single barrier to separate both metadata updates from
> > the superblock updates.  That would be bad in this particular case.
> 
> It would be bad in any case, including a single physical disk and no RAID, and

No, a single disk does not have these problems.  On a single disk we don't
have to deal with temporarily corrupted metadata _outside_ the areas we
are writing, as the disk will confine damaged data to individual sectors.
On RAID5, damage is confined only to the stripe level, a unit orders
of magnitude larger than a sector.
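
To put rough numbers on the sector-versus-stripe comparison, here is a
small arithmetic sketch.  The sector size, chunk size, and disk count
are assumptions chosen for illustration; none of them come from this
thread, and real arrays vary.

```python
# Illustrative sizes only; all three values are assumptions.
SECTOR_BYTES = 4096          # assumed disk sector size
CHUNK_BYTES = 512 * 1024     # assumed per-disk chunk size of the array
DISKS = 5                    # assumed number of RAID5 member disks


def full_stripe_data_bytes(chunk_bytes, disks, parity_disks=1):
    """Data bytes covered by one full stripe (parity chunks excluded)."""
    return chunk_bytes * (disks - parity_disks)


stripe = full_stripe_data_bytes(CHUNK_BYTES, DISKS)
ratio = stripe // SECTOR_BYTES   # how many sectors fit in one stripe
print(f"stripe data: {stripe} bytes, {ratio} sectors per stripe")
```

Under these assumptions one stripe spans hundreds of sectors, so damage
confined "to a stripe" covers far more data than damage confined to a
sector.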

> I don't think there's any basis to speculate that mdadm doesn't implement
> write barriers properly.

btrfs and mdadm have to use them properly together.  It's possible to
get it fatally wrong from the btrfs side even if mdadm does everything
perfectly.  Single disks don't have stripe consistency requirements,
so if btrfs has single-disk assumptions about the behavior of writes
then it can do the wrong thing on multi-disk systems.

> > In degraded RAID5/6 mode, all writes temporarily corrupt data, so if there
> > is an interruption (system crash, a disk times out, etc) in degraded mode,
> 
> Moreover, in any non-COW system writes temporarily corrupt data. So again,
> writing to a (degraded or not) mdadm RAID5 is not much different than writing
> to a single physical disk. However I believe in the Btrfs case metadata is
> always COW, so this particular problem may be not as relevant here in the
> first place.

Degraded RAID5 does not behave like a single disk.  That's the point
people seem to keep missing when thinking about this.  btrfs CoW relies
on single-disk behavior, and fails badly when it doesn't get it.

btrfs CoW requires that writes to one sector don't modify or jeopardize
data integrity in any other sectors.  mdadm in degraded raid5/6 mode with
no stripe journal device cannot meet this requirement.  Writes always
temporarily disrupt data on other disks in the same RAID stripe.  Each
individual disruption lasts only milliseconds, but there may be hundreds
or thousands of failure windows per second.
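
The failure window described above can be sketched with a toy model.
This is not mdadm code: it assumes a three-disk RAID5 stripe (two data
blocks plus XOR parity) in which the disk holding the second data block
has failed, and every name in it is hypothetical.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, the parity operation used by RAID5."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe: data blocks d0 and d1, plus parity = d0 XOR d1.
d0_old = b"AAAA"
d1 = b"BBBB"             # committed data that nobody is writing to
parity_old = xor(d0_old, d1)


def read_d1_degraded(d0_on_disk: bytes, parity_on_disk: bytes) -> bytes:
    """With d1's disk gone, d1 exists only as d0 XOR parity."""
    return xor(d0_on_disk, parity_on_disk)


# Before any write, reconstruction returns the original d1.
assert read_d1_degraded(d0_old, parity_old) == d1

# Now overwrite d0.  Both d0 and parity must be updated on disk.
d0_new = b"CCCC"
parity_new = xor(xor(parity_old, d0_old), d0_new)  # read-modify-write

# Interrupted between the two updates: d0 is new, parity is still old.
torn_d1 = read_d1_degraded(d0_new, parity_old)

# Both updates completed: d1 reconstructs correctly again.
ok_d1 = read_d1_degraded(d0_new, parity_new)

print("d1 after interrupted write matches original:", torn_d1 == d1)
print("d1 after completed write matches original:  ", ok_d1 == d1)
```

In this model the interrupted write changes what d1 reads back as, even
though d1 was never a target of the write.  That is the property a CoW
filesystem cannot tolerate: data it considers committed and untouched
sits in the same stripe as the block being written.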

> 
> -- 
> With respect,
> Roman




