public inbox for linux-raid@vger.kernel.org
From: Nix <nix@esperi.org.uk>
To: Mukund Sivaraman <muks@mukund.org>
Cc: Wols Lists <antlists@youngman.org.uk>, linux-raid@vger.kernel.org
Subject: Re: RAID-6 and write hole with write-intent bitmap
Date: Sat, 28 Nov 2020 01:57:33 +0000
Message-ID: <878samckfm.fsf@esperi.org.uk>
In-Reply-To: <20201124185004.GA27132@jurassic.vpn.mukund.org> (Mukund Sivaraman's message of "Wed, 25 Nov 2020 00:20:04 +0530")

On 24 Nov 2020, Mukund Sivaraman told this:
[...]
> (a) With RAID-5, assuming there are 4 member disks A, B, C, D, a write
> operation with its data on disk A and stripe's parity on disk B may
> involve:
>
> 1. a read of the stripe
> 2. update of data on A
> 3. computation and update of parity A^C^D on B
>
> These are not atomic updates. If power is lost between steps 2 and 3,

The writes usually proceed in parallel (anything else would be
abominably slow). But the real problem is that the writes to the
component disks are not atomic either, and will likely not proceed at
the same rate: only spindle-synched drives give anything like a
guarantee of that, and those have been unobtainable for decades. So a
power loss could well leave 500 sectors of the stripe written on disk
A and 430 sectors written on disk B, with the sectors between 430 and
500 inconsistent. Disk C might well be up around sector 600 and disk D
around sector 450, and there's no *way* mere parity or RAID 6
syndromes can recover from the wildly-varying mess between sectors 430
and 600: nothing records how far each disk's write had got before the
power went out. The journal avoids this in the usual fashion for a
journal: the whole write goes to the journal first and is committed to
stable storage, so that on restart the incomplete writes can just be
replayed.
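
To make that concrete, here is a toy sketch in Python. It is a model
of the failure only, not of how md actually lays out or journals
stripes: the 8-sector stripe depth, the per-disk progress figures
(scaled down from the 500/430/600/450 above) and the helper names are
all made up for illustration. It tears a full-stripe write at a
different point on each disk, lists the sectors whose parity no
longer matches, and then replays a journalled copy of the write.

```python
from functools import reduce
from operator import xor

SECTORS = 8                   # toy stripe depth per disk (assumption)
DATA_DISKS = ("A", "C", "D")  # parity lives on B, as in the quoted example


def with_parity(data):
    """Return a stripe dict with B set to A^C^D, sector by sector."""
    stripe = {d: list(data[d]) for d in DATA_DISKS}
    stripe["B"] = [reduce(xor, col)
                   for col in zip(*(stripe[d] for d in DATA_DISKS))]
    return stripe


def crash_during_write(old, new, progress):
    """Power fails when disk d has written only progress[d] new sectors."""
    return {d: new[d][:progress[d]] + old[d][progress[d]:] for d in old}


def inconsistent_sectors(stripe):
    """Sector indexes where the parity on B no longer equals A^C^D."""
    return [i for i in range(SECTORS)
            if stripe["B"][i] != reduce(xor, (stripe[d][i] for d in DATA_DISKS))]


def replay(journal):
    """On restart, rewrite every disk from the journalled stripe."""
    return {d: list(sectors) for d, sectors in journal.items()}


old = with_parity({"A": [1] * SECTORS, "C": [2] * SECTORS, "D": [3] * SECTORS})
new = with_parity({"A": [9] * SECTORS, "C": [8] * SECTORS, "D": [7] * SECTORS})

# The journal copy is committed to stable storage before any in-place write.
journal = {d: list(sectors) for d, sectors in new.items()}

# Hypothetical progress per disk at the moment of power loss.
progress = {"A": 5, "B": 4, "C": 6, "D": 4}
torn = crash_during_write(old, new, progress)

print(inconsistent_sectors(torn))             # [4, 5]
print(inconsistent_sectors(replay(journal)))  # []
```

Parity shows *that* sectors 4 and 5 are wrong, but not which disks
hold old data and which hold new, so there is nothing to reconstruct
from. The replay does not need to know: it just rewrites the lot.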

-- 
NULL && (void)
