All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Goffredo Baroncelli <kreijack@inwind.it>
Cc: Christoph Anton Mitterer <calestyo@scientia.net>,
	linux-btrfs@vger.kernel.org
Subject: Re: Status of RAID5/6
Date: Sat, 31 Mar 2018 03:43:22 -0400	[thread overview]
Message-ID: <20180331074320.GF2446@hungrycats.org> (raw)
In-Reply-To: <b4d5bb24-e8d0-dc1b-94d2-4e7f9a292630@inwind.it>

[-- Attachment #1: Type: text/plain, Size: 2122 bytes --]

On Sat, Mar 31, 2018 at 08:57:18AM +0200, Goffredo Baroncelli wrote:
> On 03/31/2018 07:03 AM, Zygo Blaxell wrote:
> >>> btrfs has no optimization like mdadm write-intent bitmaps; recovery
> >>> is always a full-device operation.  In theory btrfs could track
> >>> modifications at the chunk level but this isn't even specified in the
> >>> on-disk format, much less implemented.
> >> It could go even further; it would be sufficient to track which
> >> *partial* stripes update will be performed before a commit, in one
> >> of the btrfs logs. Then in case of a mount of an unclean filesystem,
> >> a scrub on these stripes would be sufficient.
> 
> > A scrub cannot fix a raid56 write hole--the data is already lost.
> > The damaged stripe updates must be replayed from the log.
> 
> Your statement is correct, but you doesn't consider the COW nature of btrfs.
> 
> The key is that if a data write is interrupted, all the transaction
> is interrupted and aborted. And due to the COW nature of btrfs, the
> "old state" is restored at the next reboot.

This is not presently true with raid56 and btrfs.  RAID56 on btrfs uses
RMW operations which are not COW and don't provide any data integrity
guarantee.  Old data (i.e. data from very old transactions that are not
part of the currently written transaction) can be destroyed by this.

> What is needed in any case is rebuild of parity to avoid the
> "write-hole" bug. And this is needed only for a partial stripe
> write. For a full stripe write, due to the fact that the commit is
> not flushed, it is not needed the scrub at all.
> 
> Of course for the NODATACOW file this is not entirely true; but I
> don't see the gain to switch from the cost of COW to the cost of a log.
> 
> The above sentences are correct (IMHO) if we don't consider a power
> failure+device missing case. However in this case even logging the
> "new data" would be not sufficient.
> 
> BR
> G.Baroncelli
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

  reply	other threads:[~2018-03-31  7:43 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-21 16:50 Status of RAID5/6 Menion
2018-03-21 17:24 ` Liu Bo
2018-03-21 20:02   ` Christoph Anton Mitterer
2018-03-22 12:01     ` Austin S. Hemmelgarn
2018-03-29 21:50     ` Zygo Blaxell
2018-03-30  7:21       ` Menion
2018-03-31  4:53         ` Zygo Blaxell
2018-03-30 16:14       ` Goffredo Baroncelli
2018-03-31  5:03         ` Zygo Blaxell
2018-03-31  6:57           ` Goffredo Baroncelli
2018-03-31  7:43             ` Zygo Blaxell [this message]
2018-03-31  8:16               ` Goffredo Baroncelli
     [not found]                 ` <28a574db-0f74-b12c-ab5f-400205fd80c8@gmail.com>
2018-03-31 14:40                   ` Zygo Blaxell
2018-03-31 22:34             ` Chris Murphy
2018-04-01  3:45               ` Zygo Blaxell
2018-04-01 20:51                 ` Chris Murphy
2018-04-01 21:11                   ` Chris Murphy
2018-04-02  5:45                     ` Zygo Blaxell
2018-04-02 15:18                       ` Goffredo Baroncelli
2018-04-02 15:49                         ` Austin S. Hemmelgarn
2018-04-02 22:23                           ` Zygo Blaxell
2018-04-03  0:31                             ` Zygo Blaxell
2018-04-03 17:03                               ` Goffredo Baroncelli
2018-04-03 22:57                                 ` Zygo Blaxell
2018-04-04  5:15                                   ` Goffredo Baroncelli
2018-04-04  6:01                                     ` Zygo Blaxell
2018-04-04 21:31                                       ` Goffredo Baroncelli
2018-04-04 22:38                                         ` Zygo Blaxell
2018-04-04  3:08                                 ` Chris Murphy
2018-04-04  6:20                                   ` Zygo Blaxell
2018-03-21 20:27   ` Menion
2018-03-22 21:13   ` waxhead

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180331074320.GF2446@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=calestyo@scientia.net \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.