linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles
Date: Mon, 14 Nov 2022 09:59:47 +0800	[thread overview]
Message-ID: <5fe30279-3816-ecfb-25e2-188ba49e6cec@suse.com> (raw)
In-Reply-To: <cover.1668384746.git.wqu@suse.com>



On 2022/11/14 08:26, Qu Wenruo wrote:
> [DESTRUCTIVE RMW]
> Btrfs (and generic) RAID56 is always vulnerable against destructive RMW.
> 
> Such destructive RMW can happen when one of the data stripe is
> corrupted, then a sub-stripe write into other data stripes happens, like
> the following:
> 
> 
>    Disk 1: data 1 |0x000000000000| <- Corrupted
>    Disk 2: data 2 |0x000000000000|
>    Disk 2: parity |0xffffffffffff|
> 
> In above case, if we write data into disk 2, we will got something like
> this:
> 
>    Disk 1: data 1 |0x000000000000| <- Corrupted
>    Disk 2: data 2 |0x000000000000| <- New '0x00' writes
>    Disk 2: parity |0x000000000000| <- New Parity.
> 
> Since the new parity is calculated using the new writes and the
> corrupted data, we lost the chance to recovery data stripe 1, and that
> corruption will forever be there.
> 
> [SOLUTION]
> This series will close the destructive RMW problem for RAID5 data
> profiles by:
> 
> - Read the full stripe before doing sub-stripe writes.
> 
> - Also fetch the data csum for the data stripes:
> 
> - Verify every data sector against their data checksum
> 
>    Now even with above corrupted data, we know there are some data csum
>    mismatch, we will have an error bitmap to record such mismatches.
>    We treat read error (like missing device) and data csum mismatch the
>    same.
> 
>    If there is no csum (NODATASUM or metadata), we just trust the data
>    unconditionally.
> 
>    So we will have an error bitmap for above corrupted case like this:
> 
>    Disk 1: data 1: |XXXXXXX|
>    Disk 2: data 2: |       |
> 
> - Rebuild just as usual
>    Since data 1 has all its sectors marked error in the error bitmap,
>    we rebuild the sectors of data stripe 1.
> 
> - Verify the repaired sectors against their csum
>    If csum matches, we can continue.
>    Or we error out.
> 
> - Continue the RMW cycle using the repaired sectors
>    Now we have correct data and will re-calculate the correct parity.
> 
> [TODO]
> - Iterate all RAID6 combinations
>    Currently we don't try all combinations of RAID6 during the repair.
>    Thus for RAID6 we treat it just like RAID5 in RMW.
> 
>    Currently the RAID6 recovery combination is only exhausted during
>    recovery path, relying on the caller the increase the mirror number.
> 
>    Can be implemented later for RMW path.
> 
> - Write back the repaired sectors
>    Currently we don't write back the repaired sectors, thus if we read
>    the corrupted sectors, we rely on the recover path, and since the new
>    parity is calculated using the recovered sectors, we can get the
>    correct data without any problem.
> 
>    But if we write back the repaired sectors during RMW, we can save the
>    reader some time without going into recovery path at all.
> 
>    This is just a performance optimization, thus I believe it can be
>    implemented later.
> 
> Qu Wenruo (5):
>    btrfs: use btrfs_dev_name() helper to handle missing devices better

Please ignore this one, this patch has already been sent as an 
independent patch.

The remaining 4 are the main patches.

Thanks,
Qu

>    btrfs: refactor btrfs_lookup_csums_range()
>    btrfs: introduce a bitmap based csum range search function
>    btrfs: raid56: prepare data checksums for later RMW data csum
>      verification
>    btrfs: raid56: do data csum verification during RMW cycle
> 
>   fs/btrfs/check-integrity.c |   2 +-
>   fs/btrfs/dev-replace.c     |  15 +--
>   fs/btrfs/disk-io.c         |   2 +-
>   fs/btrfs/extent-tree.c     |   2 +-
>   fs/btrfs/extent_io.c       |   3 +-
>   fs/btrfs/file-item.c       | 196 ++++++++++++++++++++++++++----
>   fs/btrfs/file-item.h       |   8 +-
>   fs/btrfs/inode.c           |   5 +-
>   fs/btrfs/ioctl.c           |   4 +-
>   fs/btrfs/raid56.c          | 243 ++++++++++++++++++++++++++++++++-----
>   fs/btrfs/raid56.h          |  12 ++
>   fs/btrfs/relocation.c      |   4 +-
>   fs/btrfs/scrub.c           |  28 ++---
>   fs/btrfs/super.c           |   2 +-
>   fs/btrfs/tree-log.c        |  15 ++-
>   fs/btrfs/volumes.c         |  16 +--
>   fs/btrfs/volumes.h         |   9 ++
>   17 files changed, 451 insertions(+), 115 deletions(-)
> 

  parent reply	other threads:[~2022-11-14  2:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14  0:26 [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles Qu Wenruo
2022-11-14  0:26 ` [PATCH 1/5] btrfs: use btrfs_dev_name() helper to handle missing devices better Qu Wenruo
2022-11-14 16:06   ` Goffredo Baroncelli
2022-11-14 22:40     ` Qu Wenruo
2022-11-14 23:54       ` David Sterba
2022-11-14  0:26 ` [PATCH 2/5] btrfs: refactor btrfs_lookup_csums_range() Qu Wenruo
2022-11-14  0:26 ` [PATCH 3/5] btrfs: introduce a bitmap based csum range search function Qu Wenruo
2022-11-14  0:26 ` [PATCH 4/5] btrfs: raid56: prepare data checksums for later RMW data csum verification Qu Wenruo
2022-11-15  6:40   ` Dan Carpenter
2022-11-15 13:18     ` David Sterba
2022-11-14  0:26 ` [PATCH 5/5] btrfs: raid56: do data csum verification during RMW cycle Qu Wenruo
2022-11-14  1:59 ` Qu Wenruo [this message]
2022-11-15 13:24 ` [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles David Sterba
2022-11-16  3:21   ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5fe30279-3816-ecfb-25e2-188ba49e6cec@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).