From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH RFC 0/5] btrfs: raid56: do full stripe data checksum verification and recovery at RMW time
Date: Fri, 14 Oct 2022 15:17:08 +0800
Message-ID: <cover.1665730948.git.wqu@suse.com>

There is a long-standing problem with all RAID56 implementations (not
only btrfs RAID56): if we have corrupted data on disk and we do an RMW
using that corrupted data, we lose the chance of recovery.

Since the new parity is calculated using the corrupted sector, we will
never be able to recover the old data.
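
To make the failure concrete, here is a minimal user-space sketch in
Python (not kernel code). It assumes a toy RAID5-style stripe with two
data sectors and plain XOR parity; the helper xor_parity() and all
variable names are hypothetical and only serve the illustration.

```python
def xor_parity(sectors):
    """XOR all sectors together byte by byte (RAID5-style parity)."""
    out = bytearray(len(sectors[0]))
    for sector in sectors:
        for i, byte in enumerate(sector):
            out[i] ^= byte
    return bytes(out)


# A toy full stripe: two data sectors and their parity.
d0_good = b"AAAA"
d1_old = b"BBBB"
parity = xor_parity([d0_good, d1_old])

# d0 gets silently corrupted on disk. The parity still matches the good
# data, so d0 can be rebuilt from d1 and parity at this point.
d0_disk = b"AXAA"
recoverable_before = xor_parity([d1_old, parity]) == d0_good

# Sub-stripe write to d1: RMW reads d0 from disk (the corrupted copy)
# and recomputes parity from it.
d1_new = b"CCCC"
parity = xor_parity([d0_disk, d1_new])

# Rebuilding d0 from the new parity only reproduces the corrupted content.
rebuilt = xor_parity([d1_new, parity])
recoverable_after = rebuilt == d0_good
```

After the RMW, the parity is consistent with the corrupted sector, so
the rebuild path has nothing left to recover the good content from.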

However, btrfs has data checksums to save the day here: if we do a full
scrub-level verification at RMW time, we can detect the corrupted data
before we do any write.

Then we do the proper recovery, recheck the data checksums, restore the
old data, and call it a day.
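
The verify-then-recover idea can be sketched in user space as follows.
This is only a model under stated assumptions, not the code in this
series: it uses single XOR parity, zlib.crc32 as a stand-in for the
btrfs data checksum, and the hypothetical helpers xor_sectors(), csum()
and verify_and_repair().

```python
import zlib


def xor_sectors(sectors):
    """XOR all sectors together byte by byte."""
    out = bytearray(len(sectors[0]))
    for sector in sectors:
        for i, byte in enumerate(sector):
            out[i] ^= byte
    return bytes(out)


def csum(sector):
    """Stand-in checksum; an assumption, not the btrfs csum code."""
    return zlib.crc32(sector)


def verify_and_repair(data, parity, csums):
    """Return the data sectors with at most one bad sector rebuilt.

    Raises IOError if the corruption cannot be repaired with one parity.
    """
    bad = [i for i, s in enumerate(data) if csum(s) != csums[i]]
    if not bad:
        return list(data)
    if len(bad) > 1:
        raise IOError("more corrupted sectors than single parity can rebuild")

    i = bad[0]
    others = [s for j, s in enumerate(data) if j != i]
    rebuilt = xor_sectors(others + [parity])
    # Recheck the checksum of the rebuilt sector before trusting it.
    if csum(rebuilt) != csums[i]:
        raise IOError("rebuilt sector still fails its checksum")

    repaired = list(data)
    repaired[i] = rebuilt
    return repaired


good = [b"AAAA", b"BBBB", b"CCCC"]
csums = [csum(s) for s in good]
parity = xor_sectors(good)

on_disk = [b"AAAA", b"BXBB", b"CCCC"]  # one silently corrupted sector
repaired = verify_and_repair(on_disk, parity, csums)
```

The second checksum pass matters: a rebuild from parity is only as good
as the other sectors feeding it, so the result is rechecked rather than
assumed correct.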

This series adds that ability. Currently there are the following
limitations:

- Only works for full stripes without a missing device
  The code base is already here for the missing device + corrupted
  sector case of RAID6.
  But for now, I don't really want to test RAID6 yet.

- We only handle data checksum verification
  Metadata verification will be much more complex, and in the long run
  we will only recommend metadata RAID1C3/C4 profiles to complement
  RAID56 data profiles.
  Thus we may never support metadata verification for RAID56.

- If we find corrupted sectors which cannot be repaired, we fail
  the whole bios for the full stripe
  This is to avoid further data corruption, but in the future we may
  just continue with the corrupted data.
  This will need extra work to roll back to the original corrupted data
  (as the recovery process will change the content).

- Way more overhead for the sub-stripe write RMW cycle
  Now we need to:
  * Fetch the data csums for the full stripe
  * Verify the data csums
  * Do RAID56 recovery (if needed)
  * Verify the recovered data (if needed)

  Thankfully this only affects uncached sub-stripe writes.
  The successfully recovered data can be cached for later usage.

- Will not write back the recovered data during RMW
  Thus we still need to go back to the recovery path to recover the
  data.
  This can later be enhanced to let RMW write the full stripe if
  we did some recovery during RMW.

- May need further refactoring to change how we handle the RAID56
  workflow
  Currently we use quite a few workqueues to handle RAID56, and all
  work is delayed.
  This doesn't look sane to me, and is especially hard to read (too
  many jumps just for a full RMW cycle).
  It may be a good idea to make it a submit-and-wait workflow.
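
Several of the limitations above (failing the whole write on
unrepairable corruption, and not writing back the recovered data) can
be modelled together in a small user-space sketch. As before, this is
an illustration under assumptions, not the kernel implementation:
single XOR parity, zlib.crc32 standing in for the data checksum, and a
hypothetical Stripe class and rmw_write() helper.

```python
import zlib


def xor_sectors(sectors):
    """XOR all sectors together byte by byte."""
    out = bytearray(len(sectors[0]))
    for sector in sectors:
        for i, byte in enumerate(sector):
            out[i] ^= byte
    return bytes(out)


class Stripe:
    """Toy full stripe: data sectors, one parity sector, per-sector csums."""

    def __init__(self, data):
        self.data = list(data)
        self.parity = xor_sectors(self.data)
        self.csums = [zlib.crc32(s) for s in self.data]


def rmw_write(stripe, index, new_sector):
    """Sub-stripe write that verifies the rest of the stripe first."""
    mem = list(stripe.data)  # in-memory copy; nothing is written yet
    # Fetch and verify the csums of every sector we are not overwriting.
    bad = [i for i, s in enumerate(mem)
           if i != index and zlib.crc32(s) != stripe.csums[i]]
    if len(bad) > 1:
        raise IOError("unrepairable corruption, failing the whole write")
    if bad:
        i = bad[0]
        others = [s for j, s in enumerate(mem) if j != i]
        rebuilt = xor_sectors(others + [stripe.parity])
        # Verify the recovered data before using it.
        if zlib.crc32(rebuilt) != stripe.csums[i]:
            raise IOError("unrepairable corruption, failing the whole write")
        mem[i] = rebuilt  # repaired in memory only, not written back

    # Only the new sector and the new parity reach the "disk".
    mem[index] = new_sector
    stripe.data[index] = new_sector
    stripe.csums[index] = zlib.crc32(new_sector)
    stripe.parity = xor_sectors(mem)


stripe = Stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.data[0] = b"AXAA"  # silent corruption of sector 0
rmw_write(stripe, 1, b"DDDD")

# Sector 0 is still corrupted on disk, but the new parity was computed
# from the repaired copy, so a later rebuild still yields the good data.
still_corrupted = stripe.data[0] == b"AXAA"
later_rebuild = xor_sectors([stripe.data[1], stripe.data[2], stripe.parity])
```

In this model an unrepairable stripe raises before anything is
modified, which mirrors the "fail the whole bios" behaviour described
above without needing any rollback.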
[REASON for RFC]
Although the patchset passes not only the RAID56 test groups but also
my local destructive RMW test cases, some of the above limitations
need to be addressed.

And whether the trade-off is worthwhile may still need to be discussed.

Qu Wenruo (5):
  btrfs: refactor btrfs_lookup_csums_range()
  btrfs: raid56: refactor __raid_recover_end_io()
  btrfs: introduce a bitmap based csum range search function
  btrfs: raid56: prepare data checksums for later sub-stripe
    verification
  btrfs: raid56: do full stripe data checksum verification before RMW

 fs/btrfs/ctree.h      |   8 +-
 fs/btrfs/file-item.c  | 196 ++++++++++++--
 fs/btrfs/inode.c      |   6 +-
 fs/btrfs/raid56.c     | 608 +++++++++++++++++++++++++++++++-----------
 fs/btrfs/raid56.h     |  12 +
 fs/btrfs/relocation.c |   4 +-
 fs/btrfs/scrub.c      |   8 +-
 fs/btrfs/tree-log.c   |  16 +-
 8 files changed, 664 insertions(+), 194 deletions(-)

--
2.37.3