From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles
Date: Wed, 16 Nov 2022 11:21:15 +0800 [thread overview]
Message-ID: <33e40e7c-5b9a-62e7-457c-24f85808d189@gmx.com> (raw)
In-Reply-To: <20221115132452.GH5824@twin.jikos.cz>
On 2022/11/15 21:24, David Sterba wrote:
> On Mon, Nov 14, 2022 at 08:26:29AM +0800, Qu Wenruo wrote:
>> [DESTRUCTIVE RMW]
>> Btrfs (and generic) RAID56 is always vulnerable to destructive RMW.
>>
>> Such a destructive RMW can happen when one of the data stripes is
>> corrupted and a sub-stripe write then lands on another data stripe,
>> like the following:
>>
>>
>> Disk 1: data 1 |0x000000000000| <- Corrupted
>> Disk 2: data 2 |0x000000000000|
>> Disk 3: parity |0xffffffffffff|
>>
>> In the above case, if we write data into disk 2, we will get something
>> like this:
>>
>> Disk 1: data 1 |0x000000000000| <- Corrupted
>> Disk 2: data 2 |0x000000000000| <- New '0x00' writes
>> Disk 3: parity |0x000000000000| <- New parity
>>
>> Since the new parity is calculated using the new writes and the
>> corrupted data, we lose the chance to recover data stripe 1, and that
>> corruption will be there forever.
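[Editor's note: the destructive-RMW scenario quoted above can be sketched with plain XOR parity. This is a toy model, not btrfs code; the stripe contents mirror the diagrams above.]

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (RAID5 P parity)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Consistent stripe on disk: parity = data1 XOR data2.
data1 = bytes([0xff] * 6)          # what data 1 should contain
data2 = bytes([0x00] * 6)
parity = xor(data1, data2)         # 0xff... as in the diagram

# Data 1 is silently corrupted to all zeros on disk.
corrupted_data1 = bytes([0x00] * 6)

# Before any RMW, recovery from parity still works:
assert xor(parity, data2) == data1

# A sub-stripe write to data 2 recomputes parity from the
# *corrupted* data 1, because the RMW path does not verify it:
new_data2 = bytes([0x00] * 6)
new_parity = xor(corrupted_data1, new_data2)

# Recovery now only reproduces the corrupted content; the
# original data 1 is unrecoverable:
assert xor(new_parity, new_data2) == corrupted_data1
assert xor(new_parity, new_data2) != data1
```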
> ...
>> [TODO]
>> - Iterate all RAID6 combinations
>> Currently we don't try all recovery combinations for RAID6 during
>> repair, so for RAID6 we treat it just like RAID5 in the RMW path.
>>
>> Currently the RAID6 recovery combinations are only exhausted in the
>> recovery path, relying on the caller to increase the mirror number.
>>
>> Can be implemented later for RMW path.
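[Editor's note: the "try each combination and keep the one whose csum matches" idea can be sketched as below. This is a simplified model using only the XOR (P) stripe and crc32 as a stand-in csum; a real RAID6 implementation would also exhaust the Q stripe and two-device failure combinations, and the helper names here are illustrative, not kernel APIs.]

```python
import zlib

def xor_all(blocks):
    """XOR an arbitrary number of equal-length byte blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, x in enumerate(b):
            out[i] ^= x
    return bytes(out)

def try_repair(data_stripes, parity, expected_csums):
    """Assume at most one data stripe is bad: for each candidate,
    rebuild it from parity plus the other stripes, and accept the
    rebuild whose csum matches the expected one."""
    for bad in range(len(data_stripes)):
        others = [s for i, s in enumerate(data_stripes) if i != bad]
        rebuilt = xor_all(others + [parity])
        if zlib.crc32(rebuilt) == expected_csums[bad]:
            repaired = list(data_stripes)
            repaired[bad] = rebuilt
            return bad, repaired
    return None, data_stripes

# Two data stripes plus parity; stripe 0 gets corrupted on disk.
d1 = b'\x01\x02\x03\x04'
d2 = b'\x05\x06\x07\x08'
p = xor_all([d1, d2])
csums = [zlib.crc32(d1), zlib.crc32(d2)]
bad, repaired = try_repair([b'\x00' * 4, d2], p, csums)
assert bad == 0 and repaired[0] == d1
```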
>>
>> - Write back the repaired sectors
>> Currently we don't write back the repaired sectors, so if we read
>> the corrupted sectors again we rely on the recovery path. Since the
>> new parity is calculated using the recovered sectors, we can still
>> get the correct data without any problem.
>>
>> But if we write back the repaired sectors during RMW, we can save the
>> reader some time by avoiding the recovery path entirely.
>>
>> This is just a performance optimization, thus I believe it can be
>> implemented later.
>
> Even with the todo and potential performance drop due to mandatory
> stripe caching I think this is worth merging at this point, so patches
> are in misc-next now.
>
> Regarding the perf drop, I'll try to get some results besides functional
> testing but if anybody with a decent setup for raid5 can do some
> destructive tests it would be most welcome.
>
> Fallback plan is to revert the last patch if it turns out to be too
> problematic.
If we end up needing that fallback plan, I'd say we have to drop at
least the last two patches.
Sure, dropping just the last one will remove the verification and the
extra reads, but unfortunately we will still fetch the csums for RMW
cycles. Although the csum fetch costs at most 2 leaves, the cost is
still there.
Thus if needed, I can refactor the series further to separate out the
final patch, although I hope we don't need to go down that path.
Thanks,
Qu
Thread overview: 14+ messages
2022-11-14 0:26 [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles Qu Wenruo
2022-11-14 0:26 ` [PATCH 1/5] btrfs: use btrfs_dev_name() helper to handle missing devices better Qu Wenruo
2022-11-14 16:06 ` Goffredo Baroncelli
2022-11-14 22:40 ` Qu Wenruo
2022-11-14 23:54 ` David Sterba
2022-11-14 0:26 ` [PATCH 2/5] btrfs: refactor btrfs_lookup_csums_range() Qu Wenruo
2022-11-14 0:26 ` [PATCH 3/5] btrfs: introduce a bitmap based csum range search function Qu Wenruo
2022-11-14 0:26 ` [PATCH 4/5] btrfs: raid56: prepare data checksums for later RMW data csum verification Qu Wenruo
2022-11-15 6:40 ` Dan Carpenter
2022-11-15 13:18 ` David Sterba
2022-11-14 0:26 ` [PATCH 5/5] btrfs: raid56: do data csum verification during RMW cycle Qu Wenruo
2022-11-14 1:59 ` [PATCH 0/5] btrfs: raid56: destructive RMW fix for RAID5 data profiles Qu Wenruo
2022-11-15 13:24 ` David Sterba
2022-11-16 3:21 ` Qu Wenruo [this message]