From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Johannes Thumshirn <jth@kernel.org>, Chris Mason <clm@fb.com>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
"open list:BTRFS FILE SYSTEM" <linux-btrfs@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>
Cc: WenRuo Qu <wqu@suse.com>, Naohiro Aota <Naohiro.Aota@wdc.com>
Subject: Re: [PATCH] btrfs: also add stripe entries for NOCOW writes
Date: Tue, 24 Sep 2024 07:53:50 +0930
Message-ID: <f6ae39fd-ee30-4e22-8d0d-6dec5c3bd192@gmx.com>
In-Reply-To: <887a09bc-3c98-4bd1-aa31-0732fc633315@wdc.com>
On 2024/9/24 00:11, Johannes Thumshirn wrote:
> On 23.09.24 10:54, Qu Wenruo wrote:
>>
>>
[...]
>> Finally, I do not think it's a good idea to insert RST entries for NOCOW.
>> If a file is set NOCOW, it means we'll be doing a lot of overwrites for it.
>> Then why waste our time updating the RST entries again and again?
>>
>> Isn't such behavior going to cause more write amplification? Meanwhile
>> for non-RST cases, NOCOW should cause the least amount of write
>> amplification.
>
> The whole idea behind the RST was to write the RST entries _after_ the
> data has been persisted to disk. Otherwise we're back at the write hole
> problem. See for example this imaginary sequence:
>
> Preallocate a range. This will then also preallocate the RST entries
> with the mapping as you describe. Write to it, and while you write you
> have a power loss. The copy/stripe to disk 1 is correctly written, but
> disk 2 didn't report back before the power loss happened.
> After we have power again, a read to disk 2 comes in. As we have an RST
> entry, the read will be directed to the broken entry and garbage is
> returned. And this is the good case, as we can repair it.
> If it was an overwrite of a block and the same happens, we have an RST
> entry pointing to a good and a bad copy.
Nope, that will not happen.
Our metadata is still COW protected, so after such a power loss the
file extent still shows that range as PREALLOCATED, and we won't even
trigger a read.
This is exactly the same as a non-RST PREALLOCATED write.
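
To make the argument concrete, here is a toy model of that sequence. It
is only a sketch and not btrfs code: the class ToyFs, its methods and the
"PREALLOC"/"REGULAR" strings are hypothetical names made up for
illustration, and the assumption is that the extent is only flipped to
REGULAR in the running transaction once all data writes have finished.

```python
from dataclasses import dataclass, field

ZEROS = b"\x00" * 4   # what a read of a PREALLOC range returns
STALE = b"junk"       # stand-in for whatever was on disk before the write


@dataclass
class ToyFs:
    """Hypothetical two-mirror model: COW metadata, in-place data writes."""
    committed: dict = field(default_factory=dict)  # last committed metadata
    running: dict = field(default_factory=dict)    # running transaction
    mirrors: list = field(default_factory=lambda: [{}, {}])

    def commit(self):
        # COW metadata: the new tree replaces the old one atomically.
        self.committed = dict(self.running)

    def preallocate(self, off):
        self.running[off] = "PREALLOC"
        self.commit()

    def write_data(self, off, data, mirrors_reached):
        # Data goes straight to the preallocated location on each mirror
        # that the write actually reached.
        for i in mirrors_reached:
            self.mirrors[i][off] = data
        # Assumption: metadata is only updated after every mirror finished.
        if len(mirrors_reached) == len(self.mirrors):
            self.running[off] = "REGULAR"

    def power_loss(self):
        # Everything not committed is gone; we fall back to the old tree.
        self.running = dict(self.committed)

    def read(self, off, mirror):
        # The extent type is checked first; PREALLOC never reaches the disk.
        if self.running.get(off) != "REGULAR":
            return ZEROS
        return self.mirrors[mirror].get(off, STALE)
```

In this model, a write that reaches only mirror 0 followed by
power_loss() leaves the extent as PREALLOC, so read() returns zeros for
both mirrors and the stale block on mirror 1 is never looked at.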
>
> Once we're adding the RST entries after both writes succeed, the problem
> isn't there. So for preallocated extents it is even harmful to add an RST
> entry.
You just forgot the metadata part, which prevents the problem from
happening in the first place.
Thanks,
Qu