Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Johannes Thumshirn <jth@kernel.org>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	"open list:BTRFS FILE SYSTEM" <linux-btrfs@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Cc: WenRuo Qu <wqu@suse.com>, Naohiro Aota <Naohiro.Aota@wdc.com>
Subject: Re: [PATCH] btrfs: also add stripe entries for NOCOW writes
Date: Mon, 23 Sep 2024 18:23:58 +0930	[thread overview]
Message-ID: <47d0f003-ebc8-4959-a22c-ccf9ea7ef35a@gmx.com> (raw)
In-Reply-To: <aacb742c-2081-441e-ac52-d9e0f580ab1e@wdc.com>



在 2024/9/23 17:45, Johannes Thumshirn 写道:
> On 23.09.24 09:56, Qu Wenruo wrote:
>>>> 在 2024/9/23 16:15, Johannes Thumshirn 写道:
>>>>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>>
>>>>> NOCOW writes do not generate stripe_extent entries in the RAID stripe
>>>>> tree, as the RAID stripe-tree feature initially was designed with a
>>>>> zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW
>>>>> writes. But the RAID stripe-tree feature is independent from the zoned
>>>>> feature, so we must also allow NOCOW writes for zoned filesystems.
>>>>>
>>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>>
>>>> Sorry I'm going to repeat myself again, I still believe if we insert an
>>>> RST entry at falloc() time, it will be more consistent with the non-RST
>>>> code.
>>>>
>>>> Yes, I known preallocated space will not need any read nor search RST
>>>> entry, and we just fill the page cache with zero at read time.
>>>>
>>>> But the point of proper (not just dummy) RST entry for the whole
>>>> preallocated space is, we do not need to touch the RST entry anymore for
>>>> NOCOW/PREALLOCATED write at all.
>>>>
>>>> This makes the RST NOCOW/PREALLOC writes behavior to align with the
>>>> non-RST code, which doesn't update any extent item, but only modify the
>>>> file extent for PREALLOC writes.
>>>
>>> Please re-read the patch. This is not a dummy RST entry but a real RST
>>> entry for NOCOW writes.
>>>
>> I know, but my point is, if the RST entry for preallocated range is
>> already a regular one, you won't even need to insert/update the RST tree
>> at all.
>>
>> Just like we do not need to update the extent tree for
>> NOCOW/PREALLOCATED writes.
>
> But as long as there is no data on disk there is no point of having a
> logical to N-disk physical mapping. We haven't even called
> btrfs_map_block() before, so where do we know which disks will get the
> blocks at which address? Look at this example:
>
> Fallocate [0, 1M]
> virtme-scsi:/mnt # xfs_io -fc "falloc 0 1M" -c sync test
[...]
>
>
> [1] we preallocate the data for [0, 1M] @ 298844160
> [2] we have the actual written data for [64k, 128k] @ 298844160
>
> What should I do to insert the RST entry there as we get:

Do the needed write map and insert the RST entries, just as if we're
doing a write, but without any actual IO.


Currently the handling of RST is not consistent with non-RST, thus
that's the reason causing problems with scrub on preallocated extents.

I known preallocated range won't trigger any read thus it makes no sense
to do the proper RST setup.
But let's also take the example of non-RST scrub:

Do we spend our time checking if a data extent is preallocated or not?
No, we do not. We just read the content, and check against its csum.
For preallocated extents, it just has no csum.

You can argue that this is wasting IO, but I can also counter-argue that
we need to make sure the read on that device range is fine, even if we
know we will not really read the content (but that's only for now).


Furthermore, from the logical aspect, at least to me, non-RST case is
just a special case where logical address is 1:1 mapped already.

This also means, even for preallocated extents, they should have RST
entries.


Finally, I do not think it's a good idea to insert RST entries for NOCOW.
If a file is set NOCOW, it means we'll doing a lot of overwrite for it.
Then why waste our time updating the RST entries again and again?

Isn't such behavior going to cause more write amplification? Meanwhile
for non-RST cases, NOCOW should cause the least amount of write
amplification.



So again, YES, I know doing proper write map and inserting RST entries
for preallocated range makes no sense for READ.
But preallocation and NOCOW is utilized for contents we frequently
over-writes, and such RST entries save us more for those frequently
over-writes.

Thanks,
Qu
>
> [3] the physical copies starting at 298909696 on devid 1 and 277938176
> on devid 2
>
>


  reply	other threads:[~2024-09-23  8:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-23  6:45 [PATCH] btrfs: also add stripe entries for NOCOW writes Johannes Thumshirn
2024-09-23  7:28 ` Qu Wenruo
2024-09-23  7:40   ` Johannes Thumshirn
2024-09-23  7:56     ` Qu Wenruo
2024-09-23  8:15       ` Johannes Thumshirn
2024-09-23  8:53         ` Qu Wenruo [this message]
2024-09-23 14:41           ` Johannes Thumshirn
2024-09-23 22:23             ` Qu Wenruo
2024-09-23 15:20   ` Josef Bacik
2024-09-23 22:32     ` Qu Wenruo
2024-09-24  0:46       ` Qu Wenruo
2024-09-24  7:07 ` Naohiro Aota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47d0f003-ebc8-4959-a22c-ccf9ea7ef35a@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=jth@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox