From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Boris Burkov <boris@bur.io>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 1/2] btrfs: make btrfs_truncate_block() to zero involved blocks in a folio
Date: Thu, 24 Apr 2025 10:24:22 +0930 [thread overview]
Message-ID: <c01585b5-cca2-47b7-8798-a1ac8fb20de3@gmx.com> (raw)
In-Reply-To: <aAmCU2BDFmEmX0mv@devvm12410.ftw0.facebook.com>
在 2025/4/24 09:44, Boris Burkov 写道:
> On Thu, Apr 24, 2025 at 06:56:31AM +0930, Qu Wenruo wrote:
[...]
>> Such behavior is only exposed when page size is larger than fs block
>> btrfs, as for block size == page size case the block is exactly one
>> page, and fsx only checks exactly one page at EOF.
>
> This wording struck me as a little weird. What fsx does and doesn't
> check shouldn't really matter, compared to the expected/desired behavior
> of the various APIs.
Sure, I'll remove the fsx part.
>
> Other than that, very nice debug and thanks for the clear explanation.
>
>>
>> [FIX]
>> Enhance btrfs_truncate_block() by:
>>
>> - Force callers to pass a @start/@end combination
>> So that there will be no 0 length passed in.
>>
>> - Rename the @front parameter to an enum
>> And make it matches the @start/@end parameter better by using
>> TRUNCATE_HEAD_BLOCK and TRUNCATE_TAIL_BLOCK instead.
>
> Do I understand that you are not just renaming front to an enum, but
> changing the meaning to be selecting a block to do the dirty/writeback
> thing on? That is not really as clear as it could be from the commit
> message/comments.
The original code is hiding the block selection by doing all it by the
caller.
E.g. If we have something like this:
0 4K 8K 12K 16K
| |///|///////|//////|///| |
2K 14K
The original code do the truncation by calling with @from = 2K, @front =
false, and @from=14K, @front = true.
But that only handles the block at range [0, 4k) and [12K, 16K), on a
64K page, it will not tell exact range we can zero.
If we just zero everything past 2K until page end, then the contents at
[14K, 64K) will be zeroed incorrectly.
Thus we need the correct range we're really trunacting, to avoid
over/under zeroing.
Normally with the full truncating range passed in, we still need to know
what exact block we're truncating, because for each range there are
always the heading and tailing blocks.
But since you're already mentioning this, it looks like the old @from is
still a better solution, and since we're already passing the original
range into the function, we should be able to detect just based on @from
and original ranges.
E.g. for above case, if we want to truncate [2k, 4K), we can just pass
@from = 2K, @range = [2K, 14K).
And if we want to trunacte [12K, 14K), pass @from = 14K, range=[2K, 14K).
>
>>
>> - Pass the original unmodified range into btrfs_truncate_block()
>> There are several call sites inside btrfs_zero_range() and
>> btrfs_punch_hole() where we pass part of the original range for
>> truncating.
>
> The definition of "original range" is very murky to me. It seems
> specific to the hole punching code (and maybe some of the other
> callsites).
Yes, it's specific to hole punching and zero range, which all modify
their internal cursors.
But if we go the above purposal, using the full range to replace @front
parameter, it may be a little easier to understand.
>> @@ -2656,8 +2657,9 @@ static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
>
> (not visible in the formatted patch) but there is a comment just above
> this change about how we don't need to truncate past EOF. Can you check
> it and update it if it is wrong now?
Indeed I need to update that part to mention the mmap write cases.
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 538a9ec86abc..6f910c056819 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4765,15 +4765,16 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
>> *
>> * @inode - inode that we're zeroing
>> * @from - the offset to start zeroing
>> - * @len - the length to zero, 0 to zero the entire range respective to the
>> - * offset
>
> orig_start/orig_end are missing. This adds to the confusion about their
> definition/semantics I mentioned above.
>
> More generally, I think this change is a huge change to the semantics of
> this function, which used to operate on only a block but now operates on
> a range. I think that merits a full rewrite of the doc to fully
> represent what the new interface is.
Will do that.
>>
>> - block_start = round_down(from, blocksize);
>> + if (where == BTRFS_TRUNCATE_HEAD_BLOCK)
>> + block_start = round_down(from, blocksize);
>> + else
>> + block_start = round_down(end, blocksize);
>
> I don't think I get why we would want to change which block we are doing
> the cow stuff to, if the main focus is zeroing the full original range.
E.g. We want to truncate range [2K, 6K], it covers two blocks:
0 2K 4K 6K 8K
| |//////////////////////| |
If we're handling the HEAD block, aka for block [0, 4K), the block start
should be round_down(2K).
If we're handling the tail block, aka for block [4k, 8k), the block
start should be round_down(6K).
The original code move the handling to the caller, but I believe the new
purposed parameters can handle it a little easier.
>
> Apologies if I am being dense on that :)
No worries, that's more or less expected because I already know the new
parameters are very ugly, just can not come up a better one at the time
of writing.
Thanks,
Qu
next prev parent reply other threads:[~2025-04-24 0:54 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-23 21:26 [PATCH v4 0/2] btrfs: fix corner cases for subpage generic/363 failures Qu Wenruo
2025-04-23 21:26 ` [PATCH v4 1/2] btrfs: make btrfs_truncate_block() to zero involved blocks in a folio Qu Wenruo
2025-04-24 0:14 ` Boris Burkov
2025-04-24 0:54 ` Qu Wenruo [this message]
2025-04-23 21:26 ` [PATCH v4 2/2] btrfs: make btrfs_truncate_block() zero folio range for certain subpage corner cases Qu Wenruo
2025-04-24 0:27 ` Boris Burkov
2025-04-24 0:56 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c01585b5-cca2-47b7-8798-a1ac8fb20de3@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox