Re: [PATCH 5/7] btrfs: allow buffered write to avoid full page read if it's block aligned

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <wqu@suse.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 5/7] btrfs: allow buffered write to avoid full page read if it's block aligned
Date: Wed, 26 Feb 2025 07:46:07 +1030	[thread overview]
Message-ID: <cf967519-c1ee-439e-90fb-6a945296dbba@suse.com> (raw)
In-Reply-To: <CAL3q7H7dOsKqYAKvZdzOoe4DZNJ28P2neBDopme=dFYT-PNn6g@mail.gmail.com>



在 2025/2/26 04:25, Filipe Manana 写道:
> On Sun, Feb 23, 2025 at 11:47 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> Since the support of block size (sector size) < page size for btrfs,
>> test case generic/563 fails with 4K block size and 64K page size:
>>
>>      --- tests/generic/563.out   2024-04-25 18:13:45.178550333 +0930
>>      +++ /home/adam/xfstests-dev/results//generic/563.out.bad    2024-09-30 09:09:16.155312379 +0930
>>      @@ -3,7 +3,8 @@
>>       read is in range
>>       write is in range
>>       write -> read/write
>>      -read is in range
>>      +read has value of 8388608
>>      +read is NOT in range -33792 .. 33792
>>       write is in range
>>      ...
>>
>> [CAUSE]
>> The test case creates a 8MiB file, then buffered write into the 8MiB
>> using 4K block size, to overwrite the whole file.
>>
>> On 4K page sized systems, since the write range covers the full block and
>> page, btrfs will no bother reading the page, just like what XFS and EXT4
>> do.
>>
>> But 64K page sized systems, although the 4K sized write is still block
>> aligned, it's not page aligned any more, thus btrfs will read the full
>> page, causing more read than expected and fail the test case.
> 
> This part "causing more read than expected" got me puzzled, but it's explained
> in the "Fix" section below. It's more like "causing previously dirty
> blocks within the page to be zeroed out".

It's not exactly the same.

Before this patch, we will never allow a folio to be dirtied if it's not 
uptodate (unless we're dirtying the full folio).

So there is no "previously dirty blocks to be zeroed out".

I have no better idea to explain the situation here, it's really about 
if we need to read the folio before buffered write.

I'm super happy if you have a better expression here so that no one 
would be confused.

Thanks,
Qu

> 
>>
>> [FIX]
>> To skip the full page read, we need to do the following modification:
>>
>> - Do not trigger full page read as long as the buffered write is block
>>    aligned
>>    This is pretty simple by modifying the check inside
>>    prepare_uptodate_page().
>>
>> - Skip already uptodate blocks during full page read
>>    Or we can lead to the following data corruption:
>>
>>    0       32K        64K
>>    |///////|          |
>>
>>    Where the file range [0, 32K) is dirtied by buffered write, the
>>    remaining range [32K, 64K) is not.
>>
>>    When reading the full page, since [0,32K) is only dirtied but not
>>    written back, there is no data extent map for it, but a hole covering
>>    [0, 64k).
>>
>>    If we continue reading the full page range [0, 64K), the dirtied range
>>    will be filled with 0 (since there is only a hole covering the whole
>>    range).
>>    This causes the dirtied range to get lost.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Looks good, thanks.
> 
>> ---
>>   fs/btrfs/extent_io.c | 4 ++++
>>   fs/btrfs/file.c      | 5 +++--
>>   2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index b3a4a94212c9..d7240e295bfc 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -977,6 +977,10 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
>>                          end_folio_read(folio, true, cur, end - cur + 1);
>>                          break;
>>                  }
>> +               if (btrfs_folio_test_uptodate(fs_info, folio, cur, blocksize)) {
>> +                       end_folio_read(folio, true, cur, blocksize);
>> +                       continue;
>> +               }
>>                  em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
>>                  if (IS_ERR(em)) {
>>                          end_folio_read(folio, false, cur, end + 1 - cur);
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 00c68b7b2206..e3d63192281d 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -804,14 +804,15 @@ static int prepare_uptodate_folio(struct inode *inode, struct folio *folio, u64
>>   {
>>          u64 clamp_start = max_t(u64, pos, folio_pos(folio));
>>          u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
>> +       const u32 sectorsize = inode_to_fs_info(inode)->sectorsize;
>>          int ret = 0;
>>
>>          if (folio_test_uptodate(folio))
>>                  return 0;
>>
>>          if (!force_uptodate &&
>> -           IS_ALIGNED(clamp_start, PAGE_SIZE) &&
>> -           IS_ALIGNED(clamp_end, PAGE_SIZE))
>> +           IS_ALIGNED(clamp_start, sectorsize) &&
>> +           IS_ALIGNED(clamp_end, sectorsize))
>>                  return 0;
>>
>>          ret = btrfs_read_folio(NULL, folio);
>> --
>> 2.48.1
>>
>>

next prev parent reply	other threads:[~2025-02-25 21:16 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-23 23:46 [PATCH 0/7] btrfs: make subpage handling be feature full Qu Wenruo
2025-02-23 23:46 ` [PATCH 1/7] btrfs: prevent inline data extents read from touching blocks beyond its range Qu Wenruo
2025-02-25 15:07   ` Filipe Manana
2025-02-25 16:10     ` Filipe Manana
2025-02-23 23:46 ` [PATCH 2/7] btrfs: fix the qgroup data free range for inline data extents Qu Wenruo
2025-02-25 15:11   ` Filipe Manana
2025-02-23 23:46 ` [PATCH 3/7] btrfs: introduce a read path dedicated extent lock helper Qu Wenruo
2025-02-25 17:05   ` Filipe Manana
2025-02-25 21:12     ` Qu Wenruo
2025-02-25 23:44       ` Qu Wenruo
2025-02-26 11:07         ` Filipe Manana
2025-02-26 11:07       ` Filipe Manana
2025-02-23 23:46 ` [PATCH 4/7] btrfs: make btrfs_do_readpage() to do block-by-block read Qu Wenruo
2025-02-25 17:09   ` Filipe Manana
2025-02-23 23:46 ` [PATCH 5/7] btrfs: allow buffered write to avoid full page read if it's block aligned Qu Wenruo
2025-02-25 17:55   ` Filipe Manana
2025-02-25 21:16     ` Qu Wenruo [this message]
2025-02-23 23:46 ` [PATCH 6/7] btrfs: allow inline data extents creation if block size < page size Qu Wenruo
2025-02-25 18:03   ` Filipe Manana
2025-02-23 23:46 ` [PATCH 7/7] btrfs: remove the subpage related warning message Qu Wenruo
2025-02-25 18:10   ` Filipe Manana
2025-02-25 21:19     ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf967519-c1ee-439e-90fb-6a945296dbba@suse.com \
    --to=wqu@suse.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox