From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v5 0/2] btrfs: fix beyond EOF truncation for subpage generic/363 failures
Date: Sat, 26 Apr 2025 08:06:48 +0930
Message-ID: <cover.1745619801.git.wqu@suse.com>

[CHANGELOG]
v5:
- Shrink the parameter list for btrfs_truncate_block()
  Remove @front and @len, and pass a new @start/@end pair instead,
  so that we can determine whether @from is in the head or tail block.
  Thus there is no need for @front.

  This will give callers more freedom (a little too much),
  e.g. for the following zero range/hole punch case:

    Page size is 64K, fs block size is 4K.
    Truncation range is [6K, 58K).

    0        8K                32K                  56K      64K
    |      |/|//////////////////////////////////////|/|      |
           6K                                         58K

    To truncate the first block to zero out range [6K, 8K),
    caller can pass @from = 6K, @start = 6K, @end = 58K - 1.
    In fact, any @from inside range [6K, 8K) will work.

    To truncate the last block to zero out range [56K, 58K),
    caller can pass @from = 58K - 1, @start = 6K, @end = 58K - 1.
    Any @from inside range [56K, 58K) will also work.

    Furthermore, if aligned @from is passed in, e.g. 8K,
    btrfs_truncate_block() will detect that there is nothing to do,
    and exit properly.

- Only do the extra zeroing if we're truncating beyond EOF
  Especially for the recent large folios support, we can do a lot of
  unnecessary zeroing for a very large folio.

- Remove the lock-wait-retry loop if we're doing aligned truncation
  beyond EOF
  Since the range is already beyond EOF, there is no need to wait for
  the ordered extent (OE) anyway.
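
  The @from/@start/@end example above can be sketched in userspace C as
  follows. This is only an illustration of the arithmetic, not the kernel
  implementation: struct zero_range and block_zero_range() are
  hypothetical names, and the sketch assumes a block needs zeroing only
  when it contains @start (head block) or @end (tail block).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a btrfs structure. */
struct zero_range {
	bool need_zero;      /* false: @from is in neither the head nor the tail block */
	uint64_t zero_start; /* first byte to zero (inclusive) */
	uint64_t zero_end;   /* last byte to zero (inclusive) */
};

/*
 * Decide which bytes of the block containing @from must be zeroed for the
 * inclusive truncation range [@start, @end].
 */
static struct zero_range block_zero_range(uint64_t from, uint64_t start,
					  uint64_t end, uint32_t blocksize)
{
	struct zero_range ret = { false, 0, 0 };
	uint64_t block_start, block_end;
	bool in_head, in_tail;

	assert(blocksize > 0 && start <= end);

	/* Boundaries (inclusive) of the block that contains @from. */
	block_start = from - (from % blocksize);
	block_end = block_start + blocksize - 1;

	/* Head block contains @start, tail block contains @end. */
	in_head = (start >= block_start && start <= block_end);
	in_tail = (end >= block_start && end <= block_end);

	/* E.g. an aligned @from of 8K for range [6K, 58K): nothing to do. */
	if (!in_head && !in_tail)
		return ret;

	/* Clamp the truncation range to the block. */
	ret.need_zero = true;
	ret.zero_start = start > block_start ? start : block_start;
	ret.zero_end = end < block_end ? end : block_end;
	return ret;
}
```

  With a 4K block size and the range [6K, 58K), @from = 6K yields
  [6K, 8K), @from = 58K - 1 yields [56K, 58K), and @from = 8K reports
  that there is nothing to zero.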

v4:
- Rebased to the latest for-next branch
  btrfs_free_extent_map() renames cause a minor conflict in the first
  patch.

v3:
- Fix a typo where @block_size should be @blocksize.
  There is a global function, block_size(), thus this typo will cause
  type conflicts inside round_down()/round_up().

v2:
- Fix a conversion bug in the first patch that leads to generic/008
  failure on x86_64
  The range is passed incorrectly and caused btrfs_truncate_block() to
  incorrectly skip an unaligned range.

Test case generic/363 always fails on subpage (fs block size < page size)
btrfses. There are mainly two kinds of problems here:

All examples are based on 64K page size and 4K fs block size.

1) EOF is polluted and btrfs_truncate_block() only zeros the block that
   needs to be written back

   
   0                           32K                           64K
   |                           |              |GGGGGGGGGGGGGG|
                                              50K EOF
   The original file is 50K sized (not 4K aligned), and fsx polluted the
   range beyond EOF through a memory mapped write.
   Since memory mapped writes are page based, and our page size is
   larger than the block size, the page range [0, 64K) covers blocks
   beyond EOF.

   That polluted range will not be written back, but will still affect
   our page cache.

   Then some operation happens to expand the inode to size 64K.

   In that case btrfs_truncate_block() is called to trim the block
   [48K, 52K), and that block will be marked dirty for writeback.

   But the range [52K, 64K) is not touched at all, leaving the garbage
   in place and triggering `fsx -e 1` failures.

   Fix this case by forcing btrfs_truncate_block() to zero any involved
   blocks. (Meanwhile, still only one block, [48K, 52K), will be written
   back.)

2) EOF is polluted and the original size is block aligned so
   btrfs_truncate_block() does nothing

   0                           32K                           64K
   |                           |                |GGGGGGGGGGGG|
                                                52K EOF

   Mostly the same as case 1, but this time since the inode size is
   block aligned, btrfs_truncate_block() will do nothing.

   This leaves the garbage range [52K, 64K) untouched and fails
   `fsx -e 1` runs.

   Fix this case by forcing btrfs_truncate_block() to zero any involved
   blocks when the btrfs is subpage and the range is aligned.
   This will not cause any new dirty blocks; it purely zeroes out the
   range beyond EOF to pass `fsx -e 1` runs.
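
The arithmetic behind both cases can be sketched in userspace C. This is
a simplified model under stated assumptions, not the code in the patches:
struct eof_zero_plan and plan_eof_zeroing() are hypothetical names, and
the sketch assumes the range to zero runs from the old EOF to the end of
the page that contains it (the real code works on folios).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plan for one EOF-expanding operation; not a btrfs structure. */
struct eof_zero_plan {
	uint64_t zero_start;  /* start of the page cache range to zero */
	uint64_t zero_len;    /* 0 when the old EOF is page aligned */
	uint64_t dirty_start; /* start of the block that gets written back */
	uint64_t dirty_len;   /* 0 when the old EOF is block aligned (case 2) */
};

static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x - (x % align);
}

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return round_down_u64(x + align - 1, align);
}

/*
 * Given the old inode size, compute which bytes of page cache may hold
 * garbage beyond EOF, and which block (if any) must be written back.
 */
static struct eof_zero_plan plan_eof_zeroing(uint64_t isize, uint32_t blocksize,
					     uint32_t pagesize)
{
	struct eof_zero_plan plan = { 0, 0, 0, 0 };

	assert(blocksize > 0 && pagesize >= blocksize &&
	       pagesize % blocksize == 0);

	/* Everything from the old EOF to the page end may be polluted. */
	plan.zero_start = isize;
	plan.zero_len = round_up_u64(isize, pagesize) - isize;

	/* Only a partially used block holds real data and needs writeback. */
	if (isize % blocksize) {
		plan.dirty_start = round_down_u64(isize, blocksize);
		plan.dirty_len = blocksize;
	}
	return plan;
}
```

For case 1 (old size 50K, 4K blocks, 64K pages) this gives a zero range of
[50K, 64K) and one dirty block [48K, 52K). For case 2 (old size 52K) it
gives a zero range of [52K, 64K) and no dirty block.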

Qu Wenruo (2):
  btrfs: handle unaligned EOF truncation correctly for subpage cases
  btrfs: handle aligned EOF truncation correctly for subpage cases

 fs/btrfs/btrfs_inode.h |   3 +-
 fs/btrfs/file.c        |  34 ++++-----
 fs/btrfs/inode.c       | 152 ++++++++++++++++++++++++++++++++++-------
 3 files changed, 147 insertions(+), 42 deletions(-)

-- 
2.49.0



Thread overview: 7+ messages
2025-04-25 22:36 Qu Wenruo [this message]
2025-04-25 22:36 ` [PATCH v5 1/2] btrfs: handle unaligned EOF truncation correctly for subpage cases Qu Wenruo
2025-05-06 17:29   ` Boris Burkov
2025-04-25 22:36 ` [PATCH v5 2/2] btrfs: handle aligned " Qu Wenruo
2025-05-06 17:25   ` Boris Burkov
2025-05-05 15:33 ` [PATCH v5 0/2] btrfs: fix beyond EOF truncation for subpage generic/363 failures David Sterba
2025-05-06  0:05   ` Qu Wenruo
