From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 0/5] btrfs: add the missing preparations exposed by initial large data folio support
Date: Sat, 29 Mar 2025 19:49:35 +1030 [thread overview]
Message-ID: <cover.1743239672.git.wqu@suse.com> (raw)
[CHANGELOG]
v2:
- Rebased to the latest misc-next
As there are some conflicts regarding:
* parameter list indent
* iov_iter parameter name
- Fix a busy loop caused by underflowing check_range_has_page() parameters
This is only happening for fs block size < page size cases, e.g.
the fs block size is 4K, page size is 64K.
And the lockend is (4K - 1).
The page lock end in btrfs_punch_hole_lock_range() will be (u64)-1, as
round_down(lockend + 1, PAGE_SIZE) is 0 and subtracting 1 underflows,
causing btrfs_punch_hole_lock_range() to busy loop waiting for all
folios to be purged.
With all the existing preparations in place, I can finally enable the
initial large data folio support and start testing (i.e., short
fsstress loops).
To no one's surprise, it immediately exposed several problems:
- A critical subpage helper is still using offset_in_page()
This causes an incorrect start bit calculation and triggers the
ASSERT() inside subpage locking.
Fixed in the first patch.
- Fix a dead busy loop in btrfs_punch_hole_lock_range()
This is only happening when fs block size < page size.
@page_lockend can underflow to (u64)-1, causing
btrfs_punch_hole_lock_range() to busy loop waiting for all the pages
of the inode to be dropped, while truncate_pagecache_range() is only
called for the range inside the first page (i.e., it drops nothing).
Fixed in the second patch.
- Buffered write can not shrink reserved space properly
Since we're reserving data and metadata space before grabbing the
folio, we have to reserve as much space as possible, rather than just
reserving space for the range inside the first page.
If the folio we got is smaller than what we expect, we have to shrink
the reserved space, but things like btrfs_delalloc_release_extents()
can not handle it.
Fixed in the third patch, with a new helper
btrfs_delalloc_shrink_extents().
This will also be a topic in the iomap migration: iomap uses
valid_folio() callbacks to make sure no extent map is changed during
the buffered write, thus it can reserve a large range of space up
front, instead of our current over-reserve-then-shrink approach.
Our behavior is very safe, but less optimized than iomap's.
- Buffered write is not utilizing large folios
Since buffered write is the main entrance to allocate large folios,
without its support there will be no large folios at all.
Addressed in the fourth patch.
- A dead busy loop inside btrfs_punch_hole_lock_range()
It turns out that the usage of filemap_range_has_page() is never a
good idea for large folios, as we can easily hit the following case:
      start                    end
        |                       |
  |//|//|//|//| | | | | | | | |//|//|
  \           /               \     /
     Folio A                  Folio B
Fixed in the last patch, with a helper check_range_has_page() to do
the check with large folios in mind.
This will also be a topic in the iomap migration, as our zero range
behavior is quite different from iomap's, and the
filemap_range_has_page() behavior looks like overkill to me.
I'm pretty sure more hidden bugs will surface once I throw the whole
fstests suite at my local branch, but these are all the bugs found so
far.
Qu Wenruo (5):
btrfs: subpage: fix a bug that blocks large folios
btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range()
btrfs: refactor how we handle reserved space inside copy_one_range()
btrfs: prepare btrfs_buffered_write() for large data folios
btrfs: prepare btrfs_punch_hole_lock_range() for large data folios
fs/btrfs/delalloc-space.c | 24 +++++
fs/btrfs/delalloc-space.h | 3 +-
fs/btrfs/file.c | 203 ++++++++++++++++++++++++++++----------
fs/btrfs/subpage.c | 2 +-
4 files changed, 177 insertions(+), 55 deletions(-)
--
2.49.0
2025-03-29 9:19 Qu Wenruo [this message]
2025-03-29 9:19 ` [PATCH v2 1/5] btrfs: subpage: fix a bug that blocks large folios Qu Wenruo
2025-03-29 9:19 ` [PATCH v2 2/5] btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() Qu Wenruo
2025-03-31 11:24 ` Filipe Manana
2025-03-29 9:19 ` [PATCH v2 3/5] btrfs: refactor how we handle reserved space inside copy_one_range() Qu Wenruo
2025-03-29 9:19 ` [PATCH v2 4/5] btrfs: prepare btrfs_buffered_write() for large data folios Qu Wenruo
2025-03-29 9:19 ` [PATCH v2 5/5] btrfs: prepare btrfs_punch_hole_lock_range() " Qu Wenruo
2025-03-31 11:50 ` Filipe Manana
2025-03-31 21:19 ` Qu Wenruo