public inbox for linux-btrfs@vger.kernel.org
* [PATCH v3 0/4] btrfs: improve stalls under sudden writeback
@ 2026-04-07 19:30 Boris Burkov
  2026-04-07 19:30 ` [PATCH v3 1/4] btrfs: reserve space for delayed_refs in delalloc Boris Burkov
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Boris Burkov @ 2026-04-07 19:30 UTC
  To: linux-btrfs, kernel-team

If you have a system with very large memory (TiBs) and a normal
percentage-based dirty_ratio/dirty_background_ratio like the defaults of
20%/10%, then we can theoretically rack up 100s of GiB of dirty pages
before doing any writeback. This is further exacerbated if we also see a
sudden drop in the free memory due to a large allocation. If we also have
a large disk (relatively likely for a large-RAM system), we are unlikely
to trigger much preemptive metadata reclaim either.
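
As a rough back-of-the-envelope illustration of the numbers above, here is
a minimal sketch that treats the thresholds as a plain percentage of total
RAM. That is a simplifying assumption: the kernel applies the ratios to
available (free plus reclaimable) memory, so real thresholds are somewhat
lower. The helper name dirty_thresholds() is made up for this example.

```python
GIB = 1 << 30
TIB = 1 << 40


def dirty_thresholds(mem_bytes, dirty_ratio=20, dirty_background_ratio=10):
    """Return (background, hard) dirty thresholds in bytes.

    Simplification: the ratios are applied to mem_bytes directly rather
    than to the kernel's notion of available memory.
    """
    background = mem_bytes * dirty_background_ratio // 100
    hard = mem_bytes * dirty_ratio // 100
    return background, hard


for mem in (1 * TIB, 2 * TIB, 4 * TIB):
    bg, hard = dirty_thresholds(mem)
    print(f"{mem // TIB} TiB RAM: background writeback from ~{bg / GIB:.0f} GiB, "
          f"writer throttling from ~{hard / GIB:.0f} GiB of dirty pages")
```

Even at 1 TiB this puts the background threshold at roughly 100 GiB, which
is the scale of backlog described above.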

Once we do start doing writeback with such a large supply, the results
are somewhat ugly. The delalloc work generates a huge amount of delayed
refs without proper reservations, which sends the metadata space system
into a tailspin trying to run yet more delalloc to free space.
Ultimately, the system stalls waiting for huge amounts of ordered
extents and delayed refs, blocking all users in start_transaction() on
tickets in reserve_space().

This patch series aims to address these issues in a relatively targeted
way by improving our reservations for delalloc delayed refs and by doing
some very basic smoothing of the work in flush_space(). Further work
could be done to improve flush_space() heuristics and latency but this
is already a big help on my observed workloads.

I was able to reproduce stalls on a more "modest" system with 264GiB of
ram by using a somewhat silly 80% dirty_ratio.

I was unfortunately unable to reproduce any stalls on a yet smaller
system with only 32GiB of ram.

The first 2 patches do the delayed_ref rsv accounting on btrfs_inode,
mirroring inode->block_rsv.
The 3rd patch is a cleanup to the types counting max extents.
The 4th patch reduces the size of the unit of work in shrink_delalloc()
to further reduce stalls.
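
To illustrate the idea behind the 4th patch, here is a toy sketch of
splitting one large reclaim target into bounded units of work. It is not
the kernel code: capped_chunks() is a hypothetical helper, and the only
value taken from the series is the 128M cap in the patch title.

```python
MIB = 1 << 20
GIB = 1 << 30
CAP = 128 * MIB  # per-iteration cap, taken from the patch 4 title


def capped_chunks(to_reclaim, cap=CAP):
    """Yield successive work sizes, each at most cap, summing to to_reclaim."""
    remaining = to_reclaim
    while remaining > 0:
        step = min(remaining, cap)
        yield step
        remaining -= step


chunks = list(capped_chunks(1 * GIB + 5 * MIB))
print(len(chunks), "iterations, last one", chunks[-1] // MIB, "MiB")
```

With smaller units, the flusher gets a chance to re-check whether waiters
can be satisfied between iterations instead of committing to one huge
batch.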
---
Changelog:
v3:
- Merge csum reservation patch (2) into main delalloc delrefs rsv patch (1)
- Add delayed refs reservations for RST and subvol tree metadata cow to
  patch 1.
- Do the migration in the nocow/prealloc finish_one_ordered() cases as
  there are still metadata delayed refs generated.
- Double delref rsv for cows (add+drop). This seems really conservative
  to me, but I think it is correct. If we like it, it needs to happen
  in more places too...
- Upgrade ASSERTs in patch 3 (old patch 4) to log unexpected values.
- Remove unused return value in migrate function.
- Various stylistic issues in several patches.
v2:
- patch 1 no longer embeds a new block_rsv on btrfs_inode for the
  delayed reservation. Instead it does the reservation on
  inode->block_rsv and migrates it to trans->delayed_rsv at the moment
  of truth.
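
The reserve-then-migrate flow from the v2 note, together with the doubled
(add+drop) reservation from v3, can be modelled roughly as below. This is
a toy model only: BlockRsv, reserve_delrefs() and migrate() are
hypothetical stand-ins, not the kernel functions, and the per-ref size is
an arbitrary number picked for the example.

```python
class BlockRsv:
    """Toy stand-in for a block reservation: just a named byte counter."""

    def __init__(self, name):
        self.name = name
        self.reserved = 0


def reserve_delrefs(inode_rsv, nr_cows, bytes_per_ref):
    """Reserve for two delayed refs (add + drop) per COW on the inode's rsv."""
    nbytes = 2 * nr_cows * bytes_per_ref
    inode_rsv.reserved += nbytes
    return nbytes


def migrate(src, dst, nbytes):
    """Move nbytes of reservation from src to dst, refusing to overdraw src."""
    if nbytes > src.reserved:
        raise ValueError(f"{src.name} holds only {src.reserved} bytes")
    src.reserved -= nbytes
    dst.reserved += nbytes


inode_rsv = BlockRsv("inode->block_rsv")
delayed_rsv = BlockRsv("trans->delayed_rsv")

# 16384 bytes per ref is an arbitrary illustrative value.
n = reserve_delrefs(inode_rsv, nr_cows=3, bytes_per_ref=16384)
migrate(inode_rsv, delayed_rsv, n)
print(inode_rsv.reserved, delayed_rsv.reserved)
```

The point of the model is that the space is accounted for at delalloc
time and only changes owner later, so the delayed refs never show up
without a reservation behind them.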

Boris Burkov (4):
  btrfs: reserve space for delayed_refs in delalloc
  btrfs: account for compression in delalloc extent reservation
  btrfs: make inode->outstanding_extents a u64
  btrfs: cap shrink_delalloc iterations to 128M

 fs/btrfs/btrfs_inode.h       | 20 ++++++--
 fs/btrfs/delalloc-space.c    | 84 +++++++++++++++++++++++++++------
 fs/btrfs/delalloc-space.h    |  3 ++
 fs/btrfs/fs.h                | 13 ------
 fs/btrfs/inode.c             | 90 ++++++++++++++++++++++++++++--------
 fs/btrfs/ordered-data.c      |  4 +-
 fs/btrfs/space-info.c        | 31 ++++++++-----
 fs/btrfs/tests/inode-tests.c | 18 ++++----
 fs/btrfs/transaction.c       | 36 ++++++---------
 include/trace/events/btrfs.h |  8 ++--
 10 files changed, 210 insertions(+), 97 deletions(-)

-- 
2.53.0



end of thread, other threads:[~2026-04-10 23:06 UTC | newest]

Thread overview: 11+ messages
2026-04-07 19:30 [PATCH v3 0/4] btrfs: improve stalls under sudden writeback Boris Burkov
2026-04-07 19:30 ` [PATCH v3 1/4] btrfs: reserve space for delayed_refs in delalloc Boris Burkov
2026-04-08 14:56   ` Filipe Manana
2026-04-08 17:34     ` Boris Burkov
2026-04-10 15:27       ` Filipe Manana
2026-04-10 23:06         ` Boris Burkov
2026-04-07 19:30 ` [PATCH v3 2/4] btrfs: account for compression in delalloc extent reservation Boris Burkov
2026-04-08 14:58   ` Filipe Manana
2026-04-07 19:30 ` [PATCH v3 3/4] btrfs: make inode->outstanding_extents a u64 Boris Burkov
2026-04-08 15:04   ` Filipe Manana
2026-04-07 19:30 ` [PATCH v3 4/4] btrfs: cap shrink_delalloc iterations to 128M Boris Burkov
