From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 0/5] btrfs: improve stalls under sudden writeback
Date: Fri, 3 Apr 2026 15:27:50 -0700
Message-ID: <cover.1775255085.git.boris@bur.io>
If you have a system with very large memory (TiBs) and a normal
percentage-based dirty_ratio/dirty_background_ratio like the defaults of
20%/10%, then we can theoretically rack up 100s of GiB of dirty pages
before doing any writeback. This is further exacerbated if we also see a
sudden drop in free memory due to a large allocation. If we also have a
large disk (relatively likely for a large RAM system), we are unlikely
to trigger much preemptive metadata reclaim either.
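
To put rough numbers on the above, here is a minimal sketch of the
threshold arithmetic. It assumes the thresholds are simply a percentage
of "available" (free plus reclaimable) memory and ignores everything
else the real writeback code considers (per-device limits, cgroups, the
byte-based sysctls). The function names are made up for illustration and
are not kernel APIs.

```python
GIB = 1 << 30
TIB = 1 << 40


def dirty_thresholds(available_bytes, dirty_ratio=20, dirty_background_ratio=10):
    """Return (background, hard) dirty thresholds in bytes.

    Simplified model: each threshold is a plain percentage of available
    memory. The real kernel calculation has more inputs than this.
    """
    background = available_bytes * dirty_background_ratio // 100
    hard = available_bytes * dirty_ratio // 100
    return background, hard


def exceeds_background(dirty_bytes, available_bytes, dirty_background_ratio=10):
    """True if the current dirty page count is past the background threshold."""
    background, _ = dirty_thresholds(
        available_bytes, dirty_background_ratio=dirty_background_ratio
    )
    return dirty_bytes > background


if __name__ == "__main__":
    # A hypothetical 2 TiB machine with the default 20%/10% ratios.
    bg, hard = dirty_thresholds(2 * TIB)
    print(f"background: {bg / GIB:.1f} GiB, hard limit: {hard / GIB:.1f} GiB")

    # 150 GiB of dirty pages is under the background threshold while 2 TiB
    # is available, but over it once a large allocation halves that figure.
    dirty = 150 * GIB
    print(exceeds_background(dirty, 2 * TIB), exceeds_background(dirty, 1 * TIB))
```

The second call is the "sudden drop" case: nothing new was dirtied, but
the threshold moved underneath the existing dirty pages, so a large
backlog becomes eligible for writeback all at once.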
Once we do start doing writeback with such a large supply, the results
are somewhat ugly. The delalloc work generates a huge amount of delayed
refs without proper reservations which sends the metadata space system
into a tailspin trying to run yet more delalloc to free space.
Ultimately, the system stalls waiting for huge amounts of ordered
extents and delayed refs, blocking all users in start_transaction() on
tickets in reserve_space().
This patch series aims to address these issues in a relatively targeted
way by improving our reservations for delalloc delayed refs and by doing
some very basic smoothing of the work in flush_space(). Further work
could be done to improve flush_space() heuristics and latency but this
is already a big help on my observed workloads.
I was able to reproduce stalls on a more "modest" system with 264GiB of
RAM by using a somewhat silly 80% dirty_ratio.
I was unfortunately unable to reproduce any stalls on a yet smaller
system with only 32GiB of RAM.
The first 3 patches do the delayed_ref rsv accounting on btrfs_inode,
mirroring inode->block_rsv.
The 4th patch is a cleanup to the types counting max extents.
The 5th patch reduces the size of the unit of work in shrink_delalloc()
to further reduce stalls.
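
As a rough illustration of what a smaller unit of work means, the sketch
below splits one large reclaim target into pieces of at most 128 MiB.
This is not the kernel code: capped_chunks() is a hypothetical helper,
reading "128M" as 128 MiB is an assumption, and whatever
shrink_delalloc() actually does between iterations is not modelled.

```python
MIB = 1 << 20
GIB = 1 << 30
MAX_CHUNK = 128 * MIB  # assumed reading of the "128M" in the patch 5 subject


def capped_chunks(to_reclaim, max_chunk=MAX_CHUNK):
    """Yield successive work sizes in bytes, each at most max_chunk."""
    remaining = to_reclaim
    while remaining > 0:
        step = min(remaining, max_chunk)
        yield step
        remaining -= step


if __name__ == "__main__":
    # One 10 GiB request becomes many bounded steps instead of a single
    # long-running one.
    steps = list(capped_chunks(10 * GIB))
    print(len(steps), "steps, largest", max(steps) // MIB, "MiB")
```

The point of bounding each step is presumably that the caller gets a
chance to re-check state after a limited amount of work, rather than
committing to the whole backlog in one go.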
---
Changelog:
v2:
- patch 1 no longer embeds a new block_rsv on btrfs_inode for the
  delayed reservation. Instead it does the reservation on
  inode->block_rsv and migrates it to trans->delayed_rsv at the moment
  of truth.

Boris Burkov (5):
  btrfs: reserve space for delayed_refs in delalloc
  btrfs: account for csum delayed_refs in delalloc
  btrfs: account for compression in delalloc extent reservation
  btrfs: make inode->outstanding_extents a u64
  btrfs: cap shrink_delalloc iterations to 128M

 fs/btrfs/btrfs_inode.h       | 17 +++++--
 fs/btrfs/delalloc-space.c    | 78 +++++++++++++++++++++++-------
 fs/btrfs/delalloc-space.h    |  3 ++
 fs/btrfs/fs.h                | 13 -----
 fs/btrfs/inode.c             | 93 ++++++++++++++++++++++++++++--------
 fs/btrfs/ordered-data.c      |  4 +-
 fs/btrfs/space-info.c        | 31 ++++++++----
 fs/btrfs/tests/inode-tests.c | 18 +++----
 fs/btrfs/transaction.c       | 36 ++++++--------
 include/trace/events/btrfs.h |  8 ++--
 10 files changed, 201 insertions(+), 100 deletions(-)

--
2.53.0