From: Sun YangKai <sunk67188@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, Boris Burkov <boris@bur.io>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 4/4] btrfs: cap shrink_delalloc iterations to 128M
Date: Fri, 24 Apr 2026 17:48:24 +0800
Message-ID: <781a2579-c6ee-4dd8-ac68-cdb7503a21e1@gmail.com>
In-Reply-To: <54030bf6-56a5-4633-9bc2-0008ca43191e@gmx.com>
On 2026/4/24 14:38, Qu Wenruo wrote:
>
>
> On 2026/4/10 03:18, Boris Burkov wrote:
> [...]
>>
>> This means iterating over to_reclaim by 128MiB at a time until it is
>> drained or we satisfy a ticket, rather than trying 3 times to do the
>> whole thing.
>>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: Boris Burkov <boris@bur.io>
>
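
Just to restate my understanding of the new loop shape in isolation, here
is a simplified standalone simulation of the 128M accounting only, not the
kernel code; the 1G starting value is an assumed example:

#include <stdio.h>
#include <stdint.h>

#define SZ_128M (128ULL << 20)

int main(void)
{
        uint64_t to_reclaim = 1024ULL << 20; /* assumed example: 1G to reclaim */
        int passes = 0;

        /* New scheme: reclaim at most 128M per pass until drained (the
         * ticket break-outs are omitted here), instead of 3 attempts at
         * the whole amount. */
        while (to_reclaim) {
                uint64_t iter = to_reclaim < SZ_128M ? to_reclaim : SZ_128M;
                to_reclaim -= iter;
                passes++;
        }
        printf("%d passes\n", passes); /* prints 8 */
        return 0;
}
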
> Hi Boris,
>
> I'm testing the latest for-next base as the baseline for the incoming
> huge folio support.
>
> On arm64 with a 64K page size and 4K fs block size, I'm seeing very
> weird behavior on generic/027.
> On 7.0-rc7, the test case takes less than 5 seconds and passes as expected.
>
> But on for-next it never finishes; furthermore, there is always a kworker
> taking a full core, dead-looping inside
> btrfs_async_reclaim_metadata_space(), and you cannot unmount the fs.
>
> Here is the "echo l > /proc/sysrq-trigger" stack dump for the involved
> btrfs kworker:
>
> [ 6616.093728] CPU: 0 UID: 0 PID: 501715 Comm: kworker/u33:0 Not tainted 7.0.0-rc7-custom-64k+ #9 PREEMPT(full)
> [ 6616.093732] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
> [ 6616.093734] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
> [ 6616.093849] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> [ 6616.093852] pc : btrfs_start_delalloc_roots+0xf0/0x268 [btrfs]
> [ 6616.093923] lr : btrfs_start_delalloc_roots+0x88/0x268 [btrfs]
> [ 6616.093987] sp : ffff80008af0fbd0
> [...]
> [ 6616.094008] Call trace:
> [ 6616.094009] btrfs_start_delalloc_roots+0xf0/0x268 [btrfs] (P)
> [ 6616.094073] flush_space+0x3d4/0x6b0 [btrfs]
> [ 6616.094138] do_async_reclaim_metadata_space+0x88/0x1d8 [btrfs]
> [ 6616.094201] btrfs_async_reclaim_metadata_space+0x50/0x80 [btrfs]
> [ 6616.094263] process_one_work+0x174/0x540
> [ 6616.094277] worker_thread+0x1a0/0x318
> [ 6616.094279] kthread+0x140/0x158
> [ 6616.094285] ret_from_fork+0x10/0x20
>
> So it's a regression, and bisection points to this patch.
>
> And I tried the following steps to further confirm it's caused by this
> commit:
>
> - The test passes just before the commit
>   The previous commit is "btrfs: make inode->outstanding_extents a u64".
> - The test fails at that commit
>   The test case never finishes and one kworker dead-loops.
> - The test passes at for-next with this commit reverted
>   The test case finishes in seconds as usual.
>
> Do you have any clue on what's going wrong? I guess it's pretty hard to
> hit on x86_64.
> I have a local btrfs branch with huge folio support, with which it's
> pretty easy to hit similar problems on x86_64, but without that branch,
> no hit has been observed so far on x86_64.
>
> Thanks,
> Qu
>
>> ---
>> fs/btrfs/space-info.c | 31 ++++++++++++++++++++-----------
>> 1 file changed, 20 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>> index f0436eea1544..e931deb3d013 100644
>> --- a/fs/btrfs/space-info.c
>> +++ b/fs/btrfs/space-info.c
>> @@ -725,9 +725,8 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> struct btrfs_trans_handle *trans;
>> u64 delalloc_bytes;
>> u64 ordered_bytes;
>> - u64 items;
>> long time_left;
>> - int loops;
>> + u64 orig_tickets_id;
>> delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
>> ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
>> @@ -735,9 +734,7 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> return;
>> /* Calc the number of the pages we need flush for space reservation */
>> - if (to_reclaim == U64_MAX) {
>> - items = U64_MAX;
>> - } else {
>> + if (to_reclaim != U64_MAX) {
>> /*
>> * to_reclaim is set to however much metadata we need to
>> * reclaim, but reclaiming that much data doesn't really track
>> @@ -751,7 +748,6 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> * aggressive.
>> */
>> to_reclaim = max(to_reclaim, delalloc_bytes >> 3);
>> - items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2;
>> }
>> trans = current->journal_info;
>> @@ -764,10 +760,14 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> if (ordered_bytes > delalloc_bytes && !for_preempt)
>> wait_ordered = true;
>> - loops = 0;
>> - while ((delalloc_bytes || ordered_bytes) && loops < 3) {
>> - u64 temp = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
>> - long nr_pages = min_t(u64, temp, LONG_MAX);
>> + spin_lock(&space_info->lock);
>> + orig_tickets_id = space_info->tickets_id;
>> + spin_unlock(&space_info->lock);
>> +
>> + while ((delalloc_bytes || ordered_bytes) && to_reclaim) {
>> + u64 iter_reclaim = min_t(u64, to_reclaim, SZ_128M);
>> + long nr_pages = min_t(u64, delalloc_bytes, iter_reclaim) >> PAGE_SHIFT;
I wonder if it's possible that delalloc_bytes < 64K while to_reclaim ==
U64_MAX, so we'd get nr_pages == 0 on a 64K page size and loop for a
very long time (seemingly forever).
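
To make that concrete, here is a quick standalone userspace reduction of
the arithmetic (my own demo; PAGE_SHIFT and the delalloc byte count below
are assumed example values, not from a real trace):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 16                        /* assumed: arm64 64K pages */
#define SZ_128M (128ULL << 20)

int main(void)
{
        uint64_t to_reclaim = UINT64_MAX;    /* the "flush everything" case */
        uint64_t delalloc_bytes = 32 * 1024; /* nonzero, but below one 64K page */
        uint64_t iter_reclaim = to_reclaim < SZ_128M ? to_reclaim : SZ_128M;
        uint64_t bytes = delalloc_bytes < iter_reclaim ? delalloc_bytes : iter_reclaim;
        long nr_pages = (long)(bytes >> PAGE_SHIFT);

        /* Prints "nr_pages = 0": we ask to flush zero pages, so
         * delalloc_bytes never drains and the loop keeps spinning. */
        printf("nr_pages = %ld\n", nr_pages);
        return 0;
}

With 4K pages the same 32K of delalloc still gives nr_pages == 8, which
would explain why this is so hard to hit on x86_64.
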
>> + u64 items = calc_reclaim_items_nr(fs_info, iter_reclaim) * 2;
>> int async_pages;
>> btrfs_start_delalloc_roots(fs_info, nr_pages, true);
>> @@ -811,7 +811,7 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> atomic_read(&fs_info->async_delalloc_pages) <= async_pages);
>> skip_async:
>> - loops++;
>> + to_reclaim -= iter_reclaim;
>> if (wait_ordered && !trans) {
>> btrfs_wait_ordered_roots(fs_info, items, NULL);
>> } else {
>> @@ -834,6 +834,15 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
>> spin_unlock(&space_info->lock);
>> break;
>> }
>> + /*
>> + * If a ticket was satisfied since we started, break out
>> + * so the async reclaim state machine can process delayed
>> + * refs before we flush more delalloc.
>> + */
>> + if (space_info->tickets_id != orig_tickets_id) {
>> + spin_unlock(&space_info->lock);
>> + break;
>> + }
>> spin_unlock(&space_info->lock);
>> delalloc_bytes = percpu_counter_sum_positive(
>
>
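
One more data point on the "seemingly forever" part: even when the loop
does make progress on to_reclaim, draining it from U64_MAX in 128M steps
takes about 2^37 passes, so absent one of the early break-outs (freed
space or a changed tickets_id) the U64_MAX case cannot finish in any
reasonable time. A trivial check of that arithmetic (again my own demo,
nothing from the kernel):

#include <stdio.h>
#include <stdint.h>

#define SZ_128M (128ULL << 20)

int main(void)
{
        /* Number of 128M-sized steps to walk to_reclaim from U64_MAX to 0. */
        uint64_t iters = UINT64_MAX / SZ_128M + 1;

        printf("%llu passes\n", (unsigned long long)iters); /* 137438953472, i.e. 2^37 */
        return 0;
}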