From: Boris Burkov <boris@bur.io>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 4/4] btrfs: cap shrink_delalloc iterations to 128M
Date: Fri, 24 Apr 2026 08:26:32 -0700
Message-ID: <20260424152632.GA2637332@zen.localdomain>
In-Reply-To: <d916e8e3-0c25-4540-b829-a13ef058c762@gmx.com>

On Fri, Apr 24, 2026 at 07:37:38PM +0930, Qu Wenruo wrote:
> 
> 
> > On 2026/4/24 16:08, Qu Wenruo wrote:
> > 
> > 
> > > On 2026/4/10 03:18, Boris Burkov wrote:
> > [...]
> > > 
> > > This means iterating over to_reclaim by 128MiB at a time until it is
> > > drained or we satisfy a ticket, rather than trying 3 times to do the
> > > whole thing.
> > > 
> > > Reviewed-by: Filipe Manana <fdmanana@suse.com>
> > > Signed-off-by: Boris Burkov <boris@bur.io>
> > 
> > Hi Boris,
> > 
> > I'm testing the latest for-next base as the baseline for the incoming
> > huge folio support.
> > 
> > On arm64 with 64K page size and 4K fs block size, I'm seeing very weird
> > behavior in generic/027.
> > On 7.0-rc7, the test case takes less than 5 seconds and passes as expected.
> > 
> > But on for-next it never finishes; furthermore, there is always a kworker
> > taking a full core, dead-looping inside
> > btrfs_async_reclaim_metadata_space(), and the fs cannot be unmounted.
> > 
> > Here is the "echo l > /proc/sysrq-trigger" stack dump for the involved
> > btrfs kworker:
> > 
> > [ 6616.093728] CPU: 0 UID: 0 PID: 501715 Comm: kworker/u33:0 Not tainted 7.0.0-rc7-custom-64k+ #9 PREEMPT(full)
> > [ 6616.093732] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
> > [ 6616.093734] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
> > [ 6616.093849] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> > [ 6616.093852] pc : btrfs_start_delalloc_roots+0xf0/0x268 [btrfs]
> > [ 6616.093923] lr : btrfs_start_delalloc_roots+0x88/0x268 [btrfs]
> > [ 6616.093987] sp : ffff80008af0fbd0
> > [...]
> > [ 6616.094008] Call trace:
> > [ 6616.094009]  btrfs_start_delalloc_roots+0xf0/0x268 [btrfs] (P)
> > [ 6616.094073]  flush_space+0x3d4/0x6b0 [btrfs]
> > [ 6616.094138]  do_async_reclaim_metadata_space+0x88/0x1d8 [btrfs]
> > [ 6616.094201]  btrfs_async_reclaim_metadata_space+0x50/0x80 [btrfs]
> > [ 6616.094263]  process_one_work+0x174/0x540
> > [ 6616.094277]  worker_thread+0x1a0/0x318
> > [ 6616.094279]  kthread+0x140/0x158
> > [ 6616.094285]  ret_from_fork+0x10/0x20
> > 
> > So it's a regression, and bisection points to this patch.
> > 
> > I tried the following steps to further confirm that it's caused by this
> > commit:
> > 
> > - The test passes just before the commit.
> >    The previous commit is "btrfs: make inode->outstanding_extents a u64".
> > 
> > - The test fails at that commit.
> >    The test case never finishes and one kworker dead-loops.
> > 
> > - The test passes at for-next with this commit reverted.
> >    The test case finishes in seconds as usual.
> 
> Furthermore, even with this particular patch *reverted*, I'm still seeing
> generic/224 hitting the same problem.
> 
> Currently I'm testing at the commit before the whole series, which is
> "btrfs: abort transaction in do_remap_reloc_trans() on failure", and I see
> no generic/224 hang nor 100% kworker CPU usage.
> 
> Thus I'm afraid the whole series may be involved.
> 
> Thanks,
> Qu
> 

Thank you very much for the thorough debugging, and sorry for the
disruption. I suspect Sun's idea will turn out to be at least one bug in
the async reclaim pacing patch. I think I need to move away from the
infinite loop entirely, or at least make sure it has other escape hatches.
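
To make that concrete: since to_reclaim can come in as U64_MAX, decrementing
it by at most SZ_128M per pass effectively never drains it on its own. A
hypothetical, untested sketch of one such escape hatch (just the shape of
it, not a proposal) would be a hard iteration bound on top of the byte
pacing:

    /*
     * Hypothetical, untested sketch: keep the 128M-per-pass pacing but
     * restore a hard cap on iterations as a backstop for the
     * to_reclaim == U64_MAX case, which the byte pacing alone cannot
     * bound in practice.
     */
    int max_loops = 1024;

    while ((delalloc_bytes || ordered_bytes) && to_reclaim && max_loops--) {
        u64 iter_reclaim = min_t(u64, to_reclaim, SZ_128M);
        ...
        to_reclaim -= iter_reclaim;
        ...
    }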

However, I am really struggling to see, conceptually, how changing the
reservation sizing for delalloc or changing the type of
outstanding_extents would make us spin forever in async metadata
reclaim. Are you 100% sure that the hang reproduces with this series
minus 'btrfs: cap shrink_delalloc iterations to 128M'?
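
For reference, condensing the loop from this patch, I believe these are
the only ways out of it:

    while ((delalloc_bytes || ordered_bytes) && to_reclaim) {
        u64 iter_reclaim = min_t(u64, to_reclaim, SZ_128M);
        /* ... start and wait on up to iter_reclaim worth of delalloc ... */
        to_reclaim -= iter_reclaim;  /* exit 1: to_reclaim drains to 0 */
        /* exit 2: the pre-existing break when no tickets are waiting */
        /* exit 3: tickets_id changed, i.e. a ticket was satisfied (new) */
        delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
        /* exit 4: delalloc_bytes and ordered_bytes both reach 0 */
    }

So if the workload keeps generating delalloc and no ticket ever gets
satisfied, the only backstop is draining to_reclaim itself, which is what
I want to double check against your reproducer.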

I will try to hunt down an arm machine and reproduce this myself, and
keep thinking about what could be wrong with the new delalloc rsv
numbers that would make async reclaim loop forever.

Thanks again,
Boris

> > 
> > Do you have any clue about what's going wrong? I guess it's pretty
> > hard to hit on x86_64.
> > 
> > I have a local btrfs branch with huge folio support; with that branch
> > it's pretty easy to hit similar problems on x86_64, but without it no
> > hit has been observed so far on x86_64.
> > 
> > Thanks,
> > Qu
> > 
> > > ---
> > >   fs/btrfs/space-info.c | 31 ++++++++++++++++++++-----------
> > >   1 file changed, 20 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > > index f0436eea1544..e931deb3d013 100644
> > > --- a/fs/btrfs/space-info.c
> > > +++ b/fs/btrfs/space-info.c
> > > @@ -725,9 +725,8 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >       struct btrfs_trans_handle *trans;
> > >       u64 delalloc_bytes;
> > >       u64 ordered_bytes;
> > > -    u64 items;
> > >       long time_left;
> > > -    int loops;
> > > +    u64 orig_tickets_id;
> > >       delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
> > >       ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
> > > @@ -735,9 +734,7 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >           return;
> > >       /* Calc the number of the pages we need flush for space reservation */
> > > -    if (to_reclaim == U64_MAX) {
> > > -        items = U64_MAX;
> > > -    } else {
> > > +    if (to_reclaim != U64_MAX) {
> > >           /*
> > >            * to_reclaim is set to however much metadata we need to
> > >            * reclaim, but reclaiming that much data doesn't really track
> > > @@ -751,7 +748,6 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >            * aggressive.
> > >            */
> > >           to_reclaim = max(to_reclaim, delalloc_bytes >> 3);
> > > -        items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2;
> > >       }
> > >       trans = current->journal_info;
> > > @@ -764,10 +760,14 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >       if (ordered_bytes > delalloc_bytes && !for_preempt)
> > >           wait_ordered = true;
> > > -    loops = 0;
> > > -    while ((delalloc_bytes || ordered_bytes) && loops < 3) {
> > > -        u64 temp = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
> > > -        long nr_pages = min_t(u64, temp, LONG_MAX);
> > > +    spin_lock(&space_info->lock);
> > > +    orig_tickets_id = space_info->tickets_id;
> > > +    spin_unlock(&space_info->lock);
> > > +
> > > +    while ((delalloc_bytes || ordered_bytes) && to_reclaim) {
> > > +        u64 iter_reclaim = min_t(u64, to_reclaim, SZ_128M);
> > > +        long nr_pages = min_t(u64, delalloc_bytes, iter_reclaim) >> PAGE_SHIFT;
> > > +        u64 items = calc_reclaim_items_nr(fs_info, iter_reclaim) * 2;
> > >           int async_pages;
> > >           btrfs_start_delalloc_roots(fs_info, nr_pages, true);
> > > @@ -811,7 +811,7 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >                  atomic_read(&fs_info->async_delalloc_pages) <=
> > >                  async_pages);
> > >   skip_async:
> > > -        loops++;
> > > +        to_reclaim -= iter_reclaim;
> > >           if (wait_ordered && !trans) {
> > >               btrfs_wait_ordered_roots(fs_info, items, NULL);
> > >           } else {
> > > @@ -834,6 +834,15 @@ static void shrink_delalloc(struct btrfs_space_info *space_info,
> > >               spin_unlock(&space_info->lock);
> > >               break;
> > >           }
> > > +        /*
> > > +         * If a ticket was satisfied since we started, break out
> > > +         * so the async reclaim state machine can process delayed
> > > +         * refs before we flush more delalloc.
> > > +         */
> > > +        if (space_info->tickets_id != orig_tickets_id) {
> > > +            spin_unlock(&space_info->lock);
> > > +            break;
> > > +        }
> > >           spin_unlock(&space_info->lock);
> > >           delalloc_bytes = percpu_counter_sum_positive(
> > 
> > 
> 
