From: Dave Chinner <dgc@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
Date: Mon, 6 Apr 2026 08:29:51 +1000
Message-ID: <adLiXxkpvfFhLoYh@dread>
In-Reply-To: <adF3RXLIzlp8SwZO@casper.infradead.org>

On Sat, Apr 04, 2026 at 09:40:37PM +0100, Matthew Wilcox wrote:
> On Sat, Apr 04, 2026 at 10:42:59PM +1100, Dave Chinner wrote:
> > On Fri, Apr 03, 2026 at 04:35:46PM +0100, Matthew Wilcox wrote:
> > > This is with commit 5619b098e2fb so after 7.0-rc6
> > > INFO: task fsstress:3762792 blocked on a semaphore likely last held by task fsstress:3762793
> > > task:fsstress state:D stack:0 pid:3762793 tgid:3762793 ppid:3762783 task_flags:0x440140 flags:0x00080800
> > > Call Trace:
> > > <TASK>
> > > __schedule+0x560/0xfc0
> > > schedule+0x3e/0x140
> > > schedule_timeout+0x84/0x110
> > > ? __pfx_process_timeout+0x10/0x10
> > > io_schedule_timeout+0x5b/0x80
> > > xfs_buf_alloc+0x793/0x7d0
> >
> > -ENOMEM.
> >
> > It'll be looping here:
> >
> > fallback:
> > 	for (;;) {
> > 		bp->b_addr = __vmalloc(size, gfp_mask);
> > 		if (bp->b_addr)
> > 			break;
> > 		if (flags & XBF_READ_AHEAD)
> > 			return -ENOMEM;
> > 		XFS_STATS_INC(bp->b_mount, xb_page_retries);
> > 		memalloc_retry_wait(gfp_mask);
> > 	}
> >
> > If it is looping here long enough to trigger the hang check timer,
> > then the MM subsystem is not making progress reclaiming memory. This
> > is probably a 16kB allocation (it's an inode cluster buffer), and
> > the allocation context is NOFAIL because it is within a transaction
> > (this loop pre-dates __vmalloc() supporting __GFP_NOFAIL)....
>
> There may be something else going on. I reproduced it again and ssh'd
> into the VM.
>
> # free
> total used free shared buff/cache available
> Mem: 3988260 1197132 240080 144 3147496 2791128
> Swap: 2097148 258128 1839020
>
> There are five instances of fsstress running. Very slowly, but they are
> accumulating seconds of CPU time:
>
> root@deadly-kvm:~# ps -aux |grep fsstress
> root 3745227 0.0 0.0 2664 1476 ? S 06:48 0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745236 7.5 1.6 127928 65256 ? D 06:48 42:54 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745237 7.6 1.5 124644 61308 ? D 06:48 42:55 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745238 7.6 1.6 130844 65584 ? D 06:48 43:01 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745239 7.6 1.6 126524 66536 ? D 06:48 42:58 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root@deadly-kvm:~# ps -aux |grep fsstress
> root 3745227 0.0 0.0 2664 1476 ? S 06:48 0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745236 5.5 1.6 133116 66708 ? R 06:48 45:44 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745237 5.5 1.5 130136 62516 ? R 06:48 45:45 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745238 5.5 1.6 136520 65944 ? R 06:48 45:52 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root 3745239 5.5 1.7 131988 67884 ? R 06:48 45:50 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
>
> # cat /proc/3745239/stack
> [<0>] xfs_buf_lock+0x4b/0x170
> [<0>] xfs_buf_find_lock+0x69/0x140
> [<0>] xfs_buf_get_map+0x265/0xbd0
> [<0>] xfs_buf_read_map+0x59/0x2e0
> [<0>] xfs_trans_read_buf_map+0x1bb/0x560
> [<0>] xfs_read_agi+0xab/0x1a0
> (...)

It would be helpful to quote the full stack traces...

> # cat /proc/3745238/stack
> [<0>] xfs_buf_alloc+0x793/0x7d0
> [<0>] xfs_buf_get_map+0x651/0xbd0
> [<0>] xfs_buf_readahead_map+0x3b/0x1b0
> [<0>] xfs_iwalk_ichunk_ra+0xe9/0x130
> [<0>] xfs_iwalk_ag+0x185/0x2d0
> (...)

However, how is memory allocation stuck here? That's the readahead
path, which triggers an early exit from the __vmalloc() fallback
loop. i.e. xfs_buf_alloc() does not loop forever on readahead - it
tries once and then exits.

Yes, this bulkstat path is holding the AGI buffer locked, and the
previous thread is waiting on the AGI buffer lock, but that doesn't
mean the system is deadlocked - it's just lockstepping on the AGI
buffer lock due to the long hold in the bulkstat path....

i.e. these traces do not indicate that there is any sort of memory
allocation problem in the system, just bulkstat slowing down other
operations...
-Dave.
--
Dave Chinner
dgc@kernel.org