public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <dgc@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
Date: Mon, 6 Apr 2026 08:29:51 +1000	[thread overview]
Message-ID: <adLiXxkpvfFhLoYh@dread> (raw)
In-Reply-To: <adF3RXLIzlp8SwZO@casper.infradead.org>

On Sat, Apr 04, 2026 at 09:40:37PM +0100, Matthew Wilcox wrote:
> On Sat, Apr 04, 2026 at 10:42:59PM +1100, Dave Chinner wrote:
> > On Fri, Apr 03, 2026 at 04:35:46PM +0100, Matthew Wilcox wrote:
> > > This is with commit 5619b098e2fb so after 7.0-rc6
> > > INFO: task fsstress:3762792 blocked on a semaphore likely last held by task fsstress:3762793
> > > task:fsstress        state:D stack:0     pid:3762793 tgid:3762793 ppid:3762783 task_flags:0x440140 flags:0x00080800
> > > Call Trace:
> > >  <TASK>
> > >  __schedule+0x560/0xfc0
> > >  schedule+0x3e/0x140
> > >  schedule_timeout+0x84/0x110
> > >  ? __pfx_process_timeout+0x10/0x10
> > >  io_schedule_timeout+0x5b/0x80
> > >  xfs_buf_alloc+0x793/0x7d0
> > 
> > -ENOMEM.
> > 
> > It'll be looping here:
> > 
> > fallback:
> >         for (;;) {
> >                 bp->b_addr = __vmalloc(size, gfp_mask);
> >                 if (bp->b_addr)
> >                         break;
> >                 if (flags & XBF_READ_AHEAD)
> >                         return -ENOMEM;
> >                 XFS_STATS_INC(bp->b_mount, xb_page_retries);
> >                 memalloc_retry_wait(gfp_mask);
> >         }
> > 
> > If it is looping here long enough to trigger the hang check timer,
> > then the MM subsystem is not making progress reclaiming memory. This
> > is probably a 16kB allocation (it's an inode cluster buffer), and
> > the allocation context is NOFAIL because it is within a transaction
> > (this loop pre-dates __vmalloc() supporting __GFP_NOFAIL)....
> 
> There may be something else going on.  I reproduced it again and ssh'd
> into the VM.
> 
> # free
>                total        used        free      shared  buff/cache   available
> Mem:         3988260     1197132      240080         144     3147496     2791128
> Swap:        2097148      258128     1839020
> 
> There are five instances of fsstress running.  Very slowly, but they are
> accumulating seconds of CPU time:
> 
> root@deadly-kvm:~# ps -aux |grep fsstress
> root     3745227  0.0  0.0   2664  1476 ?        S    06:48   0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745236  7.5  1.6 127928 65256 ?        D    06:48  42:54 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745237  7.6  1.5 124644 61308 ?        D    06:48  42:55 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745238  7.6  1.6 130844 65584 ?        D    06:48  43:01 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745239  7.6  1.6 126524 66536 ?        D    06:48  42:58 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root@deadly-kvm:~# ps -aux |grep fsstress
> root     3745227  0.0  0.0   2664  1476 ?        S    06:48   0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745236  5.5  1.6 133116 66708 ?        R    06:48  45:44 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745237  5.5  1.5 130136 62516 ?        R    06:48  45:45 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745238  5.5  1.6 136520 65944 ?        R    06:48  45:52 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> root     3745239  5.5  1.7 131988 67884 ?        R    06:48  45:50 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000
> 
> # cat /proc/3745239/stack
> [<0>] xfs_buf_lock+0x4b/0x170
> [<0>] xfs_buf_find_lock+0x69/0x140
> [<0>] xfs_buf_get_map+0x265/0xbd0
> [<0>] xfs_buf_read_map+0x59/0x2e0
> [<0>] xfs_trans_read_buf_map+0x1bb/0x560
> [<0>] xfs_read_agi+0xab/0x1a0
> (...)

It would be helpful to quote the full stack traces...

> # cat /proc/3745238/stack
> [<0>] xfs_buf_alloc+0x793/0x7d0
> [<0>] xfs_buf_get_map+0x651/0xbd0
> [<0>] xfs_buf_readahead_map+0x3b/0x1b0
> [<0>] xfs_iwalk_ichunk_ra+0xe9/0x130
> [<0>] xfs_iwalk_ag+0x185/0x2d0
> (...)

However, how is a memory allocation stuck here? That's the readahead
path, which triggers an early exit from the __vmalloc() fallback
loop. i.e. xfs_buf_alloc() does not loop forever on readahead - it
tries the allocation once and then returns -ENOMEM.

Yes, this bulkstat path is holding the AGI buffer locked, and the
previous thread is waiting on the AGI buffer lock, but that doesn't
mean the system is deadlocked - it's just lockstepping on the AGI
buffer lock due to the long hold in the bulkstat path....

i.e. these traces do not indicate that there is any sort of memory
allocation problem in the system, just bulkstat slowing down other
operations...

-Dave.
-- 
Dave Chinner
dgc@kernel.org


Thread overview: 9+ messages
2026-04-03 15:35 Hang with xfs/285 on 2026-03-02 kernel Matthew Wilcox
2026-04-04 11:42 ` Dave Chinner
2026-04-04 20:40   ` Matthew Wilcox
2026-04-05 22:29     ` Dave Chinner [this message]
2026-04-05  1:03   ` Ritesh Harjani
2026-04-05 22:16     ` Dave Chinner
2026-04-06  0:27       ` Ritesh Harjani
2026-04-06 21:45         ` Dave Chinner
2026-04-07  5:41 ` Christoph Hellwig
