From: Dave Chinner <dgc@kernel.org>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>, linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
Date: Mon, 6 Apr 2026 08:16:07 +1000
Message-ID: <adLfJwoi1lZhnbjn@dread>
In-Reply-To: <341amd4w.ritesh.list@gmail.com>
On Sun, Apr 05, 2026 at 06:33:59AM +0530, Ritesh Harjani wrote:
> Dave Chinner <dgc@kernel.org> writes:
>
> > On Fri, Apr 03, 2026 at 04:35:46PM +0100, Matthew Wilcox wrote:
> >> This is with commit 5619b098e2fb so after 7.0-rc6
> >> INFO: task fsstress:3762792 blocked on a semaphore likely last held by task fsstress:3762793
> >> task:fsstress state:D stack:0 pid:3762793 tgid:3762793 ppid:3762783 task_flags:0x440140 flags:0x00080800
> >> Call Trace:
> >> <TASK>
> >> __schedule+0x560/0xfc0
> >> schedule+0x3e/0x140
> >> schedule_timeout+0x84/0x110
> >> ? __pfx_process_timeout+0x10/0x10
> >> io_schedule_timeout+0x5b/0x80
> >> xfs_buf_alloc+0x793/0x7d0
> >
> > -ENOMEM.
> >
> > It'll be looping here:
> >
> > fallback:
> > 	for (;;) {
> > 		bp->b_addr = __vmalloc(size, gfp_mask);
> > 		if (bp->b_addr)
> > 			break;
> > 		if (flags & XBF_READ_AHEAD)
> > 			return -ENOMEM;
> > 		XFS_STATS_INC(bp->b_mount, xb_page_retries);
> > 		memalloc_retry_wait(gfp_mask);
> > 	}
> >
> > If it is looping here long enough to trigger the hang check timer,
> > then the MM subsystem is not making progress reclaiming memory. This
>
> Hi Dave,
>
> If that's the case and if we expect the MM subsystem to do memory
> reclaim, shouldn't we be passing the __GFP_DIRECT_RECLAIM flag to our
> fallback loop? I see that we might have cleared this flag and also set
> __GFP_NORETRY, in the above if condition if allocation size is >PAGE_SIZE.
>
> So shouldn't we do?
>
>  	if (size > PAGE_SIZE) {
>  		if (!is_power_of_2(size))
>  			goto fallback;
> -		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> -		gfp_mask |= __GFP_NORETRY;
> +		gfp_t alloc_gfp = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
> +		folio = folio_alloc(alloc_gfp, get_order(size));
> +	} else {
> +		folio = folio_alloc(gfp_mask, get_order(size));
>  	}
> -	folio = folio_alloc(gfp_mask, get_order(size));
>  	if (!folio) {
>  		if (size <= PAGE_SIZE)
>  			return -ENOMEM;
>  		trace_xfs_buf_backing_fallback(bp, _RET_IP_);
>  		goto fallback;
>  	}
Possibly.

That said, we really don't want stuff like compaction to
run here -ever- because of how expensive it is for hot paths when
memory is low, and the only knob we have to control that is
__GFP_DIRECT_RECLAIM.

However, turning off direct reclaim should make no difference in
the long run because vmalloc is only trying to allocate a batch of
single page folios.
If we are in low memory situations where no single page folios are
available, then even for a NORETRY/no-direct-reclaim allocation the
expectation is that the failed allocation attempt will kick kswapd
to perform background memory reclaim.
This is especially true when the allocation is GFP_NOFS/GFP_NOIO
even with direct reclaim turned on - if all the memory is held in
shrinkable fs/vfs caches then direct reclaim cannot reclaim anything
filesystem/IO related.

i.e. background reclaim making forwards progress is absolutely
necessary for any sort of "nofail" allocation loop to succeed
regardless of whether direct reclaim is enabled or not.
Hence if background memory reclaim is making progress, this
allocation loop should eventually succeed. If the allocation is not
succeeding, then it implies that some critical resource in the
allocation path is not being refilled either on allocation failure
or by background reclaim, and hence the allocation failure persists
because nothing alleviates the resource shortage that is triggering
the ENOMEM issue.

So the question is: where in the __vmalloc allocation path is the
ENOMEM error being generated from, and is it the same place every
time?
-Dave.
--
Dave Chinner
dgc@kernel.org