From: Dave Chinner <dgc@kernel.org>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>, linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
Date: Tue, 7 Apr 2026 07:45:58 +1000
Message-ID: <adQplqmx-4RaEh2e@dread>
In-Reply-To: <y0j1kk6d.ritesh.list@gmail.com>
On Mon, Apr 06, 2026 at 05:57:06AM +0530, Ritesh Harjani wrote:
> > However, turning off direct reclaim should make no difference in
> > the long run because vmalloc is only trying to allocate a batch of
> > single page folios.
> >
> > If we are in low memory situations where no single page folios are
> > available, then even for a NORETRY/no direct reclaim allocation
> > the expectation is that the failed allocation attempt would be
> > kicking kswapd to perform background memory reclaim.
> >
> > This is especially true when the allocation is GFP_NOFS/GFP_NOIO
> > even with direct reclaim turned on - if all the memory is held in
> > shrinkable fs/vfs caches then direct reclaim cannot reclaim anything
> > filesystem/IO related.
> >
>
> So, looking at the logs from Matthew, I think, this case might have
> benefitted from __GFP_DIRECT_RECLAIM, because we have many clean
> inactive file pages. So theoretically, IMO direct reclaim should be able
> to use one of those clean file pages (after it gets direct-reclaimed)
>
> nr_zone_inactive_file 62769
> nr_zone_write_pending 0

You miss the point - this is not an isolated use case. For example, look
at xlog_kvmalloc() - it's also a ~__GFP_DIRECT_RECLAIM, NORETRY vmalloc()
loop. What's to stop that one from getting stuck in exactly the same
way?

To that point, kvmalloc(GFP_NOFAIL) now implements the semantics
that xlog_kvmalloc() requires - it turns off direct reclaim (and
hence costly compaction) for the kmalloc() allocation attempt, then
falls back to vmalloc(GFP_NOFAIL) if kmalloc() fails.
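
A minimal userspace sketch of the two-step policy described above, for illustration only. It is not kernel code: every name in it (model_kvmalloc_nofail(), struct model_attempt, and so on) is hypothetical, and it assumes the fallback attempt simply inherits whatever reclaim behaviour the caller asked for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_path { MODEL_PATH_KMALLOC, MODEL_PATH_VMALLOC };

struct model_attempt {
	bool direct_reclaim;	/* may the attempt block in direct reclaim? */
	bool may_fail;		/* is the attempt allowed to return NULL? */
};

/*
 * contig_ok simulates whether the physically contiguous attempt succeeds.
 * caller_direct_reclaim is what the caller's flags allow (an assumption of
 * this model). The policy used for each attempt is recorded in *first and
 * *fallback so it can be inspected.
 */
static enum model_path model_kvmalloc_nofail(bool contig_ok,
					     bool caller_direct_reclaim,
					     struct model_attempt *first,
					     struct model_attempt *fallback)
{
	assert(first != NULL && fallback != NULL);

	/* Step 1: opportunistic contiguous attempt, direct reclaim off. */
	first->direct_reclaim = false;
	first->may_fail = true;

	/* Step 2: the fallback is the attempt that must not fail. */
	fallback->direct_reclaim = caller_direct_reclaim;
	fallback->may_fail = false;

	return contig_ok ? MODEL_PATH_KMALLOC : MODEL_PATH_VMALLOC;
}
```

The point of the split is that only the fallback carries the no-fail guarantee; the first attempt is cheap and allowed to fail.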

That's also pretty much the exact semantics we are trying to
implement in xfs_buf_alloc(), yes? i.e. xfs_buf_alloc() does:

  - For buffers < PAGE_SIZE, it calls kmalloc() directly and returns.
  - For buffers == PAGE_SIZE, it calls folio_alloc(GFP_KERNEL).
  - For buffers > PAGE_SIZE, it calls folio_alloc(NORETRY, ~__GFP_DIRECT_RECLAIM).

If either folio_alloc() call fails, it effectively runs an open
coded __vmalloc() no-fail loop.

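
The size-based choice of first attempt listed above can be sketched as a small userspace model. This is an illustration, not the actual xfs_buf_alloc() code: MODEL_PAGE_SIZE (assumed to be 4096 here), enum model_buf_strategy and model_buf_first_attempt() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this model only; real PAGE_SIZE depends on the platform. */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical labels for the first allocation attempt made per size class. */
enum model_buf_strategy {
	MODEL_BUF_KMALLOC,		/* sub-page buffer: kmalloc() and return */
	MODEL_BUF_FOLIO_BLOCKING,	/* exactly one page: GFP_KERNEL folio */
	MODEL_BUF_FOLIO_OPPORTUNISTIC,	/* multi-page: NORETRY, no direct reclaim */
};

/* Pick the first attempt for a buffer of the given size in bytes. */
static enum model_buf_strategy model_buf_first_attempt(size_t size)
{
	assert(size > 0);

	if (size < MODEL_PAGE_SIZE)
		return MODEL_BUF_KMALLOC;
	if (size == MODEL_PAGE_SIZE)
		return MODEL_BUF_FOLIO_BLOCKING;
	return MODEL_BUF_FOLIO_OPPORTUNISTIC;
}
```

In this model only the two folio strategies would ever reach the open coded __vmalloc() fallback loop; the sub-page case returns straight from kmalloc().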
IOWs we are implementing essentially the same semantics as
kvmalloc(__GFP_NOFAIL), modulo the reclaim flags for the __vmalloc()
loop. If we are going to change the flags for the vmalloc() loop
to be the original, then we are essentially reimplementing
kvmalloc(GFP_NOFAIL) semantics exactly. At which point....
> > i.e. background reclaim making forwards progress is absolutely
> > necessary for any sort of "nofail" allocation loop to succeed
> > regardless of whether direct reclaim is enabled or not.
> >
> > Hence if background memory reclaim is making progress, this
> > allocation loop should eventually succeed. If the allocation is not
> > succeeding, then it implies that some critical resource in the
> > allocation path is not being refilled either on allocation failure
> > or by background reclaim, and hence the allocation failure persists
> > because nothing alleviates the resource shortage that is triggering
> > the ENOMEM issue.
>
> I agree, background memory reclaim / kswapd thread should have made
> forward progress.
>
> I am not sure why, in this case, we are hitting hung task issues then.
> Could be because of multiple fsstress threads running in parallel (from
> ps -eax output), and maybe some other process ends up using the pages
> reclaimed by background kswapd (just a theory).

I don't think that's the case, because kswapd is supposed to run
until watermarks are reached and that means all free page pools are
supposed to have at least some free pages in them...

That's why I think there's a reclaim bug lurking here - allocation
appears to be stalling on something that background reclaim is not
refilling. And if allocation is stalling on buffer allocation, then
it can stall in other critical parts of XFS, too. Background reclaim
not doing sufficient work to allow looping non-blocking, no-retry
allocations to succeed seems like a memory allocation/reclaim bug to
me, not an XFS issue...

> > So the question is: where in the __vmalloc allocation path is the
> > ENOMEM error being generated from, and is it the same place every
> > time?
> >
>
> Although I can't say for sure, in this case after looking at the
> code, and knowing that we are not passing __GFP_DIRECT_RECLAIM, it might
> be returning from here (after get_page_from_freelist() couldn't get a
> free page).
>
> __alloc_pages_slowpath() {
> 	...
> 	/* Caller is not willing to reclaim, we can't balance anything */
> 	if (!can_direct_reclaim)
> 		goto nopage;

Sure, we can't balance anything, but we've set ALLOC_KSWAPD early in
this function and so every time we get to the above point in the
allocation code we've already run this:

retry:
	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);

Hence kswapds should be active and doing reclaim work to bring
everything back to minimum free pool watermarks. That *should* be
sufficient for a no-direct-reclaim allocation loop to make progress.
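
To make that expectation concrete, here is a toy userspace model of a no-direct-reclaim retry loop. It is a sketch under loud assumptions, not kernel code: all names are hypothetical, kswapd is modelled as running synchronously when woken, and the loop is bounded by max_attempts purely so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one zone; not a kernel structure. */
struct model_zone {
	unsigned free_pages;
	unsigned min_wmark;	/* allocation succeeds only above this */
	unsigned high_wmark;	/* background reclaim refills up to this */
	unsigned reclaimable;	/* pages background reclaim is able to free */
	unsigned wakeups;	/* how many times kswapd was woken */
};

/* Model of waking kswapd: it reclaims until the high watermark is reached. */
static void model_wake_kswapd(struct model_zone *z)
{
	z->wakeups++;
	while (z->free_pages < z->high_wmark && z->reclaimable > 0) {
		z->reclaimable--;
		z->free_pages++;
	}
}

/* One allocation attempt that never enters direct reclaim. */
static bool model_alloc_noreclaim(struct model_zone *z)
{
	if (z->free_pages > z->min_wmark) {
		z->free_pages--;
		return true;
	}
	/* Failure path: kick background reclaim, then report ENOMEM. */
	model_wake_kswapd(z);
	return false;
}

/* Returns the number of attempts used, or 0 if the bound was hit. */
static unsigned model_nofail_loop(struct model_zone *z, unsigned max_attempts)
{
	for (unsigned i = 1; i <= max_attempts; i++) {
		if (model_alloc_noreclaim(z))
			return i;
	}
	return 0;
}
```

In this model the loop succeeds on the second attempt whenever background reclaim has something to free, and only spins forever (here: hits the bound) when the woken reclaim cannot refill the pool. That second case is the behaviour that needs explaining.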
-Dave.
--
Dave Chinner
dgc@kernel.org
Thread overview (9+ messages):
2026-04-03 15:35 Hang with xfs/285 on 2026-03-02 kernel Matthew Wilcox
2026-04-04 11:42 ` Dave Chinner
2026-04-04 20:40 ` Matthew Wilcox
2026-04-05 22:29 ` Dave Chinner
2026-04-05 1:03 ` Ritesh Harjani
2026-04-05 22:16 ` Dave Chinner
2026-04-06 0:27 ` Ritesh Harjani
2026-04-06 21:45 ` Dave Chinner [this message]
2026-04-07 5:41 ` Christoph Hellwig