From: Johannes Weiner <hannes@cmpxchg.org>
To: Dmitry Ilvokhin <d@ilvokhin.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
Date: Tue, 19 May 2026 11:28:39 -0400
Message-ID: <agyBp_j6CXuhfkfp@cmpxchg.org>
In-Reply-To: <agxqCE2juj14EhyZ@shell.ilvokhin.com>

On Tue, May 19, 2026 at 01:47:52PM +0000, Dmitry Ilvokhin wrote:
> On Mon, May 18, 2026 at 01:24:22PM -0700, Andrew Morton wrote:
> > On Mon, 18 May 2026 16:37:36 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> > 
> > > When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
> > > migratetype fallbacks and keep pageblocks clean. The allocator relies on
> > > reclaim and compaction to free pages of the correct type before allowing
> > > fallback as a last resort.
> > > 
> > > However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
> > > direct reclaim or compaction. With defrag_mode=1, these allocations hit
> > > the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
> > > ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.
> > > 
> > > This causes a large number of SLUB allocation failures for
> > > skbuff_head_cache under network-heavy workloads, despite free memory
> > > being available in other migratetype freelists.
> > > 
> > > Clear ALLOC_NOFRAGMENT and retry before giving up on allocations that
> > > cannot reclaim, following the same pattern used after reclaim/compaction
> > > exhaustion later in the slowpath.
> > 
> > Thanks.  Sashiko asked a couple of things:
> > 
> > 	https://sashiko.dev/#/patchset/20260518163736.173910-1-d@ilvokhin.com
> > 
> > I'm not sure what to make of the first one - we aren't holding any locks
> > in there which prevent concurrent cpuset or zonelist alterations
> > anyway (?).
> > 
> > But your change might violate the later comment `No "goto retry;" can be
> > placed above this check unless it can execute just once'?
> 
> Thanks for taking a look, Andrew.
> 
> Goto retry can execute at most once, since ALLOC_NOFRAGMENT is cleared
> before the jump, so on the next iteration the condition is false and we
> fall through to goto nopage. This is similar to the existing
> can_retry_reserves path.

Yes, it's just a one-off retry with relaxed fragmentation rules, no
need to re-evaluate the cpuset. So this looks fine to me.
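
To spell out why the retry is bounded, here is a minimal userspace
sketch, not the kernel code. Only the names ALLOC_NOFRAGMENT and
can_direct_reclaim and the retry/nopage labels come from the discussion
above; the flag value, struct sketch_result, try_freelists() and
slowpath_sketch() are made up for illustration.

```c
#include <stdbool.h>

/* Illustrative bit value only; not the kernel's definition. */
#define ALLOC_NOFRAGMENT 0x100u

struct sketch_result {
	bool ok;       /* did any attempt succeed? */
	int attempts;  /* how many times the freelists were tried */
};

/*
 * Hypothetical stand-in for a freelist attempt. It models the reported
 * case: the requested migratetype is empty, so success requires that
 * fallback is both permitted (NOFRAGMENT clear) and possible (other
 * migratetype lists have pages).
 */
static bool try_freelists(unsigned int alloc_flags, bool other_types_have_pages)
{
	return other_types_have_pages && !(alloc_flags & ALLOC_NOFRAGMENT);
}

static struct sketch_result slowpath_sketch(unsigned int alloc_flags,
					    bool can_direct_reclaim,
					    bool other_types_have_pages)
{
	struct sketch_result res = { .ok = false, .attempts = 0 };

retry:
	res.attempts++;
	if (try_freelists(alloc_flags, other_types_have_pages)) {
		res.ok = true;
		return res;
	}

	if (!can_direct_reclaim) {
		if (alloc_flags & ALLOC_NOFRAGMENT) {
			/* Cleared before the jump, so this branch runs at most once. */
			alloc_flags &= ~ALLOC_NOFRAGMENT;
			goto retry;
		}
		goto nopage;
	}

	/* Reclaim and compaction would run here; omitted from the sketch. */

nopage:
	return res;
}
```

Calling slowpath_sketch(ALLOC_NOFRAGMENT, false, true) makes two
attempts and succeeds on the second; with no pages anywhere it still
stops after two attempts, since the flag is already clear on the second
pass.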

> Just for the sake of keeping everything in one place. Another point
> Sashiko raised.
> 
> "Will allocations hitting this PF_MEMALLOC check, or the __GFP_NORETRY check
> further down in the function, still fail prematurely under defrag_mode=1?
> Because these terminal error paths also jump directly to the nopage label,
> they skip the normal ALLOC_NOFRAGMENT clearing at the bottom of the slowpath.
> Should we also clear ALLOC_NOFRAGMENT and retry for these paths so they are
> allowed to fall back rather than failing outright?"
> 
> I think by the time we reach the PF_MEMALLOC check, ALLOC_NOFRAGMENT has
> already been cleared, since we set only ALLOC_NO_WATERMARKS and
> ALLOC_KSWAPD in reserve_flags, when PF_MEMALLOC is set.

Yes, that's correct. alloc_flags gets overwritten, losing NOFRAGMENT,
for privileged requests. And then we retry with that already.
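
Roughly, the overwrite behaves like the sketch below. This is a
simplified userspace model under stated assumptions: the bit values are
illustrative rather than the kernel's, and apply_reserve_flags() is a
hypothetical helper standing in for the assignment in the slowpath.

```c
/* Illustrative bit values only; not the kernel's definitions. */
#define ALLOC_NO_WATERMARKS 0x004u
#define ALLOC_NOFRAGMENT    0x100u
#define ALLOC_KSWAPD        0x800u

/*
 * Hypothetical helper: when reserve_flags is non-zero, alloc_flags is
 * replaced rather than OR-ed, and only ALLOC_KSWAPD is carried over.
 * Any ALLOC_NOFRAGMENT bit in the old value is dropped as a side effect.
 */
static unsigned int apply_reserve_flags(unsigned int alloc_flags,
					unsigned int reserve_flags)
{
	if (reserve_flags)
		alloc_flags = reserve_flags | (alloc_flags & ALLOC_KSWAPD);
	return alloc_flags;
}
```

So a request that entered with ALLOC_NOFRAGMENT | ALLOC_KSWAPD and gets
ALLOC_NO_WATERMARKS as its reserve flags comes out with
ALLOC_NO_WATERMARKS | ALLOC_KSWAPD, and fallback is allowed on the
retry that follows.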

> For GFP_NORETRY, we can do direct reclaim (compared to GFP_ATOMIC case),
> so we either succeed or not, we don't need another round.

This is an interesting question.

GFP_NORETRY can reclaim and compact once, yes, but ALLOC_NOFRAGMENT is
still a higher bar, increasing the likelihood of failure.

However, unlike GFP_ATOMIC, GFP_NORETRY allocations are usually
speculative, with reasonable fallback options (like slub's optimistic
higher order requests).

The idea behind defrag_mode is to not fragment until the alternative
is OOM. For GFP_ATOMIC, failing is an OOM-like event. For the other
nopage cases, it's more about "my favorite thing isn't available".

So I'd say let's fix GFP_ATOMIC and leave the other cases alone unless
somebody specifically brings it up as an issue.

However, there is one catch: GFP_ATOMIC is not its own flag. You're
gating on !can_direct_reclaim, which is also true for optimistic things
like mTHP allocations (e.g. GFP_TRANSHUGE_LIGHT). We don't want to
fragment for those, either.

So I think you'd want to check at least if __GFP_KSWAPD_RECLAIM is set
(which it is for GFP_ATOMIC) to decide between fragmenting and
failing. If the caller doesn't even set that, it's a good indicator
that they're purely speculative, and failing is the better option.
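
Something along these lines, as a userspace sketch of the decision
only. The SK_GFP_* bits are illustrative stand-ins for
__GFP_DIRECT_RECLAIM and __GFP_KSWAPD_RECLAIM, the ALLOC_NOFRAGMENT
value is made up, and may_fragment_without_reclaim() is a hypothetical
name, not something in the patch.

```c
#include <stdbool.h>

/* Illustrative stand-ins; not the kernel's bit values. */
#define SK_GFP_DIRECT_RECLAIM 0x1u  /* models __GFP_DIRECT_RECLAIM */
#define SK_GFP_KSWAPD_RECLAIM 0x2u  /* models __GFP_KSWAPD_RECLAIM */
#define ALLOC_NOFRAGMENT      0x100u

/*
 * Hypothetical predicate: should a request that cannot direct-reclaim
 * drop ALLOC_NOFRAGMENT and retry once, instead of failing?
 */
static bool may_fragment_without_reclaim(unsigned int gfp_mask,
					 unsigned int alloc_flags)
{
	bool can_direct_reclaim = (gfp_mask & SK_GFP_DIRECT_RECLAIM) != 0;

	/* Callers that can reclaim take the regular slowpath instead. */
	if (can_direct_reclaim)
		return false;

	/* Nothing to relax if fallbacks are already allowed. */
	if (!(alloc_flags & ALLOC_NOFRAGMENT))
		return false;

	/*
	 * GFP_ATOMIC-style requests set the kswapd bit; purely speculative
	 * ones that set no reclaim bits at all should just fail.
	 */
	return (gfp_mask & SK_GFP_KSWAPD_RECLAIM) != 0;
}
```

With these stand-ins, a mask carrying only SK_GFP_KSWAPD_RECLAIM (the
GFP_ATOMIC-like case) is allowed to fragment, while a mask with no
reclaim bits (the GFP_TRANSHUGE_LIGHT-like case) is not.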

With that,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>




Thread overview: 5+ messages
2026-05-18 16:37 [PATCH] mm/page_alloc: fix defrag_mode for non-reclaimable allocations Dmitry Ilvokhin
2026-05-18 20:24 ` Andrew Morton
2026-05-19 13:47   ` Dmitry Ilvokhin
2026-05-19 15:28     ` Johannes Weiner [this message]
2026-05-20 11:35       ` Dmitry Ilvokhin
