From: Johannes Weiner <hannes@cmpxchg.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Dmitry Ilvokhin <d@ilvokhin.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
Date: Tue, 26 May 2026 13:51:40 -0400
Message-ID: <ahXdrP2eNYw1jc0P@cmpxchg.org>
In-Reply-To: <2aedfd17-17e6-4dfe-8ae5-c7342ead708b@kernel.org>

On Tue, May 26, 2026 at 03:13:09PM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/22/26 3:05 PM, Dmitry Ilvokhin wrote:
> > On Thu, May 21, 2026 at 04:59:10PM -0700, Andrew Morton wrote:
> >> On Wed, 20 May 2026 12:22:28 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> >>
> >>> When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
> >>> migratetype fallbacks and keep pageblocks clean. The allocator relies on
> >>> reclaim and compaction to free pages of the correct type before allowing
> >>> fallback as a last resort.
> >>>
> >>> However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
> >>> direct reclaim or compaction. With defrag_mode=1, these allocations hit
> >>> the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
> >>> ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.
> >>>
> >>> This causes a large number of SLUB allocation failures for
> >>> skbuff_head_cache under network-heavy workloads, despite free memory
> >>> being available in other migratetype freelists.
> >>
> >> That sounds painful.
> >>
> >>> Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
> >>> reclaim but cannot do direct reclaim themselves (GFP_ATOMIC). Purely
> >>> speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
> >>> __GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
> >>> fallbacks and should not cause fragmentation.
> >>
> >> How serious is this to our users when running real-world workloads?
> >
> > We observed it on a few of the Meta workloads that adopted
> > defrag_mode=1.
>
> Do you (or Johannes) have some observations to share about what
> motivated those to adopt it, what kind of workloads benefit and how?

As you may remember it was developed to help with higher order / THP
success rates under pressure.

The impetus for actually deploying it was that we saw issues with
avalanches of large page cache folios vacuuming up the higher-order
chunks; this (ironically) also led to failures on the network side.

It's kind of a structural problem. We have real preproduction buffers
for order-0 pages through the watermarks. But for higher orders we
only ensure there is at least one page. That easily fails under even
mild competition.
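
To make the order-0 vs. higher-order asymmetry above concrete, here is a
rough userspace sketch of the shape of the check. It is only a toy model:
struct toy_zone, toy_free_pages() and toy_watermark_ok() are hypothetical
names made up for illustration, and the real watermark code also accounts
for lowmem reserves, migratetypes and more.

```c
#include <stdbool.h>

#define TOY_NR_ORDERS 11 /* orders 0..10; a value assumed for this sketch */

/* Toy zone: free block counts per order. Not the kernel's struct zone. */
struct toy_zone {
	unsigned long nr_free[TOY_NR_ORDERS];
	unsigned long min_wmark_pages;
};

/* Total free memory in base pages, summed over all orders. */
static unsigned long toy_free_pages(const struct toy_zone *z)
{
	unsigned long total = 0;

	for (int o = 0; o < TOY_NR_ORDERS; o++)
		total += z->nr_free[o] << o;
	return total;
}

static bool toy_watermark_ok(const struct toy_zone *z, int order)
{
	/* Order-0: a real buffer of pages has to stay above the watermark. */
	if (toy_free_pages(z) <= z->min_wmark_pages)
		return false;
	if (order == 0)
		return true;

	/* Higher orders: one free block of sufficient size is enough. */
	for (int o = order; o < TOY_NR_ORDERS; o++)
		if (z->nr_free[o] > 0)
			return true;
	return false;
}
```

With 1000 free order-0 pages, a single free order-3 block and a watermark
of 128 pages, the toy check passes for order 3. Once one competing
allocation takes that block, it fails for order 3 while still passing for
order 0.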

Since we wanted to roll defrag_mode for THP in multi-tenant systems
anyway, we figured we might as well take the plunge now and battle
test the feature this way.

defrag_mode fixes *that* issue, by preproducing watermark buffers in
contiguous pageblocks - making everything up to that order more
readily available. I'm still hoping to make it the default eventually,
which was the plan with the original huge page allocator series. As we
keep leaning into higher order requests more and more, and especially
grow the non-optional ones, we kind of need non-optional preproduction
guarantees for higher orders as well.

But there are bugs like this one, and we're still figuring out some
overreclaim issues with it in production as well. So I'm glad it's
optional for the time being ;-)
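
For anyone skimming the thread, the behavior described in the quoted
changelog can be modeled roughly as below. This is a userspace sketch
under stated assumptions, not the patch itself: the TOY_* bit values,
enum toy_action and toy_slowpath_step() are hypothetical and exist only
to illustrate the decision, with GFP_ATOMIC modeled as "kswapd reclaim
but no direct reclaim" and GFP_TRANSHUGE_LIGHT as "neither".

```c
#include <stdbool.h>

/* Hypothetical bit values for this sketch; the kernel's differ. */
#define TOY_GFP_DIRECT_RECLAIM	0x1u
#define TOY_GFP_KSWAPD_RECLAIM	0x2u
#define TOY_ALLOC_NOFRAGMENT	0x100u

enum toy_action {
	TOY_TRY_RECLAIM,	 /* reclaim/compaction before any fallback */
	TOY_RETRY_WITH_FALLBACK, /* NOFRAGMENT dropped, try other types */
	TOY_FAIL,		 /* give up; caller has its own fallback */
};

static enum toy_action toy_slowpath_step(unsigned int gfp,
					 unsigned int *alloc_flags)
{
	/* Callers that can reclaim do so first; fallback is a last resort. */
	if (gfp & TOY_GFP_DIRECT_RECLAIM)
		return TOY_TRY_RECLAIM;

	/*
	 * No direct reclaim, but kswapd was requested (the GFP_ATOMIC
	 * case): clear NOFRAGMENT once and retry with fallbacks allowed.
	 */
	if ((*alloc_flags & TOY_ALLOC_NOFRAGMENT) &&
	    (gfp & TOY_GFP_KSWAPD_RECLAIM)) {
		*alloc_flags &= ~TOY_ALLOC_NOFRAGMENT;
		return TOY_RETRY_WITH_FALLBACK;
	}

	/* Purely speculative requests are left to fail. */
	return TOY_FAIL;
}
```

In this model an atomic-style request gets exactly one retry with
fallbacks enabled, because the flag is already clear on the second pass,
while a speculative request fails with ALLOC_NOFRAGMENT still set.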