From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 1/5] mm: Introduce zone_appears_fragmented()
Date: Fri, 24 Apr 2026 09:05:16 +0200
Message-ID: <291406b26b8badf2e565996515931d9ebe50208f.camel@linux.intel.com>
In-Reply-To: <aeqbe/9YRKNyYuWY@gsse-cloud1.jf.intel.com>

On Thu, 2026-04-23 at 15:21 -0700, Matthew Brost wrote:
> On Thu, Apr 23, 2026 at 12:08:36PM -0700, Matthew Brost wrote:
> > On Thu, Apr 23, 2026 at 01:27:11PM +0200, Thomas Hellström wrote:
> > > On Thu, 2026-04-23 at 12:27 +0200, David Hildenbrand (Arm) wrote:
> > > > On 4/23/26 07:56, Matthew Brost wrote:
> > > > > > Introduce zone_appears_fragmented() as a lightweight helper to
> > > > > > allow subsystems to make coarse decisions about reclaim behavior
> > > > > > in the presence of likely fragmentation.
> > > > > >
> > > > > > The helper implements a simple heuristic: if the number of free
> > > > > > pages in a zone exceeds twice the high watermark, the zone is
> > > > > > considered to have ample free memory and allocation failures are
> > > > > > more likely due to fragmentation than overall memory pressure.
> > > > > >
> > > > > > This is intentionally imprecise and is not meant to replace the
> > > > > > core MM compaction or fragmentation accounting logic. Instead, it
> > > > > > provides a cheap signal for callers (e.g., shrinkers) that wish
> > > > > > to avoid overly aggressive reclaim when sufficient free memory
> > > > > > exists but high-order allocations may still fail.
> > > > > >
> > > > > > No functional changes; this is a preparatory helper for future
> > > > > > users.
> > > > >
> > > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > > Cc: David Hildenbrand <david@kernel.org>
> > > > > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > > > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > > > > Cc: Vlastimil Babka <vbabka@kernel.org>
> > > > > Cc: Mike Rapoport <rppt@kernel.org>
> > > > > Cc: Suren Baghdasaryan <surenb@google.com>
> > > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > > Cc: linux-mm@kvack.org
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > include/linux/vmstat.h | 13 +++++++++++++
> > > > > 1 file changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > > > > index 3c9c266cf782..568d9f4f1a1f 100644
> > > > > --- a/include/linux/vmstat.h
> > > > > +++ b/include/linux/vmstat.h
> > > > > @@ -483,6 +483,19 @@ static inline const char
> > > > > *zone_stat_name(enum
> > > > > zone_stat_item item)
> > > > > return vmstat_text[item];
> > > > > }
> > > > >
> > > > > +static inline bool zone_appears_fragmented(struct zone
> > > > > *zone)
> > > > > +{
> > > >
> > > > "zone_likely_fragmented" or "zone_maybe_fragmented" might be
> > > > clearer,
> > > > depending
> > > > on the actual semantics.
> > > >
> > > > > + /*
> > > > > + * Simple heuristic: if the number of free pages is
> > > > > more
> > > > > than twice the
> > > > > + * high watermark, this strongly suggests that the
> > > > > zone is
> > > > > heavily
> > > > > + * fragmented when called from a shrinker.
> > > > > + */
> > > >
> > > > > I'll cc some more people. But the "when called from a shrinker"
> > > > > bit is concerning. Are there additional semantics that should be
> > > > > expressed in the function name, for example?
> > > > >
> > > > > Something that implies that this function only gives you a
> > > > > reasonable answer in a certain context.
> > >
> > > I think that test would not be relevant for cgroup-aware
> > > shrinking.
> > >
> > > What about trying to pass something in the struct shrink_control?
> > > Like
> > > if we pass the struct scan_control's order field also in struct
> >
> > If the order were included in shrink_control, there is about a 95%
> > chance that this change would allow TTM / Xe to break the
> > problematic kswapd feedback loop. This may also better express the
> > intent of the problem we are trying to fix here.
> >
> > For reference, the cover letter [1] details the problem.
> >
> > Any guidance from the core MM folks would be appreciated—would
> > adding
> > the order to shrink_control be an acceptable solution?
> >
> > Matt
> >
> > [1] https://patchwork.freedesktop.org/series/165330/
> >
> > > shrink_control, really expensive shrinkers could duck reclaim
> > > attempts
> > > from higher-order allocations that may fail anyway:
> > >
> > > if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > > (sc->gfp_mask & (__GFP_NORETRY | __GFP_RETRY_MAYFAIL)) &&
> > > !(sc->gfp_mask & __GFP_NOFAIL))
>
> It doesn't look like __GFP_NORETRY, __GFP_RETRY_MAYFAIL, __GFP_NOFAIL
> make it to the sc->gfp_mask flags from the caller and get into kswapd
> loop...

Perhaps that's because they mostly (only?) make sense from direct
reclaim? Looks like the trace is from kswapd.
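
To make the kswapd case concrete, here is a minimal userspace sketch of the
check proposed earlier in the thread. It is not kernel code: the SK_*
constants are stand-ins that I believe mirror the current kernel values
(the real ones live in include/linux/gfp_types.h and mmzone.h), and the
order field in the struct is hypothetical since shrink_control does not
carry one today. As far as I can tell, balance_pgdat() sets up its
scan_control with GFP_KERNEL, which would match the gfp=0xcc0 in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-ins for kernel constants. Treat the values as
 * assumptions; the authoritative definitions are in the kernel tree.
 */
#define SK_GFP_IO              0x00040u
#define SK_GFP_FS              0x00080u
#define SK_GFP_DIRECT_RECLAIM  0x00400u
#define SK_GFP_KSWAPD_RECLAIM  0x00800u
#define SK_GFP_RETRY_MAYFAIL   0x04000u
#define SK_GFP_NOFAIL          0x08000u
#define SK_GFP_NORETRY         0x10000u
#define SK_GFP_KERNEL (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM | \
		       SK_GFP_IO | SK_GFP_FS)
#define SK_PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical: struct shrink_control has no order field today. */
struct sk_shrink_control {
	unsigned int gfp_mask;
	int order;
};

/*
 * The proposed test: duck reclaim for costly-order allocations that are
 * allowed to fail and are not marked as must-succeed.
 */
static bool sk_should_skip(const struct sk_shrink_control *sc)
{
	return sc->order > SK_PAGE_ALLOC_COSTLY_ORDER &&
	       (sc->gfp_mask & (SK_GFP_NORETRY | SK_GFP_RETRY_MAYFAIL)) &&
	       !(sc->gfp_mask & SK_GFP_NOFAIL);
}
```

With gfp_mask = SK_GFP_KERNEL (0xcc0) and order = 9, as in the trace,
sk_should_skip() returns false because none of the retry flags are set.
The same request with SK_GFP_RETRY_MAYFAIL or'ed in would be skipped, so
the check only helps on paths where the caller's flags actually reach the
shrinker.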

Another metric to weigh in is perhaps the scan_control::priority field.
From my understanding it is progressively decreased towards 0, with 0
indicating the most urgent shrinking.
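
A priority-based variant could look roughly like the sketch below. Again
this is plain userspace C under stated assumptions: SK_DEF_PRIORITY is
meant to mirror DEF_PRIORITY (12, to my knowledge), the size >> priority
scaling is my reading of how reclaim sizes a scan pass, and the struct, the
helper names and the urgent_priority threshold are all hypothetical.
Neither order nor priority is passed to shrinkers today.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_DEF_PRIORITY            12	/* assumed starting priority */
#define SK_PAGE_ALLOC_COSTLY_ORDER 3	/* assumed costly-order limit */

/* Hypothetical: neither field exists in struct shrink_control today. */
struct sk_shrink_ctx {
	int order;	/* order of the allocation that triggered reclaim */
	int priority;	/* SK_DEF_PRIORITY down to 0, 0 = most urgent */
};

/* Rough number of objects considered per pass: size >> priority. */
static unsigned long sk_scan_target(unsigned long size, int priority)
{
	return size >> priority;
}

/*
 * Hypothetical policy for an expensive shrinker: always serve low-order
 * reclaim, but ignore costly-order requests until reclaim has escalated
 * to urgent_priority or below.
 */
static bool sk_expensive_shrinker_should_run(const struct sk_shrink_ctx *sc,
					     int urgent_priority)
{
	if (sc->order <= SK_PAGE_ALLOC_COSTLY_ORDER)
		return true;
	return sc->priority <= urgent_priority;
}
```

With a threshold of 2, an order-9 request at SK_DEF_PRIORITY would be
ignored, while the same request at priority 1 would be served. Where to
put the threshold is a policy question the sketch does not try to answer.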

Thanks,
Thomas

>
> [ 394.049058] xe_shrinker_scan: no skip order=9, gfp=0x0000000000000cc0
> [ 394.049061] CPU: 2 UID: 0 PID: 110 Comm: kswapd0 Not tainted 7.0.0-xe+ #355 PREEMPT(full)
> [ 394.049062] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3332.D05.2509011438 09/01/2025
> [ 394.049063] Call Trace:
> [ 394.049065] <TASK>
> [ 394.049066] dump_stack_lvl+0x55/0x70
> [ 394.049073] xe_shrinker_scan+0x274/0x280 [xe]
> [ 394.049181] do_shrink_slab+0x132/0x360
> [ 394.049184] shrink_slab+0xf0/0x3e0
> [ 394.049186] shrink_node+0x2bd/0x800
> [ 394.049188] balance_pgdat+0x323/0x760
> [ 394.049189] kswapd+0x1c3/0x340
> [ 394.049190] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 394.049193] ? __pfx_kswapd+0x10/0x10
> [ 394.049194] kthread+0xdf/0x120
> [ 394.049196] ? __pfx_kthread+0x10/0x10
> [ 394.049197] ret_from_fork+0x1d0/0x220
> [ 394.049200] ? __pfx_kthread+0x10/0x10
> [ 394.049200] ret_from_fork_asm+0x1a/0x30
> [ 394.049202] </TASK>
>
> Will look into whether this is fixable, but again any core MM
> guidance would be helpful.
>
> Matt
>
> > > return SHRINK_STOP;
> > >
> > > Possibly exposed as an inline helper in the shrinker interface?
> > >
> > > /Thomas
> > >
> > >
> > >
> > >