From: Matthew Brost <matthew.brost@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
<intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 1/5] mm: Introduce zone_appears_fragmented()
Date: Thu, 23 Apr 2026 15:21:47 -0700
Message-ID: <aeqbe/9YRKNyYuWY@gsse-cloud1.jf.intel.com>
In-Reply-To: <aepuNL2fwp47P1Wi@gsse-cloud1.jf.intel.com>
On Thu, Apr 23, 2026 at 12:08:36PM -0700, Matthew Brost wrote:
> On Thu, Apr 23, 2026 at 01:27:11PM +0200, Thomas Hellström wrote:
> > On Thu, 2026-04-23 at 12:27 +0200, David Hildenbrand (Arm) wrote:
> > > On 4/23/26 07:56, Matthew Brost wrote:
> > > > Introduce zone_appears_fragmented() as a lightweight helper to
> > > > allow
> > > > subsystems to make coarse decisions about reclaim behavior in the
> > > > presence of likely fragmentation.
> > > >
> > > > The helper implements a simple heuristic: if the number of free
> > > > pages
> > > > in a zone exceeds twice the high watermark, the zone is considered
> > > > to
> > > > have ample free memory and allocation failures are more likely due
> > > > to
> > > > fragmentation than overall memory pressure.
> > > >
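For illustration, the heuristic described above can be modeled in
userspace roughly as follows. This is only a sketch under stated
assumptions: struct zone_model, its two fields, and
zone_model_appears_fragmented() are hypothetical stand-ins, not the
kernel's struct zone or the helper from the patch, whose body is not
quoted in this reply.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the two per-zone values the
 * heuristic reads. Not the kernel's struct zone.
 */
struct zone_model {
	unsigned long free_pages;	/* pages currently free in the zone */
	unsigned long high_wmark;	/* high watermark, in pages */
};

/*
 * Returns true when the number of free pages exceeds twice the high
 * watermark, i.e. the zone has ample free memory, so an allocation
 * failure is more likely caused by fragmentation than by overall
 * memory pressure.
 */
static inline bool zone_model_appears_fragmented(const struct zone_model *z)
{
	return z->free_pages > 2 * z->high_wmark;
}
```

Note that a zone sitting at exactly twice the high watermark is not
reported as fragmented in this model, since the description says
"exceeds".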
> > > > This is intentionally imprecise and is not meant to replace the
> > > > core
> > > > MM compaction or fragmentation accounting logic. Instead, it
> > > > provides
> > > > a cheap signal for callers (e.g., shrinkers) that wish to avoid
> > > > overly aggressive reclaim when sufficient free memory exists but
> > > > high-order allocations may still fail.
> > > >
> > > > No functional changes; this is a preparatory helper for future
> > > > users.
> > > >
> > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: David Hildenbrand <david@kernel.org>
> > > > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > > > Cc: Vlastimil Babka <vbabka@kernel.org>
> > > > Cc: Mike Rapoport <rppt@kernel.org>
> > > > Cc: Suren Baghdasaryan <surenb@google.com>
> > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > include/linux/vmstat.h | 13 +++++++++++++
> > > > 1 file changed, 13 insertions(+)
> > > >
> > > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > > > index 3c9c266cf782..568d9f4f1a1f 100644
> > > > --- a/include/linux/vmstat.h
> > > > +++ b/include/linux/vmstat.h
> > > > @@ -483,6 +483,19 @@ static inline const char *zone_stat_name(enum
> > > > zone_stat_item item)
> > > > return vmstat_text[item];
> > > > }
> > > >
> > > > +static inline bool zone_appears_fragmented(struct zone *zone)
> > > > +{
> > >
> > > "zone_likely_fragmented" or "zone_maybe_fragmented" might be clearer,
> > > depending
> > > on the actual semantics.
> > >
> > > > + /*
> > > > + * Simple heuristic: if the number of free pages is more
> > > > than twice the
> > > > + * high watermark, this strongly suggests that the zone is
> > > > heavily
> > > > + * fragmented when called from a shrinker.
> > > > + */
> > >
> > > I'll cc some more people. But the "when called from a shrinker" bit
> > > is
> > > concerning. Are there additional semantics that should be expressed
> > > in the
> > > function name, for example?
> > >
> > > Something that implies that this function only gives you a reasonable
> > > answer in
> > > a certain context.
> >
> > I think that test would not be relevant for cgroup-aware shrinking.
> >
> > What about trying to pass something in the struct shrink_control? Like
> > if we pass the struct scan_control's order field also in struct
>
> If the order were included in shrink_control, I am about 95% certain
> that this change would allow TTM / Xe to break the problematic
> kswapd feedback loop. This may also better express the intent of the
> problem we are trying to fix here.
>
> For reference, the cover letter [1] details the problem.
>
> Any guidance from the core MM folks would be appreciated. Would adding
> the order to shrink_control be an acceptable solution?
>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/165330/
>
> > shrink_control, really expensive shrinkers could duck reclaim attempts
> > from higher-order allocations that may fail anyway:
> >
> > if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > (sc->gfp_mask & (__GFP_NORETRY | __GFP_RETRY_MAYFAIL)) &&
> > !(sc->gfp_mask & __GFP_NOFAIL))
It doesn't look like __GFP_NORETRY, __GFP_RETRY_MAYFAIL, or __GFP_NOFAIL
make it from the caller into sc->gfp_mask once we get into the kswapd
loop:
[ 394.049058] xe_shrinker_scan: no skip order=9, gfp=0x0000000000000cc0
[ 394.049061] CPU: 2 UID: 0 PID: 110 Comm: kswapd0 Not tainted 7.0.0-xe+ #355 PREEMPT(full)
[ 394.049062] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3332.D05.2509011438 09/01/2025
[ 394.049063] Call Trace:
[ 394.049065] <TASK>
[ 394.049066] dump_stack_lvl+0x55/0x70
[ 394.049073] xe_shrinker_scan+0x274/0x280 [xe]
[ 394.049181] do_shrink_slab+0x132/0x360
[ 394.049184] shrink_slab+0xf0/0x3e0
[ 394.049186] shrink_node+0x2bd/0x800
[ 394.049188] balance_pgdat+0x323/0x760
[ 394.049189] kswapd+0x1c3/0x340
[ 394.049190] ? __pfx_autoremove_wake_function+0x10/0x10
[ 394.049193] ? __pfx_kswapd+0x10/0x10
[ 394.049194] kthread+0xdf/0x120
[ 394.049196] ? __pfx_kthread+0x10/0x10
[ 394.049197] ret_from_fork+0x1d0/0x220
[ 394.049200] ? __pfx_kthread+0x10/0x10
[ 394.049200] ret_from_fork_asm+0x1a/0x30
[ 394.049202] </TASK>
I will look into whether this is fixable, but again, any core MM guidance
would be helpful.
Matt
> > return SHRINK_STOP;
> >
> > Possibly exposed as an inline helper in the shrinker interface?
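As a rough userspace model of the inline helper suggested above (a
sketch only, under loud assumptions): struct shrink_control_model, its
order field, and shrinker_model_should_skip() are hypothetical names,
and the MODEL_GFP_* bits are arbitrary stand-ins rather than the
kernel's real gfp bit values. MODEL_COSTLY_ORDER mirrors
PAGE_ALLOC_COSTLY_ORDER, which is 3 in the kernel.

```c
#include <stdbool.h>

/* Arbitrary stand-in bits for illustration; NOT the kernel's gfp layout. */
#define MODEL_GFP_NORETRY	(1u << 0)
#define MODEL_GFP_RETRY_MAYFAIL	(1u << 1)
#define MODEL_GFP_NOFAIL	(1u << 2)

/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define MODEL_COSTLY_ORDER	3

/* Hypothetical shrink_control with the proposed order field added. */
struct shrink_control_model {
	unsigned int gfp_mask;	/* allocation flags as seen by the shrinker */
	int order;		/* order of the allocation driving reclaim */
};

/*
 * Returns true when an expensive shrinker could decline to scan: the
 * allocation is above the costly order, the caller tolerates failure
 * (NORETRY or RETRY_MAYFAIL), and it did not ask for NOFAIL.
 */
static inline bool
shrinker_model_should_skip(const struct shrink_control_model *sc)
{
	return sc->order > MODEL_COSTLY_ORDER &&
	       (sc->gfp_mask & (MODEL_GFP_NORETRY | MODEL_GFP_RETRY_MAYFAIL)) &&
	       !(sc->gfp_mask & MODEL_GFP_NOFAIL);
}
```

In this model, a mask that carries none of the three flags never skips,
whatever the order. That matches the "no skip order=9" line in the
trace above, where the caller's flags did not reach sc->gfp_mask.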
> >
> > /Thomas
> >
> >
> >
> >
Thread overview: 9+ messages
2026-04-23 5:56 [PATCH v2 0/5] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation Matthew Brost
2026-04-23 5:56 ` [PATCH v2 1/5] mm: Introduce zone_appears_fragmented() Matthew Brost
2026-04-23 6:04 ` Balbir Singh
2026-04-23 6:16 ` Matthew Brost
2026-04-23 6:27 ` Matthew Brost
2026-04-23 10:27 ` David Hildenbrand (Arm)
2026-04-23 11:27 ` Thomas Hellström
2026-04-23 19:08 ` Matthew Brost
2026-04-23 22:21 ` Matthew Brost [this message]