Re: [PATCH v3 3/3] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier

From: Michal Hocko <mhocko@suse.com>
To: Gregory Price <gourry@gourry.net>
Cc: Akinobu Mita <akinobu.mita@gmail.com>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	hannes@cmpxchg.org, david@kernel.org, zhengqi.arch@bytedance.com,
	shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, bingjiao@google.com
Subject: Re: [PATCH v3 3/3] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier
Date: Wed, 28 Jan 2026 10:56:44 +0100
Message-ID: <aXndXPMFK2fhLA4p@tiehlicka>
In-Reply-To: <aXkfBF5bdnTZ7t7e@gourry-fedora-PF4VCD3F>

On Tue 27-01-26 15:24:36, Gregory Price wrote:
> On Sat, Jan 10, 2026 at 10:55:02PM +0900, Akinobu Mita wrote:
> > On Sat, Jan 10, 2026 at 1:08 Gregory Price <gourry@gourry.net> wrote:
> > >
> > > > +     for_each_node_mask(nid, allowed_mask) {
> > > > +             int z;
> > > > +             struct zone *zone;
> > > > +             struct pglist_data *pgdat = NODE_DATA(nid);
> > > > +
> > > > +             for_each_managed_zone_pgdat(zone, pgdat, z, MAX_NR_ZONES - 1) {
> > > > +                     if (zone_watermark_ok(zone, 0, min_wmark_pages(zone),
> > > > +                                             ZONE_MOVABLE, 0))
> > >
> > > Why does this only check zone movable?
> > 
> > Here, zone_watermark_ok() checks the free memory for all zones from 0 to
> > MAX_NR_ZONES - 1.
> > There is no strong reason to pass ZONE_MOVABLE as the highest_zoneidx
> > argument every time zone_watermark_ok() is called; I can change it if an
> > appropriate value is found.
> > In v1, highest_zoneidx was "sc ? sc->reclaim_idx : MAX_NR_ZONES - 1"
> > 
> > > Also, would this also limit pressure-signal to invoke reclaim when
> > > there is still swap space available?  Should demotion not be a pressure
> > > source for triggering harder reclaim?
> > 
> > Since can_reclaim_anon_pages() checks whether there is free space on the swap
> > device before checking with can_demote(), I think the negative impact of this
> > change will be small. However, since I have not been able to confirm the
> > behavior when a swap device is available, I would like to correctly understand
> > the impact.
> 
> Something else is going on here
> 
> See demote_folio_list and alloc_demote_folio
> 
> static unsigned int demote_folio_list(struct list_head *demote_folios,
>                                       struct pglist_data *pgdat,
>                                       struct mem_cgroup *memcg)
> {
>         struct migration_target_control mtc = {
>                 ...
>                 .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
>                         __GFP_NOMEMALLOC | GFP_NOWAIT,
>         };
> }
> 
> static struct folio *alloc_demote_folio(struct folio *src,
>                 unsigned long private)
> {
>         /* Only attempt to demote to the preferred node */
>         mtc->nmask = NULL;
>         mtc->gfp_mask |= __GFP_THISNODE;
>         dst = alloc_migration_target(src, (unsigned long)mtc);
>         if (dst)
>                 return dst;
> 
>         /* Now attempt to demote to any node in the lower tier */
>         mtc->gfp_mask &= ~__GFP_THISNODE;
>         mtc->nmask = allowed_mask;
>         return alloc_migration_target(src, (unsigned long)mtc);
> }
> 
> 
> /*
> * %__GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim.
> */
> 
> 
> You basically shouldn't be hitting any reclaim behavior at all, and if

This will still wake kswapd (GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM), so
there will be background reclaim demoting from those lower tiers.

> the target nodes are actually under various watermarks, you should be
> getting allocation failures and quick-outs from the demotion logic.
> 
> i.e. you should be seeing OOM happen
> 
> When I dug in far enough I found this:
> 
> static struct folio *alloc_demote_folio(struct folio *src,
>                 unsigned long private)
> {
> ...
>         dst = alloc_migration_target(src, (unsigned long)mtc);
> }
> 
> struct folio *alloc_migration_target(struct folio *src, unsigned long private)
> {
>         
> ...
>         if (folio_test_hugetlb(src)) {
>                 struct hstate *h = folio_hstate(src);
> 
>                 gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
>                 return alloc_hugetlb_folio_nodemask(h, nid, ...)
>         }
> }
> 
> static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> {
>         gfp_t modified_mask = htlb_alloc_mask(h);
> 
>         /* Some callers might want to enforce node */
>         modified_mask |= (gfp_mask & __GFP_THISNODE);
> 
>         modified_mask |= (gfp_mask & __GFP_NOWARN);
> 
>         return modified_mask;
> }
> 
> /* Movability of hugepages depends on migration support. */
> static inline gfp_t htlb_alloc_mask(struct hstate *h)
> {
>         gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
> 
>         gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
> 
>         return gfp;
> }
> 
> #define GFP_USER        (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
> #define GFP_HIGHUSER    (GFP_USER | __GFP_HIGHMEM)
> #define GFP_HIGHUSER_MOVABLE    (GFP_HIGHUSER | __GFP_MOVABLE | __GFP_SKIP_KASAN)
> 
> 
> If we try to move a hugepage, we start including __GFP_RECLAIM again -
> regardless of whether HIGHUSER_MOVABLE or HIGHUSER is used.
> 
> 
> Any chance you are using hugetlb on this system?  This looks like a
> clear bug, but it may not be what you're experiencing.

Hugetlb pages do not sit on the LRU lists, so they do not participate in
demotion.

Or maybe I missed your point.
-- 
Michal Hocko
SUSE Labs
