From: Bing Jiao <bingjiao@google.com>
To: Gregory Price <gourry@gourry.net>
Cc: linux-mm@kvack.org, "Waiman Long" <longman@redhat.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Andrew Morton" <akpm@linux-foundation.org>,
"David Hildenbrand" <david@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
"Qi Zheng" <zhengqi.arch@bytedance.com>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/vmscan: respect mems_effective in demote_folio_list()
Date: Mon, 22 Dec 2025 06:28:11 +0000 [thread overview]
Message-ID: <aUjk-yVw8ddRgZyN@google.com> (raw)
In-Reply-To: <aUfi9gn5HS4u4ShU@gourry-fedora-PF4VCD3F>
On Sun, Dec 21, 2025 at 07:07:18AM -0500, Gregory Price wrote:
>
> I think this patch can be done without as many changes as proposed here.
>
> > -bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
> > +void mem_cgroup_node_allowed(struct mem_cgroup *memcg, nodemask_t *nodes);
>
> > -static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> > +static inline void mem_cgroup_node_allowed(struct mem_cgroup *memcg,
>
> > -int next_demotion_node(int node);
> > +int next_demotion_node(int node, nodemask_t *mask);
>
> > -bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> > +void cpuset_node_allowed(struct cgroup *cgroup, nodemask_t *nodes)
>
> These are some fairly major contract changes, and the names don't make
> much sense as a result.
>
> Would be better to just make something like
>
> /* Filter the given nmask based on cpuset.mems.allowed */
> mem_cgroup_filter_mems_allowed(memg, nmask);
>
> (or some other, better name)
>
> separate of the existing interfaces, and operate on one scratch-mask if
> possible.
>
Hi Gregory, thank you for the review and suggestions.
I have divided these changes into 2 patches based on your suggestions.
Since mem_cgroup_node_allowed() and cpuset_node_allowed() are dangling,
they are removed in v2 2/2.
> > +static int get_demotion_targets(nodemask_t *targets, struct pglist_data *pgdat,
> > + struct mem_cgroup *memcg)
> > +{
> > + nodemask_t allowed_mask;
> > + nodemask_t preferred_mask;
> > + int preferred_node;
> > +
> > + if (!pgdat)
> > + return NUMA_NO_NODE;
> > +
> > + preferred_node = next_demotion_node(pgdat->node_id, &preferred_mask);
> > + if (preferred_node == NUMA_NO_NODE)
> > + return NUMA_NO_NODE;
> > +
> > + node_get_allowed_targets(pgdat, &allowed_mask);
> > + mem_cgroup_node_allowed(memcg, &allowed_mask);
> > + if (nodes_empty(allowed_mask))
> > + return NUMA_NO_NODE;
> > +
> > + if (targets)
> > + nodes_copy(*targets, allowed_mask);
> > +
> > + do {
> > + if (node_isset(preferred_node, allowed_mask))
> > + return preferred_node;
> > +
> > + nodes_and(preferred_mask, preferred_mask, allowed_mask);
> > + if (!nodes_empty(preferred_mask))
> > + return node_random(&preferred_mask);
> > +
> > + /*
> > + * Hop to the next tier of preferred nodes. Even if
> > + * preferred_node is not set in allowed_mask, still can use it
> > + * to query the nest-best demotion nodes.
> > + */
> > + preferred_node = next_demotion_node(preferred_node,
> > + &preferred_mask);
> > + } while (preferred_node != NUMA_NO_NODE);
> > +
>
> What you're implementing here is effectively a new feature - allowing
> demotion to jump nodes rather than just target the next demotion node.
>
> This is nice, but it should be a separate patch proposal (I think Andrew
> said something as much already) - not as part of a fix.
>
Thanks for the suggestion.
I sent a v2 patch series for fixes and backport. This function (jump
node) will be sent in another thread for distinguishing between fixes
and features.
> > + /*
> > + * Should not reach here, as a non-empty allowed_mask ensures
> > + * there must have a target node for demotion.
>
> Does it? What if preferred_node is online when calling
> next_demotion_node(), but then is offline when
> node_get_allowed_targets() is called?
>
>
> > + * Otherwise, it suggests something wrong in node_demotion[]->preferred,
> > + * where the same-tier nodes have different preferred targets.
> > + * E.g., if node 0 identifies both nodes 2 and 3 as preferred targets,
> > + * but nodes 2 and 3 themselves have different preferred nodes.
> > + */
> > + WARN_ON_ONCE(1);
> > + return node_random(&allowed_mask);
>
> Just returning a random allowed node seems like an objectively poor
> result and we should just not demote if we reach this condition. It
> likesly means hotplug was happening and node states changed.
>
> > @@ -1041,10 +1090,10 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> > if (list_empty(demote_folios))
> > return 0;
> >
> > + target_nid = get_demotion_targets(&allowed_mask, pgdat, memcg);
> > if (target_nid == NUMA_NO_NODE)
> > return 0;
> > -
> > - node_get_allowed_targets(pgdat, &allowed_mask);
>
> in the immediate fixup patch, it seems more expedient to just add the
> function i described above
>
> /* Filter the given nmask based on cpuset.mems.allowed */
> mem_cgroup_filter_mems_allowed(memg, nmask);
>
> and then add that immediate after the node_get_allowed_targets() call.
>
> Then come back around afterwards to add the tier/node-skip functionality
> from above in a separate feature patch.
>
> ~Gregory
>
Thanks for the hit. I had never considered hot-swapping before.
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 670fe9fae5ba..1971a8d9475b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1046,6 +1046,11 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
>
> node_get_allowed_targets(pgdat, &allowed_mask);
>
> + /* Filter based on mems_allowed, fail if the result is empty */
> + mem_cgroup_filter_nodemask(memcg, &allowed_mask);
> + if (nodes_empty(allowed_mask))
> + return 0;
> +
> /* Demotion ignores all cpuset and mempolicy settings */
> migrate_pages(demote_folios, alloc_demote_folio, NULL,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
>
>
Thanks for the code. My v2 1/2 is based on your suggestion with
some changes.
Best,
Bing
next prev parent reply other threads:[~2025-12-22 6:28 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-20 6:10 [PATCH] mm/vmscan: respect mems_effective in demote_folio_list() Bing Jiao
2025-12-20 19:20 ` Andrew Morton
2025-12-22 6:16 ` Bing Jiao
2025-12-21 12:07 ` Gregory Price
2025-12-22 6:28 ` Bing Jiao [this message]
2025-12-21 23:36 ` [PATCH v2 0/2] fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-21 23:36 ` [PATCH v2 1/2] mm/vmscan: respect mems_effective in demote_folio_list() Bing Jiao
2025-12-21 23:41 ` kernel test robot
2025-12-22 2:38 ` Chen Ridong
2025-12-22 21:56 ` kernel test robot
2025-12-22 22:18 ` kernel test robot
2025-12-21 23:36 ` [PATCH v2 2/2] mm/vmscan: check all allowed targets in can_demote() Bing Jiao
2025-12-22 2:51 ` Chen Ridong
2025-12-22 6:09 ` Bing Jiao
2025-12-22 8:28 ` Chen Ridong
2025-12-23 21:19 ` [PATCH v3] mm/vmscan: fix demotion targets checks in reclaim/demotion Bing Jiao
2025-12-23 21:38 ` Bing Jiao
2025-12-24 1:19 ` Gregory Price
2025-12-26 18:48 ` Bing Jiao
2026-01-05 21:57 ` Bing Jiao
2025-12-24 1:49 ` Chen Ridong
2025-12-26 18:58 ` Bing Jiao
2025-12-26 19:32 ` Waiman Long
2025-12-26 20:24 ` Waiman Long
2026-01-04 9:04 ` Bing Jiao
2026-01-04 8:54 ` [PATCH v4] " Bing Jiao
2026-01-04 18:27 ` Andrew Morton
2026-01-05 5:08 ` Bing Jiao
2026-01-05 2:48 ` Chen Ridong
2026-01-05 5:10 ` Bing Jiao
2026-01-05 5:01 ` [PATCH v5] " Bing Jiao
2026-01-05 15:54 ` Gregory Price
2026-01-05 21:34 ` Bing Jiao
2026-01-06 7:56 ` [PATCH v6] " Bing Jiao
2026-01-06 14:23 ` Gregory Price
2026-01-06 19:36 ` Andrew Morton
2026-01-07 1:27 ` Chen Ridong
2026-01-08 3:32 ` [PATCH v7 0/2] " Bing Jiao
2026-01-08 3:32 ` [PATCH v7 1/2] " Bing Jiao
2026-01-08 3:32 ` [PATCH v7 2/2] mm/vmscan: select the closest preferred node in demote_folio_list() Bing Jiao
2026-01-10 3:00 ` [PATCH] mm/vmscan: fix uninitialized variable " Bing Jiao
2026-01-10 3:38 ` Bing Jiao
2026-01-14 6:59 ` [PATCH v8 0/2] mm/vmscan: select the closest preferred node " Bing Jiao
2026-01-14 6:59 ` [PATCH v8 2/2] " Bing Jiao
2026-01-14 20:53 ` [PATCH v9 0/2] mm/vmscan: fix demotion targets checks in reclaim/demotion Bing Jiao
2026-01-14 20:53 ` [PATCH v9 1/2] " Bing Jiao
2026-02-02 4:15 ` Shakeel Butt
2026-01-14 20:53 ` [PATCH v9 2/2] mm/vmscan: select the closest perferred node in demote_folio_list() Bing Jiao
2026-02-06 18:52 ` Shakeel Butt
2026-01-16 0:00 ` [PATCH v9 0/2] mm/vmscan: fix demotion targets checks in reclaim/demotion Andrew Morton
2026-01-16 7:00 ` Bing Jiao
2026-01-30 23:35 ` Shakeel Butt
2026-01-31 23:58 ` Bing Jiao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUjk-yVw8ddRgZyN@google.com \
--to=bingjiao@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=cgroups@vger.kernel.org \
--cc=david@kernel.org \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@kernel.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=yuanchu@google.com \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.