From: Andrea Righi <arighi@nvidia.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Ian May <ianm@nvidia.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] mm/numa: Introduce numa_nearest_nodemask()
Date: Mon, 10 Feb 2025 09:28:36 +0100 [thread overview]
Message-ID: <Z6m4tEoiUBNBmIjV@gpd3> (raw)
In-Reply-To: <Z6joYmcjyT8eY32H@thinkpad>
Hi Yury,
On Sun, Feb 09, 2025 at 12:40:15PM -0500, Yury Norov wrote:
> On Fri, Feb 07, 2025 at 09:40:48PM +0100, Andrea Righi wrote:
> > Introduce the new helper numa_nearest_nodemask() to find the closest
> > node, in a specified nodemask and state, from a given starting node.
> >
> > Returns MAX_NUMNODES if no node is found.
> >
> > Cc: Yury Norov <yury.norov@gmail.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> > include/linux/nodemask_types.h | 6 +++++-
> > include/linux/numa.h | 8 +++++++
> > mm/mempolicy.c | 38 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/nodemask_types.h b/include/linux/nodemask_types.h
> > index 6b28d97ea6ed0..8d0b7a66c3a49 100644
> > --- a/include/linux/nodemask_types.h
> > +++ b/include/linux/nodemask_types.h
> > @@ -5,6 +5,10 @@
> > #include <linux/bitops.h>
> > #include <linux/numa.h>
> >
> > -typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
> > +struct nodemask {
> > + DECLARE_BITMAP(bits, MAX_NUMNODES);
> > +};
> > +
> > +typedef struct nodemask nodemask_t;
> >
> > #endif /* __LINUX_NODEMASK_TYPES_H */
> > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > index 3567e40329ebc..a549b87d1fca5 100644
> > --- a/include/linux/numa.h
> > +++ b/include/linux/numa.h
> > @@ -27,6 +27,8 @@ static inline bool numa_valid_node(int nid)
> > #define __initdata_or_meminfo __initdata
> > #endif
> >
> > +struct nodemask;
>
> Numa should include this via linux/nodemask_types.h, or maybe
> nodemask.h.
Hm... nodemask_types.h needs to include numa.h to resolve MAX_NUMNODES,
Maybe we can move numa_nearest_nodemask() to linux/nodemask.h?
>
> > +
> > #ifdef CONFIG_NUMA
> > #include <asm/sparsemem.h>
> >
> > @@ -38,6 +40,7 @@ void __init alloc_offline_node_data(int nid);
> >
> > /* Generic implementation available */
> > int numa_nearest_node(int node, unsigned int state);
> > +int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask);
> >
> > #ifndef memory_add_physaddr_to_nid
> > int memory_add_physaddr_to_nid(u64 start);
> > @@ -55,6 +58,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> > return NUMA_NO_NODE;
> > }
> >
> > +static inline int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask)
> > +{
> > + return NUMA_NO_NODE;
> > +}
> > +
> > static inline int memory_add_physaddr_to_nid(u64 start)
> > {
> > return 0;
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 162407fbf2bc7..1cfee509c7229 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -196,6 +196,44 @@ int numa_nearest_node(int node, unsigned int state)
> > }
> > EXPORT_SYMBOL_GPL(numa_nearest_node);
> >
> > +/**
> > + * numa_nearest_nodemask - Find the node in @mask at the nearest distance
> > + * from @node.
> > + *
>
> So, I have a feeling about this whole naming scheme. At first, this
> function (and the existing numa_nearest_node()) searches for something,
> but doesn't begin with find_, search_ or similar. Second, the naming
> of existing numa_nearest_node() doesn't reflect that it searches
> against the state. Should we always include some state for search? If
> so, we can skip mentioning the state, otherwise it should be in the
> name, I guess...
>
> The problem is that I have no idea for better naming, and I have no
> understanding about the future of this functions family. If it's just
> numa_nearest_node() and numa_nearest_nodemask(), I'm OK to go this
> way. If we'll add more flavors similarly to find_bit() family, we
> could probably discuss a naming scheme.
>
> Also, mm/mempolicy.c is a historical place for them, but maybe we need
> to move it somewhere else?
>
> Any thoughts appreciated.
Personally I think adding "find_" to the name would be a bit redundant, as
it seems quite obvious that it's finding the nearest node. It sounds a bit
like the get_something() discussion and we can just use something().
About adding "_state" to the name, it'd make sense since we have
for_each_node_state/for_each_node(), but we would need to change
numa_nearest_node() -> numa_nearest_node_state((), that is beyond the scope
of this patch set.
If I had to design this completely from scratch I'd probably propose
something like this:
- nearest_node_state(node, state)
- nearest_node(node) -> nearest_node_state(node, N_POSSIBLE)
- nearest_node_nodemask(node, nodemask) -> here the state can be managed
with the nodemask (as you suggested below)
But again, this is probably a more generic discussion that can be addressed
in a separate thread.
>
> > + * @node: the node to start the search from.
> > + * @state: the node state to filter nodes by.
> > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > + *
> > + * This function iterates over all nodes in the given state and calculates
> > + * the distance to the starting node.
> > + *
> > + * Returns the node ID in @mask that is the closest in terms of distance
> > + * from @node, or MAX_NUMNODES if no node is found.
> > + */
> > +int numa_nearest_nodemask(int node, unsigned int state, nodemask_t *mask)
>
> Your only user calls the function with N_POSSIBLE:
>
> s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
> {
> nodemask_t unvisited = NODE_MASK_ALL;
>
> if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
> return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags);
>
>
> for_each_numa_node(node, unvisited, N_POSSIBLE)
> do_something();
> }
>
> Which means you don't need the state at all. Even more, you don't
> need to initialize the unvisited mask before checking the static
> branch:
>
> s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
> {
> nodemask_t unvisited;
>
> if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
> return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags);
>
> nodes_clear(unvisited);
>
> for_each_numa_node(node, unvisited)
> do_something();
> }
>
>
> If you need some state other than N_POSSIBLE, you can do it similarly:
>
> nodemask_complement(unvisited, N_CPU);
>
> /* Only N_CPU nodes iterated */
> for_each_numa_node(node, unvisited)
> do_something();
Good point. I think we can implicitly assume N_POSSIBLE and if we need to
filter only a certain state in the future, we can enforce that via the
nodemask.
>
>
> > +{
> > + int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > +
> > + if (node == NUMA_NO_NODE)
> > + return MAX_NUMNODES;
> > +
> > + if (node_state(node, state) && node_isset(node, *mask))
> > + return node;
>
> This is correct, but why do we need this special case? If distance to
> local node is always 0, and distance to remote node is always greater
> than 0, the normal search will return local node, right? Is that a
> performance trick? If so, can you put a comment please? Otherwise,
> maybe just drop it?
Yeah we can probably just drop it, I don't it gives much benefit in terms
of performance. And the special optimiation case of searching only in one
node can be managed already by the caller via SCX_PICK_IDLE_IN_NODE.
>
> > +
> > + for_each_node_state(n, state) {
> > + if (!node_isset(n, *mask))
> > + continue;
>
> for_each_node_state_and_mask(n, state, mask)
>
> Or if you take the above consideration, just
> for_each_node_mask(n, mask)
Ok, that's much better.
Thanks,
-Andrea
next prev parent reply other threads:[~2025-02-10 8:28 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-07 20:40 [PATCHSET v10 sched_ext/for-6.15] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2025-02-07 20:40 ` [PATCH 1/6] mm/numa: Introduce numa_nearest_nodemask() Andrea Righi
2025-02-09 17:40 ` Yury Norov
2025-02-10 8:28 ` Andrea Righi [this message]
2025-02-10 16:41 ` Yury Norov
2025-02-10 16:51 ` Andrea Righi
2025-02-07 20:40 ` [PATCH 2/6] sched/topology: Introduce for_each_numa_node() iterator Andrea Righi
2025-02-07 21:46 ` Tejun Heo
2025-02-07 21:55 ` Andrea Righi
2025-02-07 21:56 ` Tejun Heo
2025-02-09 17:51 ` Yury Norov
2025-02-09 17:50 ` Yury Norov
2025-02-07 20:40 ` [PATCH 3/6] sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE Andrea Righi
2025-02-07 20:40 ` [PATCH 4/6] sched_ext: idle: introduce SCX_PICK_IDLE_IN_NODE Andrea Righi
2025-02-07 22:02 ` Tejun Heo
2025-02-07 20:40 ` [PATCH 5/6] sched_ext: idle: Per-node idle cpumasks Andrea Righi
2025-02-07 22:30 ` Tejun Heo
2025-02-08 8:47 ` Andrea Righi
2025-02-09 18:07 ` Yury Norov
2025-02-10 16:57 ` Yury Norov
2025-02-11 7:32 ` Andrea Righi
2025-02-11 7:41 ` Andrea Righi
2025-02-11 9:50 ` Andrea Righi
2025-02-11 14:19 ` Yury Norov
2025-02-11 14:34 ` Andrea Righi
2025-02-11 14:45 ` Andrea Righi
2025-02-11 16:38 ` Steven Rostedt
2025-02-11 18:05 ` Andrea Righi
2025-02-07 20:40 ` [PATCH 6/6] sched_ext: idle: Introduce node-aware idle cpu kfunc helpers Andrea Righi
2025-02-07 22:39 ` Tejun Heo
2025-02-08 9:19 ` Andrea Righi
2025-02-09 6:31 ` Tejun Heo
2025-02-09 8:11 ` Andrea Righi
2025-02-10 6:01 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6m4tEoiUBNBmIjV@gpd3 \
--to=arighi@nvidia.com \
--cc=bpf@vger.kernel.org \
--cc=bsegall@google.com \
--cc=changwoo@igalia.com \
--cc=dietmar.eggemann@arm.com \
--cc=ianm@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=void@manifault.com \
--cc=vschneid@redhat.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox