From: Yury Norov <yury.norov@gmail.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Joel Fernandes <joel@joelfernandes.org>,
Ian May <ianm@nvidia.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] mm/numa: Introduce nearest_node_nodemask()
Date: Thu, 13 Feb 2025 12:12:46 -0500
Message-ID: <Z64oDlh9vzvRYziL@thinkpad>
In-Reply-To: <Z64brsSMAR7cLPUU@gpd3>
On Thu, Feb 13, 2025 at 05:19:58PM +0100, Andrea Righi wrote:
> On Thu, Feb 13, 2025 at 10:57:00AM -0500, Yury Norov wrote:
> > On Wed, Feb 12, 2025 at 05:48:09PM +0100, Andrea Righi wrote:
> > > Introduce the new helper nearest_node_nodemask() to find the closest
> > > node in a specified nodemask from a given starting node.
> > >
> > > Returns MAX_NUMNODES if no node is found.
> > >
> > > Cc: Yury Norov <yury.norov@gmail.com>
> > > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> >
> > Suggested-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
>
> Ok.
>
> >
> > > ---
> > > include/linux/numa.h | 7 +++++++
> > > mm/mempolicy.c | 32 ++++++++++++++++++++++++++++++++
> > > 2 files changed, 39 insertions(+)
> > >
> > > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > > index 31d8bf8a951a7..e6baaf6051bcf 100644
> > > --- a/include/linux/numa.h
> > > +++ b/include/linux/numa.h
> > > @@ -31,6 +31,8 @@ void __init alloc_offline_node_data(int nid);
> > > /* Generic implementation available */
> > > int numa_nearest_node(int node, unsigned int state);
> > >
> > > +int nearest_node_nodemask(int node, nodemask_t *mask);
> > > +
> >
> > See how you use it. It looks a bit inconsistent to the other functions:
> >
> > #define for_each_node_numadist(node, unvisited) \
> > for (int start = (node), \
> > node = nearest_node_nodemask((start), &(unvisited)); \
> > node < MAX_NUMNODES; \
> > node_clear(node, (unvisited)), \
> > node = nearest_node_nodemask((start), &(unvisited)))
> >
> >
> > I would suggest to make it aligned with the rest of the API:
> >
> > #define node_clear(node, dst) __node_clear((node), &(dst))
> > static __always_inline void __node_clear(int node, volatile nodemask_t *dstp)
> > {
> > clear_bit(node, dstp->bits);
> > }
>
> Sorry Yury, can you elaborate more on this? What do you mean with
> inconsistent, is it the volatile nodemask_t *?

What I mean is:

	#define nearest_node_nodemask(start, srcp) \
		__nearest_node_nodemask((start), &(srcp))

	int __nearest_node_nodemask(int node, nodemask_t *mask);

That way you'll be able to make the for-loop above look more uniform:

	#define for_each_node_numadist(node, unvisited) \
		for (int __s = (node), \
		     (node) = nearest_node_nodemask(__s, (unvisited)); \
		     (node) < MAX_NUMNODES; \
		     node_clear((node), (unvisited)), \
		     (node) = nearest_node_nodemask(__s, (unvisited)))
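
A minimal userspace sketch of this wrapper pattern, for illustration only. The one-word nodemask_t, the toy_distance table (standing in for node_distance()) and visit_order() are assumptions made up for the example, not kernel code. The double-underscore names mirror the thread and are reserved identifiers in ordinary userspace C.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NUMNODES 4

/* Toy stand-in for the kernel's nodemask_t: one bit per node. */
typedef struct { unsigned long bits; } nodemask_t;

/* Hypothetical distance table; the kernel would use node_distance(). */
static const int toy_distance[MAX_NUMNODES][MAX_NUMNODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* Pointer-taking worker plus a macro that takes the mask by name. */
static void __node_clear(int node, nodemask_t *dstp)
{
	dstp->bits &= ~(1UL << node);
}
#define node_clear(node, dst) __node_clear((node), &(dst))

/* Same split for the nearest-node lookup; @node must be a valid node ID. */
static int __nearest_node_nodemask(int node, const nodemask_t *mask)
{
	int n, min_dist = INT_MAX, min_node = MAX_NUMNODES;

	for (n = 0; n < MAX_NUMNODES; n++) {
		if (!(mask->bits & (1UL << n)))
			continue;
		if (toy_distance[node][n] < min_dist) {
			min_dist = toy_distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}
#define nearest_node_nodemask(start, src) \
	__nearest_node_nodemask((start), &(src))

/* Both helpers now take the mask the same way inside the iterator. */
#define for_each_node_numadist(node, unvisited)			\
	for (int __s = (node),						\
	     (node) = nearest_node_nodemask(__s, (unvisited));		\
	     (node) < MAX_NUMNODES;					\
	     node_clear((node), (unvisited)),				\
	     (node) = nearest_node_nodemask(__s, (unvisited)))

/* Record the order in which nodes are visited, starting from @start. */
static int visit_order(int start, int *out)
{
	nodemask_t unvisited = { .bits = (1UL << MAX_NUMNODES) - 1 };
	int node = start, count = 0;

	for_each_node_numadist(node, unvisited) {
		assert(count < MAX_NUMNODES);
		out[count++] = node;
	}
	return count;
}
```

With this toy table, visit_order(1, out) fills out with 1, 0, 2, 3: the start node first, then the remaining nodes by increasing distance, with ties going to the lower node ID. Note that the loop variable declared in the for-init shadows the caller's variable, so the caller's own node is left unchanged.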
> > > #ifndef memory_add_physaddr_to_nid
> > > int memory_add_physaddr_to_nid(u64 start);
> > > #endif
> > > @@ -47,6 +49,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> > > return NUMA_NO_NODE;
> > > }
> > >
> > > +static inline int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > + return NUMA_NO_NODE;
> > > +}
> > > +
> > > static inline int memory_add_physaddr_to_nid(u64 start)
> > > {
> > > return 0;
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 162407fbf2bc7..1e2acf187ea3a 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -196,6 +196,38 @@ int numa_nearest_node(int node, unsigned int state)
> > > }
> > > EXPORT_SYMBOL_GPL(numa_nearest_node);
> > >
> > > +/**
> > > + * nearest_node_nodemask - Find the node in @mask at the nearest distance
> > > + * from @node.
> > > + *
> > > + * @node: the node to start the search from.
> > > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > > + *
> > > + * This function iterates over all nodes in the given state and calculates
> > > + * the distance to the starting node.
> > > + *
> > > + * Returns the node ID in @mask that is the closest in terms of distance
> > > + * from @node, or MAX_NUMNODES if no node is found.
> > > + */
> > > +int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > + int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > > +
> > > + if (node == NUMA_NO_NODE)
> > > + return MAX_NUMNODES;
> >
> > This makes it unclear: you make it legal to pass NUMA_NO_NODE, but
> > your function returns something useless. I don't think it would help
> > users in any reasonable scenario.
> >
> > So, if you don't want user to call this with node == NUMA_NO_NODE,
> > just describe it in comment on top of the function. Otherwise, please
> > do something useful like
> >
> > if (node == NUMA_NO_NODE)
> > node = current_node;
> >
> > I would go with option 1. Notice, node_distance() doesn't bother to
> > check against NUMA_NO_NODE.
>
> Hm... is it? Looking at __node_distance(), it doesn't seem really safe to
> pass a negative value (maybe I'm missing something?).

It's not safe, but inside the kernel we don't check parameters. As a
courtesy you may add a comment, but strictly speaking you don't have to.

> Anyway, I'd also prefer to go with option 1 and not implicitly assuming
> NUMA_NO_NODE == current node (it feels that it might hide nasty bugs).

Yeah, very true.
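
Option 1 could be sketched in userspace roughly as follows: the precondition lives in the function comment, the function itself does no validation, and a caller that may hold NUMA_NO_NODE resolves it explicitly before the call. The names toy_dist, toy_nearest_node() and pick_node() are hypothetical and exist only for this example.

```c
#include <limits.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NUMNODES	4

/* Hypothetical distance table standing in for node_distance(). */
static const int toy_dist[MAX_NUMNODES][MAX_NUMNODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/**
 * toy_nearest_node - find the node in @mask closest to @node.
 * @node: start node; must be a valid node ID. Passing NUMA_NO_NODE
 *	  is not allowed and is not checked here.
 * @mask: bitmask of candidate nodes, one bit per node.
 *
 * Returns the closest node in @mask, or MAX_NUMNODES if @mask is empty.
 */
static int toy_nearest_node(int node, unsigned long mask)
{
	int n, min_dist = INT_MAX, min_node = MAX_NUMNODES;

	for (n = 0; n < MAX_NUMNODES; n++) {
		if (!(mask & (1UL << n)))
			continue;
		if (toy_dist[node][n] < min_dist) {
			min_dist = toy_dist[node][n];
			min_node = n;
		}
	}
	return min_node;
}

/*
 * Caller side: the fallback for NUMA_NO_NODE is chosen explicitly here,
 * not hidden inside the helper.
 */
static int pick_node(int hint, int fallback, unsigned long mask)
{
	int start = (hint == NUMA_NO_NODE) ? fallback : hint;

	return toy_nearest_node(start, mask);
}
```

The point of the split is that the policy for "no node" stays visible at the call site, so a bogus NUMA_NO_NODE is not silently mapped to some default inside the helper.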

> So, I can add a comment in the description to clarify that NUMA_NO_NODE is
> forbidden, but what if someone passes it? Should we WARN_ON_ONCE() at
> least?

He will brick his testing board, and learn to read comments the hard
way.

Speaking more seriously, you will most likely be CCed as the author of
that function, and you will be able to point that out in review. Also,
there's a great chance that it will be caught by KASAN or some other
sanitizer even before someone sends a buggy patch.

This is a very well known problem, as old as the world, and everyone
is aware of it.

Thanks,
Yury