From: Andrea Righi <arighi@nvidia.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Joel Fernandes <joel@joelfernandes.org>,
Ian May <ianm@nvidia.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] mm/numa: Introduce nearest_node_nodemask()
Date: Fri, 14 Feb 2025 09:55:25 +0100
Message-ID: <Z68E_ar8l7vNOxgh@gpd3>
In-Reply-To: <Z64oDlh9vzvRYziL@thinkpad>

Hi Yury,

On Thu, Feb 13, 2025 at 12:12:46PM -0500, Yury Norov wrote:
...
> > > > include/linux/numa.h | 7 +++++++
> > > > mm/mempolicy.c | 32 ++++++++++++++++++++++++++++++++
> > > > 2 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > > > index 31d8bf8a951a7..e6baaf6051bcf 100644
> > > > --- a/include/linux/numa.h
> > > > +++ b/include/linux/numa.h
> > > > @@ -31,6 +31,8 @@ void __init alloc_offline_node_data(int nid);
> > > > /* Generic implementation available */
> > > > int numa_nearest_node(int node, unsigned int state);
> > > >
> > > > +int nearest_node_nodemask(int node, nodemask_t *mask);
> > > > +
> > >
> > > See how you use it. It looks a bit inconsistent to the other functions:
> > >
> > > #define for_each_node_numadist(node, unvisited) \
> > > for (int start = (node), \
> > > node = nearest_node_nodemask((start), &(unvisited)); \
> > > node < MAX_NUMNODES; \
> > > node_clear(node, (unvisited)), \
> > > node = nearest_node_nodemask((start), &(unvisited)))
> > >
> > >
> > > I would suggest to make it aligned with the rest of the API:
> > >
> > > #define node_clear(node, dst) __node_clear((node), &(dst))
> > > static __always_inline void __node_clear(int node, volatile nodemask_t *dstp)
> > > {
> > > clear_bit(node, dstp->bits);
> > > }
> >
> > Sorry Yury, can you elaborate more on this? What do you mean with
> > inconsistent, is it the volatile nodemask_t *?
>
> What I mean is:
> #define nearest_node_nodemask(start, srcp) \
> 	__nearest_node_nodemask((start), &(srcp))
> int __nearest_node_nodemask(int node, nodemask_t *mask);

This all makes sense if nearest_node_nodemask() lives in
include/linux/nodemask.h and is treated as a nodemask API. But I thought
we had decided to place it in include/linux/numa.h, since it looks more
like a NUMA API, similar to numa_nearest_node(), so I was planning to
follow the same style as numa_nearest_node().

Or do you think it should go in linux/nodemask.h and follow the style of
the other nodemask APIs?

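To make the two options concrete, here is a rough userspace sketch (not
kernel code and not part of the patch). mock_nodemask_t and the
mock_first_node*() helpers are hypothetical names used only to show the
calling conventions: the numa.h style is a plain function that takes a
pointer, while the nodemask.h style wraps a __-prefixed function in a macro
that takes the mask by name and adds the & itself, as node_clear() does.

```c
#define MOCK_MAX_NUMNODES 8

/* Hypothetical stand-in for nodemask_t: one bit per node. */
typedef struct { unsigned long bits; } mock_nodemask_t;

/*
 * numa.h style (like numa_nearest_node()): a plain function, the caller
 * passes a pointer to the mask explicitly.
 */
static int mock_first_node_ptr(const mock_nodemask_t *mask)
{
	for (int n = 0; n < MOCK_MAX_NUMNODES; n++)
		if (mask->bits & (1UL << n))
			return n;
	return MOCK_MAX_NUMNODES;
}

/*
 * nodemask.h style (like node_clear()): the macro takes the mask object
 * and passes its address to the __-prefixed function. The __ prefix only
 * mirrors the kernel naming convention; it is reserved in portable
 * userspace C.
 */
#define mock_first_node(mask) __mock_first_node(&(mask))
static int __mock_first_node(const mock_nodemask_t *maskp)
{
	return mock_first_node_ptr(maskp);
}
```

With the first style a caller writes mock_first_node_ptr(&mask), with the
second mock_first_node(mask); both return the same node.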
>
> That way you'll be able to make the above for-loop looking more
> uniform:
>
> #define for_each_node_numadist(node, unvisited) \
> for (int __s = (node), \
> (node) = nearest_node_nodemask(__s, (unvisited)); \
> (node) < MAX_NUMNODES; \
> node_clear((node), (unvisited)), \
> (node) = nearest_node_nodemask(__s, (unvisited)))
>
> > > > #ifndef memory_add_physaddr_to_nid
> > > > int memory_add_physaddr_to_nid(u64 start);
> > > > #endif
> > > > @@ -47,6 +49,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> > > > return NUMA_NO_NODE;
> > > > }
> > > >
> > > > +static inline int nearest_node_nodemask(int node, nodemask_t *mask)
> > > > +{
> > > > + return NUMA_NO_NODE;
> > > > +}
> > > > +
> > > > static inline int memory_add_physaddr_to_nid(u64 start)
> > > > {
> > > > return 0;
> > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > index 162407fbf2bc7..1e2acf187ea3a 100644
> > > > --- a/mm/mempolicy.c
> > > > +++ b/mm/mempolicy.c
> > > > @@ -196,6 +196,38 @@ int numa_nearest_node(int node, unsigned int state)
> > > > }
> > > > EXPORT_SYMBOL_GPL(numa_nearest_node);
> > > >
> > > > +/**
> > > > + * nearest_node_nodemask - Find the node in @mask at the nearest distance
> > > > + * from @node.
> > > > + *
> > > > + * @node: the node to start the search from.
> > > > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > > > + *
> > > > + * This function iterates over all nodes in the given state and calculates
> > > > + * the distance to the starting node.
> > > > + *
> > > > + * Returns the node ID in @mask that is the closest in terms of distance
> > > > + * from @node, or MAX_NUMNODES if no node is found.
> > > > + */
> > > > +int nearest_node_nodemask(int node, nodemask_t *mask)
> > > > +{
> > > > + int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > > > +
> > > > + if (node == NUMA_NO_NODE)
> > > > + return MAX_NUMNODES;
> > >
> > > This makes it unclear: you make it legal to pass NUMA_NO_NODE, but
> > > your function returns something useless. I don't think it would help
> > > users in any reasonable scenario.
> > >
> > > So, if you don't want user to call this with node == NUMA_NO_NODE,
> > > just describe it in comment on top of the function. Otherwise, please
> > > do something useful like
> > >
> > > if (node == NUMA_NO_NODE)
> > > node = current_node;
> > >
> > > I would go with option 1. Notice, node_distance() doesn't bother to
> > > check against NUMA_NO_NODE.
> >
> > Hm... is it? Looking at __node_distance(), it doesn't seem really safe to
> > pass a negative value (maybe I'm missing something?).
>
> It's not safe, but inside the kernel we don't check parameters. Out of
> your courtesy you may decide to put a comment, but strictly speaking you
> don't have to.
>
> > Anyway, I'd also prefer to go with option 1 and not implicitly assuming
> > NUMA_NO_NODE == current node (it feels that it might hide nasty bugs).
>
> Yeah, very true
>
> > So, I can add a comment in the description to clarify that NUMA_NO_NODE is
> > forbidden, but what if someone passes it? Should we WARN_ON_ONCE() at
> > least?
>
> He will brick his testing board, and learn to read comments in a hard
> way.
>
> Speaking more seriously, you will be most likely CCed as an author of
> that function, and you will be able to comment that on review. Also,
> there's a great chance that it will be caught by KASAN or some other
> sanitation tool even before someone sends a buggy patch.
>
> This is as old as the world and a very well known problem, and everyone
> is aware.

Ok, makes sense, I'll just clarify this in the comment then.

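For reference, a minimal userspace sketch of the behavior discussed above,
under stated assumptions: mock_nodemask_t, mock_distance[] and the mock_*()
functions are hypothetical stand-ins for nodemask_t, node_distance() and
the helpers in this series, and the loop body is my guess at the obvious
implementation since the quoted hunk only shows the start of the function.
The comment documents that the starting node must be valid instead of
checking for it.

```c
#include <limits.h>

#define MOCK_MAX_NUMNODES 4

/* Hypothetical stand-in for nodemask_t: one bit per node. */
typedef struct { unsigned long bits; } mock_nodemask_t;

/* Hypothetical distance table (10 = local), standing in for node_distance(). */
static const int mock_distance[MOCK_MAX_NUMNODES][MOCK_MAX_NUMNODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return the node in @mask closest to @node, or MOCK_MAX_NUMNODES if
 * @mask is empty.
 *
 * @node must be a valid node ID. A negative value such as NUMA_NO_NODE
 * would index outside mock_distance[]; this is deliberately not checked.
 */
static int mock_nearest_node_nodemask(int node, const mock_nodemask_t *mask)
{
	int min_dist = INT_MAX, min_node = MOCK_MAX_NUMNODES;

	for (int n = 0; n < MOCK_MAX_NUMNODES; n++) {
		if (!(mask->bits & (1UL << n)))
			continue;
		/* Strict comparison: on a tie the lowest node ID wins. */
		if (mock_distance[node][n] < min_dist) {
			min_dist = mock_distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}

/*
 * Visit the nodes in @unvisited in order of increasing distance from
 * @start, clearing each one as it is visited (the same pattern as the
 * for_each_node_numadist() loop). Stores the visit order in @order and
 * returns the number of nodes visited.
 */
static int mock_visit_by_distance(int start, mock_nodemask_t *unvisited,
				  int *order)
{
	int count = 0;

	for (int n = mock_nearest_node_nodemask(start, unvisited);
	     n < MOCK_MAX_NUMNODES;
	     n = mock_nearest_node_nodemask(start, unvisited)) {
		unvisited->bits &= ~(1UL << n);
		order[count++] = n;
	}
	return count;
}
```

Note that the iteration consumes the mask, so a caller that still needs the
original set of nodes has to pass a copy.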
Thanks,
-Andrea