From: Yury Norov <yury.norov@gmail.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Ian May <ianm@nvidia.com>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] mm/numa: Introduce nearest_node_nodemask()
Date: Thu, 13 Feb 2025 12:12:46 -0500
Message-ID: <Z64oDlh9vzvRYziL@thinkpad>
In-Reply-To: <Z64brsSMAR7cLPUU@gpd3>

On Thu, Feb 13, 2025 at 05:19:58PM +0100, Andrea Righi wrote:
> On Thu, Feb 13, 2025 at 10:57:00AM -0500, Yury Norov wrote:
> > On Wed, Feb 12, 2025 at 05:48:09PM +0100, Andrea Righi wrote:
> > > Introduce the new helper nearest_node_nodemask() to find the closest
> > > node in a specified nodemask from a given starting node.
> > > 
> > > Returns MAX_NUMNODES if no node is found.
> > > 
> > > Cc: Yury Norov <yury.norov@gmail.com>
> > > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > 
> > Suggested-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
> 
> Ok.
> 
> > 
> > > ---
> > >  include/linux/numa.h |  7 +++++++
> > >  mm/mempolicy.c       | 32 ++++++++++++++++++++++++++++++++
> > >  2 files changed, 39 insertions(+)
> > > 
> > > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > > index 31d8bf8a951a7..e6baaf6051bcf 100644
> > > --- a/include/linux/numa.h
> > > +++ b/include/linux/numa.h
> > > @@ -31,6 +31,8 @@ void __init alloc_offline_node_data(int nid);
> > >  /* Generic implementation available */
> > >  int numa_nearest_node(int node, unsigned int state);
> > >  
> > > +int nearest_node_nodemask(int node, nodemask_t *mask);
> > > +
> > 
> > See how you use it. It looks a bit inconsistent to the other functions:
> > 
> >   #define for_each_node_numadist(node, unvisited)                                \
> >          for (int start = (node),                                                \
> >               node = nearest_node_nodemask((start), &(unvisited));               \
> >               node < MAX_NUMNODES;                                               \
> >               node_clear(node, (unvisited)),                                     \
> >               node = nearest_node_nodemask((start), &(unvisited)))
> >   
> > 
> > I would suggest to make it aligned with the rest of the API:
> > 
> >   #define node_clear(node, dst) __node_clear((node), &(dst))
> >   static __always_inline void __node_clear(int node, volatile nodemask_t *dstp)
> >   {
> >           clear_bit(node, dstp->bits);
> >   }
> 
> Sorry Yury, can you elaborate more on this? What do you mean with
> inconsistent, is it the volatile nodemask_t *?

What I mean is:
  #define nearest_node_nodemask(start, srcp)                     \
                __nearest_node_nodemask((start), &(srcp))
  int __nearest_node_nodemask(int node, nodemask_t *mask);

That way you'll be able to make the above for-loop look more
uniform:

  #define for_each_node_numadist(node, unvisited)                       \
         for (int __s = (node),                                         \
              (node) = nearest_node_nodemask(__s, (unvisited));         \
              (node) < MAX_NUMNODES;                                    \
              node_clear((node), (unvisited)),                          \
              (node) = nearest_node_nodemask(__s, (unvisited)))
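
For readers who want to try the wrapper pattern outside the kernel, here is
a minimal userspace sketch. It is an illustration under stated assumptions,
not the code from the patch: nodemask_t is reduced to a single word,
MAX_NUMNODES is a toy value, and toy_distance[], visit_order() and
example_usage() are hypothetical names invented for this example. The
double-underscore names only mirror the thread.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NUMNODES 4	/* toy value for this sketch */

/* Toy stand-in for the kernel's nodemask_t: one bit per node. */
typedef struct { unsigned long bits; } nodemask_t;

/* Hypothetical symmetric distance table (not taken from the thread). */
static const int toy_distance[MAX_NUMNODES][MAX_NUMNODES] = {
	{ 10, 20, 40, 30 },
	{ 20, 10, 30, 40 },
	{ 40, 30, 10, 20 },
	{ 30, 40, 20, 10 },
};

/* Pointer-taking workers, as in the node_clear()/__node_clear() pair. */
static inline void __node_clear(int node, nodemask_t *dstp)
{
	dstp->bits &= ~(1UL << node);
}

static int __nearest_node_nodemask(int node, const nodemask_t *mask)
{
	int min_dist = INT_MAX, min_node = MAX_NUMNODES;

	for (int n = 0; n < MAX_NUMNODES; n++) {
		if (!(mask->bits & (1UL << n)))
			continue;	/* node n is not a candidate */
		if (toy_distance[node][n] < min_dist) {
			min_dist = toy_distance[node][n];
			min_node = n;
		}
	}
	return min_node;	/* MAX_NUMNODES if the mask was empty */
}

/* By-name wrappers: callers pass the mask itself, not its address. */
#define node_clear(node, dst)	__node_clear((node), &(dst))
#define nearest_node_nodemask(start, src)				\
	__nearest_node_nodemask((start), &(src))

#define for_each_node_numadist(node, unvisited)				\
	for (int __s = (node),						\
	     (node) = nearest_node_nodemask(__s, (unvisited));		\
	     (node) < MAX_NUMNODES;					\
	     node_clear((node), (unvisited)),				\
	     (node) = nearest_node_nodemask(__s, (unvisited)))

/* Record the visiting order from @start; returns how many nodes were seen. */
int visit_order(int start, int *out)
{
	nodemask_t unvisited = { .bits = (1UL << MAX_NUMNODES) - 1 };
	int node = start, count = 0;

	for_each_node_numadist(node, unvisited)
		out[count++] = node;
	return count;
}

/* Usage: starting from node 0, nodes come back in order of distance. */
void example_usage(void)
{
	int order[MAX_NUMNODES];

	assert(visit_order(0, order) == MAX_NUMNODES);
	assert(order[0] == 0 && order[1] == 1);
	assert(order[2] == 3 && order[3] == 2);
}
```

Note that the loop variable declared inside the for statement shadows the
caller's variable of the same name, so the caller's copy is left unchanged
once the loop ends, while the mask passed as unvisited is emptied.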

> > >  #ifndef memory_add_physaddr_to_nid
> > >  int memory_add_physaddr_to_nid(u64 start);
> > >  #endif
> > > @@ -47,6 +49,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> > >  	return NUMA_NO_NODE;
> > >  }
> > >  
> > > +static inline int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > +	return NUMA_NO_NODE;
> > > +}
> > > +
> > >  static inline int memory_add_physaddr_to_nid(u64 start)
> > >  {
> > >  	return 0;
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 162407fbf2bc7..1e2acf187ea3a 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -196,6 +196,38 @@ int numa_nearest_node(int node, unsigned int state)
> > >  }
> > >  EXPORT_SYMBOL_GPL(numa_nearest_node);
> > >  
> > > +/**
> > > + * nearest_node_nodemask - Find the node in @mask at the nearest distance
> > > + *			   from @node.
> > > + *
> > > + * @node: the node to start the search from.
> > > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > > + *
> > > + * This function iterates over all nodes in the given state and calculates
> > > + * the distance to the starting node.
> > > + *
> > > + * Returns the node ID in @mask that is the closest in terms of distance
> > > + * from @node, or MAX_NUMNODES if no node is found.
> > > + */
> > > +int nearest_node_nodemask(int node, nodemask_t *mask)
> > > +{
> > > +	int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > > +
> > > +	if (node == NUMA_NO_NODE)
> > > +		return MAX_NUMNODES;
> > 
> > This makes it unclear: you make it legal to pass NUMA_NO_NODE, but
> > your function returns something useless. I don't think it would help
> > users in any reasonable scenario.
> > 
> > So, if you don't want user to call this with node == NUMA_NO_NODE,
> > just describe it in comment on top of the function. Otherwise, please
> > do something useful like 
> > 
> > 	if (node == NUMA_NO_NODE)
> > 		node = current_node;
> > 
> > I would go with option 1. Notice, node_distance() doesn't bother to
> > check against NUMA_NO_NODE.
> 
> Hm... is it? Looking at __node_distance(), it doesn't seem really safe to
> pass a negative value (maybe I'm missing something?).

It's not safe, but inside the kernel we don't check parameters. As a
courtesy you may decide to add a comment, but strictly speaking you
don't have to.

> Anyway, I'd also prefer to go with option 1 and not implicitly assuming
> NUMA_NO_NODE == current node (it feels that it might hide nasty bugs).

Yeah, very true

> So, I can add a comment in the description to clarify that NUMA_NO_NODE is
> forbidden, but what if someone passes it? Should we WARN_ON_ONCE() at
> least?

He will brick his testing board, and learn to read comments the hard
way.

Speaking more seriously, you will most likely be CCed as the author of
that function, and you will be able to point it out on review. Also,
there's a good chance that it will be caught by KASAN or some other
sanitizer even before someone sends a buggy patch.

This problem is as old as the world and very well known, and everyone
is aware of it.
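
To make "option 1" concrete, here is a rough userspace sketch of a helper
whose only protection against a bad @node is its comment. Everything in it
is an assumption made for illustration: the name nearest_node_sketch(), the
one-word nodemask_t, the three-node toy_distance[] table and
example_contract() do not come from the patch.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NUMNODES	3	/* toy value for this sketch */

/* Toy stand-in for the kernel's nodemask_t: one bit per node. */
typedef struct { unsigned long bits; } nodemask_t;

/* Hypothetical distance table, used in place of node_distance(). */
static const int toy_distance[MAX_NUMNODES][MAX_NUMNODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

/**
 * nearest_node_sketch - find the node in @mask closest to @node
 * @node: node to start from; must be a valid node ID, never NUMA_NO_NODE
 * @mask: candidate nodes
 *
 * The caller guarantees that @node is valid. No runtime check is done:
 * a negative @node would index toy_distance[] out of bounds.
 *
 * Returns the closest node in @mask, or MAX_NUMNODES if @mask is empty.
 * On a tie, the lowest-numbered node wins.
 */
int nearest_node_sketch(int node, const nodemask_t *mask)
{
	int min_dist = INT_MAX, min_node = MAX_NUMNODES;

	for (int n = 0; n < MAX_NUMNODES; n++) {
		if (!(mask->bits & (1UL << n)))
			continue;	/* node n is not a candidate */
		if (toy_distance[node][n] < min_dist) {
			min_dist = toy_distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}

/* Usage: only valid node IDs are ever passed in. */
void example_contract(void)
{
	nodemask_t some = { .bits = 0x6 };	/* nodes 1 and 2 */
	nodemask_t none = { .bits = 0 };

	assert(nearest_node_sketch(0, &some) == 1);
	assert(nearest_node_sketch(0, &none) == MAX_NUMNODES);
}
```

The comment carries the whole contract here, so a caller that breaks it gets
undefined behavior rather than a defined return value.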

Thanks,
Yury
