From: Yury Norov <ynorov@nvidia.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
linux-kernel@vger.kernel.org, Yury Norov <yury.norov@gmail.com>
Subject: Re: [PATCH] sched/topology: optimize sched_numa_find_nth_cpu()
Date: Sat, 28 Mar 2026 17:42:28 -0400 [thread overview]
Message-ID: <achLRH-MQbrxF4p1@yury> (raw)
In-Reply-To: <xhsmh5x6igysq.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
On Thu, Mar 26, 2026 at 08:45:25PM +0100, Valentin Schneider wrote:
> On 26/03/26 20:09, Valentin Schneider wrote:
> > On 19/03/26 13:26, Yury Norov wrote:
> >> The binary search callback hop_cmp() uses cpumask_weight_and() on each
> >> iteration. Switch it to cpumask_nth_and() as it returns earlier, as
> >> soon as the required number of CPUs is found.
> >>
> >> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> >
> > Woopsie, forget about the empty reply.
> >
> > Took me a little while to get back on track with how
> > sched_numa_find_nth_cpu() works. Doesn't help that it and the
> > cpumask_nth*() family have a @cpu parameter when it isn't a CPU but an
> > offset from a search start (i.e. an index, as coined for cpumask_local_spread()).
> >
> > Comments would help, I'll try to come up with something tomorrow if I wake
> > up from whatever's brewing in my lungs :(
>
> I figured I'd give it a try before the fever kicks in:
> ---
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 32dcddaead82d..7069179d5ee0c 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2267,6 +2267,7 @@ static int hop_cmp(const void *a, const void *b)
> struct cpumask **prev_hop, **cur_hop = *(struct cpumask ***)b;
> struct __cmp_key *k = (struct __cmp_key *)a;
>
> + /* Not enough CPUs reachable in that many hops */
> if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
> return 1;
>
> @@ -2275,6 +2276,10 @@ static int hop_cmp(const void *a, const void *b)
> return 0;
> }
>
> + /*
> + * cur_hop spans enough CPUs to return an nth one, if the immediately
> + * preceding hop doesn't then we're done.
> + */
> prev_hop = *((struct cpumask ***)b - 1);
> k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
> if (k->w <= k->cpu)
I'm not a fan of comments explaining how the code works, especially
if they pollute the code itself. Instead of increasing the function
vertical length, can you add a top comment like:
Comparator for the binary search in sched_numa_find_nth_cpu().
Returns positive value if the given hop doesn't contain enough CPUs.
Returns zero if the current hop is a minimal hop containing enough CPUs.
Returns negative value if the current hop is not a minimal hop containing
enough CPUs.
If you think the comparator is too complicated, it's always better to
add self-explaining helpers:
if (not_enough_hops())
return 1;
if (just_enough_hops())
return 0;
/* Otherwise too many hops */
return -1;
> @@ -2284,16 +2289,16 @@ static int hop_cmp(const void *a, const void *b)
> }
>
> /**
> - * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth closest CPU
> - * from @cpus to @cpu, taking into account distance
> - * from a given @node.
> + * sched_numa_find_nth_cpu() - given the NUMA topology, find the @nth_cpu in
> + * @cpus reachable from @node in the least amount
> + * of hops.
> * @cpus: cpumask to find a cpu from
> - * @cpu: CPU to start searching
> - * @node: NUMA node to order CPUs by distance
> + * @nth_cpu: CPU offset to search for
> + * @node: NUMA node to start the search from
> *
> * Return: cpu, or nr_cpu_ids when nothing found.
> */
> -int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int nth_cpu, int node)
If you think that 'cpu' is confusing, then 'nth_cpu' would be even
more confusing: you're searching for the nth_cpu, and you pass it as a
parameter.
Let's name it just num or idx, with the reasoning:
1. This is not CPU (contrary to cpumask_next(cpu), for example); and
2. This is just an index in a numa-based distance enumeration.
And the description should reflect that, like:
@idx: index of a CPU to find in a node-based distance CPU enumeration
> {
> struct __cmp_key k = { .cpus = cpus, .cpu = cpu };
> struct cpumask ***hop_masks;
> @@ -2315,8 +2320,21 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]), hop_cmp);
> if (!hop_masks)
> goto unlock;
> + /*
> + * bsearch returned sched_domains_numa_masks[hop], with @hop being the
> + * smallest amount of hops it takes to reach an @nth_cpu from @node.
> + */
Please no kernel doc syntax out of kernel doc blocks.
> hop = hop_masks - k.masks;
>
> + /*
> + * @hop is constructed by hop_cmp() such that sched_domains_numa_masks[hop][node]
hop_cmp() doesn't construct, it's a comparator for bsearch, which searches for
the nearest hop containing at least N CPUs.
> + * spans enough CPUs to return an @nth_cpu, and sched_domains_numa_masks[hop-1][node]
> + * doesn't.
> + *
> + * Get a cpumask without the CPUs from sched_domains_numa_masks[hop-1][node]
> + * subtract how many CPUs that contains (@k.w), and fetch our @nth_cpu from
> + * the resulting mask.
It explains only hop != NULL case. This sentence confuses more than
explain, to me. The below code is quite self-explaining.
> + */
> ret = hop ?
> cpumask_nth_and_andnot(cpu - k.w, cpus, k.masks[hop][node], k.masks[hop-1][node]) :
> cpumask_nth_and(cpu, cpus, k.masks[0][node]);
prev parent reply other threads:[~2026-03-28 21:42 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-19 17:26 [PATCH] sched/topology: optimize sched_numa_find_nth_cpu() Yury Norov
2026-03-26 19:05 ` Valentin Schneider
2026-03-26 19:09 ` Valentin Schneider
2026-03-26 19:45 ` Valentin Schneider
2026-03-27 13:38 ` Shrikanth Hegde
2026-03-27 15:59 ` Valentin Schneider
2026-03-28 21:42 ` Yury Norov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=achLRH-MQbrxF4p1@yury \
--to=ynorov@nvidia.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox