From: samir <samir@linux.ibm.com>
To: Steve Wahl <steve.wahl@hpe.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
linux-kernel@vger.kernel.org,
K Prateek Nayak <kprateek.nayak@amd.com>,
Vishal Chourasia <vishalc@linux.ibm.com>,
Russ Anderson <rja@hpe.com>, Dimitri Sivanich <sivanich@hpe.com>,
srikar@linux.ibm.com, sshegde@linux.ibm.com
Subject: Re: [PATCH v2] sched/topology: improve topology_span_sane speed
Date: Mon, 25 Nov 2024 21:50:53 +0530
Message-ID: <02dbeaa81619ccf86204f577c5ef7705@linux.ibm.com>
In-Reply-To: <20241031200431.182443-1-steve.wahl@hpe.com>
On 2024-11-01 01:34, Steve Wahl wrote:
> Use a different approach to topology_span_sane(), that checks for the
> same constraint of no partial overlaps for any two CPU sets for
> non-NUMA topology levels, but does so in a way that is O(N) rather
> than O(N^2).
>
> Instead of comparing with all other masks to detect collisions, keep
> one mask that includes all CPUs seen so far and detect collisions with
> a single cpumask_intersects test.
>
> If the current mask has no collisions with previously seen masks, it
> should be a new mask, which can be uniquely identified ("id") by the
> lowest bit set in this mask. Mark that we've seen a mask with this
> id, and add the CPUs in this mask to the list of those seen.
>
> If the current mask does collide with previously seen masks, it should
> be exactly equal to a mask seen before, identified once again by the
> lowest bit the current mask has set. It's an error if we haven't seen
> a mask with that id, or if the current mask doesn't match the one we
> get by looking up that id.
>
> Move the topology_span_sane() check out of the existing topology level
> loop, let it do its own looping to match the needs of this algorithm.
>
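For anyone following the description above, here is a minimal userspace sketch of the single-pass check as I read it. It is not the kernel code: mask_t, mask_first() and level_span_sane() are hypothetical names, a 64-bit word stands in for struct cpumask, and level_mask[cpu] stands in for tl->mask(cpu) at one non-NUMA level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n of a 64-bit word represents CPU n. */
typedef uint64_t mask_t;

/* Index of the lowest set bit; the caller guarantees m != 0. */
static int mask_first(mask_t m)
{
	int i = 0;

	while (!(m & 1)) {
		m >>= 1;
		i++;
	}
	return i;
}

/*
 * Returns true when every pair of spans in level_mask[] is either
 * identical or disjoint. One pass over the CPUs, no pairwise compare.
 */
static bool level_span_sane(const mask_t *level_mask, int nr_cpus)
{
	mask_t covered = 0;	/* union of all spans accepted so far */
	mask_t id_seen = 0;	/* ids (lowest CPU) of accepted spans */

	assert(nr_cpus > 0 && nr_cpus <= 64);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		mask_t m = level_mask[cpu];
		int id;
		mask_t id_bit;

		/* Sketch assumption: an empty span is treated as not sane. */
		if (m == 0)
			return false;

		id = mask_first(m);
		if (id >= nr_cpus)
			return false;
		id_bit = (mask_t)1 << id;

		if (!(m & covered)) {
			/* New span: the CPU naming it must report the same span. */
			if (cpu != id && level_mask[id] != m)
				return false;
			if (id_seen & id_bit)
				return false;
			id_seen |= id_bit;
			covered |= m;
		} else if (!(id_seen & id_bit) || level_mask[id] != m) {
			/* An overlap must exactly repeat a span seen before. */
			return false;
		}
	}
	return true;
}
```

Calling level_span_sane() on four CPUs grouped as two 2-thread cores (spans 0x3, 0x3, 0xC, 0xC) should report a sane level, while spans that partially overlap (0x3, 0x6, 0x6, 0x8) should be rejected at the second CPU.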
> On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> the average time to take one processor offline is reduced from 2.18
> seconds to 1.01 seconds. (Off-lining 959 of 1920 processors took
> 34m49.765s without this change, 16m10.038s with this change in place.)
>
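The per-processor averages quoted above follow directly from the two wall-clock totals. A small sketch of that arithmetic, with hypothetical helper names to_seconds() and avg_per_cpu():

```c
#include <assert.h>

/* Convert a duration given as minutes plus seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Average time per offlined CPU, in seconds. */
static double avg_per_cpu(double total_seconds, int cpus)
{
	assert(cpus > 0);
	return total_seconds / cpus;
}
```

With the quoted figures, 34m49.765s over 959 CPUs works out to about 2.18 s per CPU and 16m10.038s to about 1.01 s, an overall speedup of roughly 2.15x.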
> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
> ---
>
> Version 2: Adopted suggestion by K Prateek Nayak that removes an array and
> simplifies the code, and eliminates the erroneous use of
> num_possible_cpus() that Peter Zijlstra noted.
>
> Version 1 discussion:
>
> https://lore.kernel.org/all/20241010155111.230674-1-steve.wahl@hpe.com/
>
> kernel/sched/topology.c | 73 +++++++++++++++++++++++++++--------------
> 1 file changed, 48 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..6a2a3e91d59e 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2356,35 +2356,58 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
>
> /*
> * Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
> - * any two given CPUs at this (non-NUMA) topology level.
> + * any two given CPUs on non-NUMA topology levels.
> */
> -static bool topology_span_sane(struct sched_domain_topology_level *tl,
> - const struct cpumask *cpu_map, int cpu)
> +static bool topology_span_sane(const struct cpumask *cpu_map)
> {
> - int i = cpu + 1;
> + struct sched_domain_topology_level *tl;
> + struct cpumask *covered, *id_seen;
> + int cpu;
>
> - /* NUMA levels are allowed to overlap */
> - if (tl->flags & SDTL_OVERLAP)
> - return true;
> + lockdep_assert_held(&sched_domains_mutex);
> + covered = sched_domains_tmpmask;
> + id_seen = sched_domains_tmpmask2;
> +
> + for_each_sd_topology(tl) {
> +
> + /* NUMA levels are allowed to overlap */
> + if (tl->flags & SDTL_OVERLAP)
> + continue;
> +
> + cpumask_clear(covered);
> + cpumask_clear(id_seen);
>
> - /*
> - * Non-NUMA levels cannot partially overlap - they must be either
> - * completely equal or completely disjoint. Otherwise we can end up
> - * breaking the sched_group lists - i.e. a later get_group() pass
> - * breaks the linking done for an earlier span.
> - */
> - for_each_cpu_from(i, cpu_map) {
> /*
> - * We should 'and' all those masks with 'cpu_map' to exactly
> - * match the topology we're about to build, but that can only
> - * remove CPUs, which only lessens our ability to detect
> - * overlaps
> + * Non-NUMA levels cannot partially overlap - they must be either
> + * completely equal or completely disjoint. Otherwise we can end up
> + * breaking the sched_group lists - i.e. a later get_group() pass
> + * breaks the linking done for an earlier span.
> */
> - if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
> - cpumask_intersects(tl->mask(cpu), tl->mask(i)))
> - return false;
> + for_each_cpu(cpu, cpu_map) {
> + const struct cpumask *tl_cpu_mask = tl->mask(cpu);
> + int id;
> +
> + /* lowest bit set in this mask is used as a unique id */
> + id = cpumask_first(tl_cpu_mask);
> +
> + /* if this mask doesn't collide with what we've already seen */
> + if (!cpumask_intersects(tl_cpu_mask, covered)) {
> + /* Really odd case when cpu != id, likely not sane */
> + if ((cpu != id) && !cpumask_equal(tl_cpu_mask, tl->mask(id)))
> + return false;
> + if (cpumask_test_and_set_cpu(id, id_seen))
> + return false;
> + cpumask_or(covered, tl_cpu_mask, covered);
> + } else if ((!cpumask_test_cpu(id, id_seen)) ||
> + !cpumask_equal(tl->mask(id), tl_cpu_mask)) {
> + /*
> + * a collision with covered should have exactly matched
> + * a previously seen mask with the same id
> + */
> + return false;
> + }
> + }
> }
> -
> return true;
> }
>
> @@ -2417,9 +2440,6 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> sd = NULL;
> for_each_sd_topology(tl) {
>
> - if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> - goto error;
> -
> sd = build_sched_domain(tl, cpu_map, attr, sd, i);
>
> has_asym |= sd->flags & SD_ASYM_CPUCAPACITY;
> @@ -2433,6 +2453,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> }
> }
>
> + if (WARN_ON(!topology_span_sane(cpu_map)))
> + goto error;
> +
> /* Build the groups for the domains */
> for_each_cpu(i, cpu_map) {
> for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {

Hello Steve,

I have verified the above patch on PowerPC; my findings are below.

The tables summarize 5 iterations each of the "time ppc64_cpu --smt=off"
and "time ppc64_cpu --smt=4" commands (Min, Max, Average, and Std Dev,
in seconds).

---- lscpu ----
CPU(s):               360
On-line CPU(s) list:  0-359

Without patch:
---- uname -a ----
6.12.0+

Metric     SMT Off (s)   SMT 4 (s)
Min        71.123        37.629
Max        75.031        40.785
Average    73.175        39.992
Std Dev     1.619         1.191

With patch:
---- uname -a ----
6.12.0+

Metric     SMT Off (s)   SMT 4 (s)
Min        63.638        37.114
Max        73.707        40.773
Average    70.106        38.641
Std Dev     3.443         1.472

The logs indicate minimal improvement in the SMT 4 state with the patch.
A slight improvement is observed in the SMT off state, reflected in all
three metrics: Min, Max, and Average.

SMT 4 State:
• Across all metrics (Min, Max, and Average), there is negligible
improvement with the patch.
• The differences are within the margin of measurement variability, as
reflected in the similar Std Dev values (1.191 without the patch, 1.472
with it).
SMT Off State:
• A noticeable improvement is observed in the Min, Max, and Average
values with the patch:
◦ Min: Improved by ~10.5% (from 71.123 to 63.638 seconds).
◦ Max: Improved by ~1.8% (from 75.031 to 73.707 seconds).
◦ Average: Improved by ~4.2% (from 73.175 to 70.106 seconds).
• The Std Dev roughly doubled (from 1.619 to 3.443), so run-to-run
variation was larger with the patch.
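
For reference, the percentages above are relative reductions in elapsed time. A minimal sketch of the calculation, using a hypothetical helper pct_improvement():

```c
#include <assert.h>

/*
 * Relative reduction in elapsed time, as a percentage of the
 * unpatched value. Positive means the patched run was faster.
 */
static double pct_improvement(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Applied to the SMT off column this gives roughly 10.5% (Min), 1.8% (Max), and 4.2% (Average). The same formula on the SMT 4 column gives roughly 1.4%, 0.03%, and 3.4%.
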
Thanks,
Samir