* [PATCH 0/2] sched/topology: optimize topology_span_sane()
@ 2024-08-02 17:57 Yury Norov
2024-08-02 17:57 ` [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params Yury Norov
2024-08-02 17:57 ` [PATCH 2/2] sched/topology: optimize topology_span_sane() Yury Norov
0 siblings, 2 replies; 10+ messages in thread
From: Yury Norov @ 2024-08-02 17:57 UTC (permalink / raw)
To: linux-kernel
Cc: Yury Norov, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Pre-compute topology_span_sane() loop params and optimize the
function to avoid calling cpumask_equal() when masks are the same.
This series follows up comments from here:
https://lore.kernel.org/lkml/ZqqV5OxZPHUgjhag@LeoBras/T/#md6b2b6bdd09e63740bbf010530211842a79b5f57
Yury Norov (2):
sched/topology: pre-compute topology_span_sane() loop params
sched/topology: optimize topology_span_sane()
kernel/sched/topology.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params
2024-08-02 17:57 [PATCH 0/2] sched/topology: optimize topology_span_sane() Yury Norov
@ 2024-08-02 17:57 ` Yury Norov
2024-08-03 15:32 ` Chen Yu
2024-08-02 17:57 ` [PATCH 2/2] sched/topology: optimize topology_span_sane() Yury Norov
1 sibling, 1 reply; 10+ messages in thread
From: Yury Norov @ 2024-08-02 17:57 UTC (permalink / raw)
To: linux-kernel
Cc: Yury Norov, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
tl->mask(cpu) is used unchanged in the loop, and in the worst case
tl->mask(i) may be calculated twice, as a parameter for both
cpumask_equal() and cpumask_intersects(). So, precalculate both.
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Suggested-by: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
kernel/sched/topology.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 76504b776d03..754ad5fa3c99 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2353,6 +2353,7 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
static bool topology_span_sane(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map, int cpu)
{
+ const struct cpumask *mi, *mc = tl->mask(cpu);
int i = cpu + 1;
/* NUMA levels are allowed to overlap */
@@ -2366,14 +2367,15 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
* breaks the linking done for an earlier span.
*/
for_each_cpu_from(i, cpu_map) {
+ mi = tl->mask(i);
+
/*
* We should 'and' all those masks with 'cpu_map' to exactly
* match the topology we're about to build, but that can only
* remove CPUs, which only lessens our ability to detect
* overlaps
*/
- if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
- cpumask_intersects(tl->mask(cpu), tl->mask(i)))
+ if (!cpumask_equal(mc, mi) && cpumask_intersects(mc, mi))
return false;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params
2024-08-02 17:57 ` [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params Yury Norov
@ 2024-08-03 15:32 ` Chen Yu
0 siblings, 0 replies; 10+ messages in thread
From: Chen Yu @ 2024-08-03 15:32 UTC (permalink / raw)
To: Yury Norov
Cc: linux-kernel, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
On 2024-08-02 at 10:57:42 -0700, Yury Norov wrote:
> tl->mask(cpu) is used unchanged in the loop, and in the worst case
> tl->mask(i) may be calculated twice, as a parameter for both
> cpumask_equal() and cpumask_intersects(). So, precalculate both.
>
> Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> Suggested-by: Leonardo Bras <leobras@redhat.com>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
> kernel/sched/topology.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 76504b776d03..754ad5fa3c99 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2353,6 +2353,7 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
> static bool topology_span_sane(struct sched_domain_topology_level *tl,
> const struct cpumask *cpu_map, int cpu)
> {
> + const struct cpumask *mi, *mc = tl->mask(cpu);
Could we avoid the calculation by moving
mc = tl->mask(cpu) after the
if (tl->flags & SDTL_OVERLAP) check?
Other than that,
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
thanks,
Chenyu
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-02 17:57 [PATCH 0/2] sched/topology: optimize topology_span_sane() Yury Norov
2024-08-02 17:57 ` [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params Yury Norov
@ 2024-08-02 17:57 ` Yury Norov
2024-08-06 15:50 ` Valentin Schneider
1 sibling, 1 reply; 10+ messages in thread
From: Yury Norov @ 2024-08-02 17:57 UTC (permalink / raw)
To: linux-kernel
Cc: Yury Norov, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
The function may call cpumask_equal() with tl->mask(cpu) == tl->mask(i)
even when cpu != i. In such a case, cpumask_equal() always returns
true, and we can proceed to the next iteration immediately.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
kernel/sched/topology.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 754ad5fa3c99..571759606954 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2368,6 +2368,8 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
*/
for_each_cpu_from(i, cpu_map) {
mi = tl->mask(i);
+ if (mi == mc)
+ continue;
/*
* We should 'and' all those masks with 'cpu_map' to exactly
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-02 17:57 ` [PATCH 2/2] sched/topology: optimize topology_span_sane() Yury Norov
@ 2024-08-06 15:50 ` Valentin Schneider
2024-08-06 18:00 ` Yury Norov
0 siblings, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2024-08-06 15:50 UTC (permalink / raw)
To: Yury Norov, linux-kernel
Cc: Yury Norov, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman
On 02/08/24 10:57, Yury Norov wrote:
> The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> even when cpu != i.
For which architecture have you observed this? AFAIA all implementations of
tl->sched_domain_mask_f are built on a per-CPU cpumask.
e.g. for the default topology, we have:
cpu_smt_mask() -> topology_sibling_cpumask()
which is implemented as:
arch/loongarch/include/asm/topology.h:35:#define topology_sibling_cpumask(cpu) (&cpu_sibling_map[cpu])
arch/mips/include/asm/topology.h:18:#define topology_sibling_cpumask(cpu) (&cpu_sibling_map[cpu])
arch/powerpc/include/asm/topology.h:139:#define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
arch/s390/include/asm/topology.h:31:#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_mask)
arch/sparc/include/asm/topology_64.h:50:#define topology_sibling_cpumask(cpu) (&per_cpu(cpu_sibling_map, cpu))
arch/x86/include/asm/topology.h:186:#define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
include/linux/arch_topology.h:91:#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
include/linux/topology.h:218:#define topology_sibling_cpumask(cpu) cpumask_of(cpu)
and ditto for cpu_coregroup_mask() & cpu_cpu_mask().
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-06 15:50 ` Valentin Schneider
@ 2024-08-06 18:00 ` Yury Norov
2024-08-07 13:53 ` Valentin Schneider
0 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2024-08-06 18:00 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman
On Tue, Aug 06, 2024 at 05:50:23PM +0200, Valentin Schneider wrote:
> On 02/08/24 10:57, Yury Norov wrote:
> > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> > even when cpu != i.
>
> For which architecture have you observed this? AFAIA all implementations of
> tl->sched_domain_mask_f are built on a per-CPU cpumask.
x86_64, qemu emulating 16 CPUs in 4 nodes, Linux 6.10, approximately
defconfig.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-06 18:00 ` Yury Norov
@ 2024-08-07 13:53 ` Valentin Schneider
2024-08-07 16:39 ` Yury Norov
0 siblings, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2024-08-07 13:53 UTC (permalink / raw)
To: Yury Norov
Cc: linux-kernel, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman
On 06/08/24 11:00, Yury Norov wrote:
> On Tue, Aug 06, 2024 at 05:50:23PM +0200, Valentin Schneider wrote:
>> On 02/08/24 10:57, Yury Norov wrote:
>> > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
>> > even when cpu != i.
>>
>> For which architecture have you observed this? AFAIA all implementations of
>> tl->sched_domain_mask_f are built on a per-CPU cpumask.
>
> x86_64, qemu emulating 16 CPUs in 4 nodes, Linux 6.10, approximately
> defconfig.
For the default_topology:
cpu_smt_mask() # SMT
(per_cpu(cpu_sibling_map, cpu))
cpu_clustergroup_mask() # CLS
per_cpu(cpu_l2c_shared_map, cpu);
cpu_coregroup_mask() # MC
per_cpu(cpu_llc_shared_map, cpu);
cpu_cpu_mask() # PKG
cpumask_of_node(cpu_to_node(cpu));
Ok so PKG can potentially hit that condition, and so can any
sched_domain_mask_f that relies on the node masks...
I'm thinking ideally we should have checks in place to ensure all
node_to_cpumask_map[] masks are disjoint, then we could entirely skip the levels
that use these masks in topology_span_sane(), but there's unfortunately no
nice way to flag them... Also there would be cases where there's no real
difference between PKG and NODE other than NODE is still based on a per-cpu
cpumask and PKG isn't, so I don't see a nicer way to go about this.
Please add something like the following to the changelog, and with that:
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
"""
Some topology levels (e.g. PKG in default_topology[]) have a
sched_domain_mask_f implementation that reuses the same mask for several
CPUs (in PKG's case, one mask for all CPUs in the same NUMA node).
For such topology levels, repeating cpumask_equal() checks is wasteful -
check that the tl->mask(i) pointers aren't the same first.
"""
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-07 13:53 ` Valentin Schneider
@ 2024-08-07 16:39 ` Yury Norov
0 siblings, 0 replies; 10+ messages in thread
From: Yury Norov @ 2024-08-07 16:39 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, Christophe JAILLET, Leonardo Bras, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman
On Wed, Aug 07, 2024 at 03:53:18PM +0200, Valentin Schneider wrote:
> On 06/08/24 11:00, Yury Norov wrote:
> > On Tue, Aug 06, 2024 at 05:50:23PM +0200, Valentin Schneider wrote:
> >> On 02/08/24 10:57, Yury Norov wrote:
> >> > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> >> > even when cpu != i.
> >>
> >> For which architecture have you observed this? AFAIA all implementations of
> >> tl->sched_domain_mask_f are built on a per-CPU cpumask.
> >
> > x86_64, qemu emulating 16 CPUs in 4 nodes, Linux 6.10, approximately
> > defconfig.
>
> For the default_topology:
> cpu_smt_mask() # SMT
> (per_cpu(cpu_sibling_map, cpu))
>
> cpu_clustergroup_mask() # CLS
> per_cpu(cpu_l2c_shared_map, cpu);
>
> cpu_coregroup_mask() # MC
> per_cpu(cpu_llc_shared_map, cpu);
>
> cpu_cpu_mask() # PKG
> cpumask_of_node(cpu_to_node(cpu));
>
> Ok so PKG can potentially hit that condition, and so can any
> sched_domain_mask_f that relies on the node masks...
>
> I'm thinking ideally we should have checks in place to ensure all
> node_to_cpumask_map[] masks are disjoint, then we could entirely skip the levels
> that use these masks in topology_span_sane(), but there's unfortunately no
> nice way to flag them... Also there would cases where there's no real
> difference between PKG and NODE other than NODE is still based on a per-cpu
> cpumask and PKG isn't, so I don't see a nicer way to go about this.
>
> Please add something like the following to the changelog, and with that:
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Sure, will do.
> """
> Some topology levels (e.g. PKG in default_topology[]) have a
> sched_domain_mask_f implementation that reuses the same mask for several
> CPUs (in PKG's case, one mask for all CPUs in the same NUMA node).
>
> For such topology levels, repeating cpumask_equal() checks is wasteful -
> check that the tl->mask(i) pointers aren't the same first.
> """
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 0/2] sched/topology: optimize topology_span_sane()
@ 2024-08-07 19:05 Yury Norov
2024-08-07 19:05 ` [PATCH 2/2] " Yury Norov
0 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2024-08-07 19:05 UTC (permalink / raw)
To: linux-kernel
Cc: Yury Norov, Chen Yu, Christophe JAILLET, Leonardo Bras,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider
The function may call cpumask_equal() with tl->mask(cpu) == tl->mask(i)
even when cpu != i. In such a case, cpumask_equal() always returns
true, and we can proceed to the next iteration immediately.
Valentin Schneider commented on it:
PKG can potentially hit that condition, and so can any
sched_domain_mask_f that relies on the node masks...
I'm thinking ideally we should have checks in place to
ensure all node_to_cpumask_map[] masks are disjoint,
then we could entirely skip the levels that use these
masks in topology_span_sane(), but there's unfortunately
no nice way to flag them... Also there would be cases
where there's no real difference between PKG and NODE
other than NODE is still based on a per-cpu cpumask and
PKG isn't, so I don't see a nicer way to go about this.
v1: https://lore.kernel.org/lkml/ZrJk00cmVaUIAr4G@yury-ThinkPad/T/
v2:
- defer initialization of 'mc' in patch #1 @Chen Yu;
- more comments from Valentin Schneider.
Yury Norov (2):
sched/topology: pre-compute topology_span_sane() loop params
sched/topology: optimize topology_span_sane()
kernel/sched/topology.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] sched/topology: optimize topology_span_sane()
2024-08-07 19:05 [PATCH v2 0/2] " Yury Norov
@ 2024-08-07 19:05 ` Yury Norov
0 siblings, 0 replies; 10+ messages in thread
From: Yury Norov @ 2024-08-07 19:05 UTC (permalink / raw)
To: linux-kernel
Cc: Yury Norov, Chen Yu, Christophe JAILLET, Leonardo Bras,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider
The function may call cpumask_equal() with tl->mask(cpu) == tl->mask(i)
even though cpu != i. In such a case, cpumask_equal() always returns
true, and we can proceed to the next iteration immediately.
The in-code comment was provided by Valentin Schneider.
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
kernel/sched/topology.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8af3b48da458..3661d4173d1f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2370,6 +2370,18 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
*/
for_each_cpu_from(i, cpu_map) {
mi = tl->mask(i);
+ /*
+ * Some topology levels (e.g. PKG in default_topology[])
+ * have a sched_domain_mask_f implementation that reuses
+ * the same mask for several CPUs (in PKG's case, one mask
+ * for all CPUs in the same NUMA node).
+ *
+ * For such topology levels, repeating cpumask_equal()
+ * checks is wasteful. Instead, we first check that the
+ * tl->mask(i) pointers aren't the same.
+ */
+ if (mi == mc)
+ continue;
/*
* We should 'and' all those masks with 'cpu_map' to exactly
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
[parent not found: <20240319185148.985729-1-kyle.meyer@hpe.com>]
end of thread, other threads:[~2024-08-07 19:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-02 17:57 [PATCH 0/2] sched/topology: optimize topology_span_sane() Yury Norov
2024-08-02 17:57 ` [PATCH 1/2] sched/topology: pre-compute topology_span_sane() loop params Yury Norov
2024-08-03 15:32 ` Chen Yu
2024-08-02 17:57 ` [PATCH 2/2] sched/topology: optimize topology_span_sane() Yury Norov
2024-08-06 15:50 ` Valentin Schneider
2024-08-06 18:00 ` Yury Norov
2024-08-07 13:53 ` Valentin Schneider
2024-08-07 16:39 ` Yury Norov
-- strict thread matches above, loose matches on Subject: below --
2024-08-07 19:05 [PATCH v2 0/2] " Yury Norov
2024-08-07 19:05 ` [PATCH 2/2] " Yury Norov
[not found] <20240319185148.985729-1-kyle.meyer@hpe.com>
[not found] ` <20240319185148.985729-3-kyle.meyer@hpe.com>
2024-03-20 18:32 ` [PATCH 2/2] sched/topology: Optimize topology_span_sane() Yury Norov