* [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level
@ 2025-06-25 3:45 Li Chen
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra,
K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel
From: Li Chen <chenl311@chinatelecom.cn>
This two–patch series cleans up sched-domain topology handling and
eliminates hundreds of pointless attach/destroy cycles for large
machines when SMT is not available.
Patch 1 (from Thomas, unchanged) gets rid of the #ifdef maze in
build_sched_topology() by statically initialising the topology array.
Patch 2 (mine) is a follow-up that simply memmoves the array when
cpu_smt_num_threads <= 1, so the SMT level never gets created and
immediately torn down again.
Tested on Qemu.
changelog:
v2: fix wording issue as suggested by Thomas [1]
v3: remove pointless memset and adjust PKG index accordingly, as
suggested by Thomas [2], and refine some other wording issues.
[1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
[2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
Li Chen (1):
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Thomas Gleixner (1):
x86/smpboot: Decrapify build_sched_topology()
arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 27 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() 2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen @ 2025-06-25 3:45 ` Li Chen 2025-06-25 8:28 ` Peter Zijlstra 2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen 2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak 2 siblings, 1 reply; 12+ messages in thread From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw) To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen From: Thomas Gleixner <tglx@linutronix.de> The #ifdeffery and the initializers in build_sched_topology() are just disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is unconditionally enabled when SMP is enabled. Statically initialize the domain levels in the topology array and let build_sched_topology() invalidate the package domain level when NUMA in package is available. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------ 1 file changed, 17 insertions(+), 28 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index fc78c2325fd29..7d202f9785362 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -478,43 +478,32 @@ static int x86_cluster_flags(void) */ static bool x86_has_numa_in_package; -static struct sched_domain_topology_level x86_topology[6]; +#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname } -static void __init build_sched_topology(void) -{ - int i = 0; - -#ifdef CONFIG_SCHED_SMT - x86_topology[i++] = (struct sched_domain_topology_level){ - cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) - }; -#endif +static struct sched_domain_topology_level x86_topology[] = { + DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT), #ifdef CONFIG_SCHED_CLUSTER - x86_topology[i++] = (struct sched_domain_topology_level){ - cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) - }; + DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS), #endif #ifdef CONFIG_SCHED_MC - x86_topology[i++] = (struct sched_domain_topology_level){ - cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) - }; + DOMAIN(cpu_coregroup_mask, x86_core_flags, MC), #endif - /* - * When there is NUMA topology inside the package skip the PKG domain - * since the NUMA domains will auto-magically create the right spanning - * domains based on the SLIT. - */ - if (!x86_has_numa_in_package) { - x86_topology[i++] = (struct sched_domain_topology_level){ - cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG) - }; - } + DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG), + { NULL }, +}; +static void __init build_sched_topology(void) +{ /* - * There must be one trailing NULL entry left. + * When there is NUMA topology inside the package invalidate the + * PKG domain since the NUMA domains will auto-magically create the + * right spanning domains based on the SLIT. */ - BUG_ON(i >= ARRAY_SIZE(x86_topology)-1); + if (x86_has_numa_in_package) { + unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; + memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); + } set_sched_topology(x86_topology); } -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() 2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen @ 2025-06-25 8:28 ` Peter Zijlstra 2025-06-30 0:21 ` Li Chen 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2025-06-25 8:28 UTC (permalink / raw) To: Li Chen Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen On Wed, Jun 25, 2025 at 11:45:49AM +0800, Li Chen wrote: > From: Thomas Gleixner <tglx@linutronix.de> > > The #ifdeffery and the initializers in build_sched_topology() are just > disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is > unconditionally enabled when SMP is enabled. On x86, but not across all archs. Yes this is x86 code, but how is one supposed to keep all that nonsense straight in their head ;-) > Statically initialize the domain levels in the topology array and let > build_sched_topology() invalidate the package domain level when NUMA in > package is available. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > --- > arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------ > 1 file changed, 17 insertions(+), 28 deletions(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index fc78c2325fd29..7d202f9785362 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -478,43 +478,32 @@ static int x86_cluster_flags(void) > */ > static bool x86_has_numa_in_package; > > -static struct sched_domain_topology_level x86_topology[6]; > +#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname } > > -static void __init build_sched_topology(void) > -{ > - int i = 0; > - > -#ifdef CONFIG_SCHED_SMT > - x86_topology[i++] = (struct sched_domain_topology_level){ > - cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) > - }; > -#endif > +static struct sched_domain_topology_level x86_topology[] = { > + DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT), > #ifdef CONFIG_SCHED_CLUSTER > - x86_topology[i++] = (struct sched_domain_topology_level){ > - cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) > - }; > + DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS), > #endif > #ifdef CONFIG_SCHED_MC > - x86_topology[i++] = (struct sched_domain_topology_level){ > - cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) > - }; > + DOMAIN(cpu_coregroup_mask, x86_core_flags, MC), > #endif > - /* > - * When there is NUMA topology inside the package skip the PKG domain > - * since the NUMA domains will auto-magically create the right spanning > - * domains based on the SLIT. > - */ > - if (!x86_has_numa_in_package) { > - x86_topology[i++] = (struct sched_domain_topology_level){ > - cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG) > - }; > - } > + DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG), > + { NULL }, > +}; > > +static void __init build_sched_topology(void) > +{ > /* > - * There must be one trailing NULL entry left. > + * When there is NUMA topology inside the package invalidate the > + * PKG domain since the NUMA domains will auto-magically create the > + * right spanning domains based on the SLIT. > */ > - BUG_ON(i >= ARRAY_SIZE(x86_topology)-1); > + if (x86_has_numa_in_package) { > + unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > > + memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > + } > set_sched_topology(x86_topology); > } Urgh, this patch is doing at least 4 things and nigh on unreadable because of that. - introduces DOMAIN() helper - drops (the now pointless) SD_INIT_NAME() helper - drops CONFIG_SCHED_SMT (x86 special) - moves to static initialize and truncate ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() 2025-06-25 8:28 ` Peter Zijlstra @ 2025-06-30 0:21 ` Li Chen 2025-06-30 12:56 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Li Chen @ 2025-06-30 0:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen Hi Peter, ---- On Wed, 25 Jun 2025 16:28:19 +0800 Peter Zijlstra <peterz@infradead.org> wrote --- > On Wed, Jun 25, 2025 at 11:45:49AM +0800, Li Chen wrote: > > From: Thomas Gleixner <tglx@linutronix.de> > > > > The #ifdeffery and the initializers in build_sched_topology() are just > > disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is > > unconditionally enabled when SMP is enabled. > > On x86, but not across all archs. Yes this is x86 code, but how is one > supposed to keep all that nonsense straight in their head ;-) > > > Statically initialize the domain levels in the topology array and let > > build_sched_topology() invalidate the package domain level when NUMA in > > package is available. > > > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > --- > > arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------ > > 1 file changed, 17 insertions(+), 28 deletions(-) > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > index fc78c2325fd29..7d202f9785362 100644 > > --- a/arch/x86/kernel/smpboot.c > > +++ b/arch/x86/kernel/smpboot.c > > @@ -478,43 +478,32 @@ static int x86_cluster_flags(void) > > */ > > static bool x86_has_numa_in_package; > > > > -static struct sched_domain_topology_level x86_topology[6]; > > +#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname } > > > > -static void __init build_sched_topology(void) > > -{ > > - int i = 0; > > - > > -#ifdef CONFIG_SCHED_SMT > > - x86_topology[i++] = (struct sched_domain_topology_level){ > > - cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) > > - }; > > -#endif > > +static struct sched_domain_topology_level x86_topology[] = { > > + DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT), > > #ifdef CONFIG_SCHED_CLUSTER > > - x86_topology[i++] = (struct sched_domain_topology_level){ > > - cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS) > > - }; > > + DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS), > > #endif > > #ifdef CONFIG_SCHED_MC > > - x86_topology[i++] = (struct sched_domain_topology_level){ > > - cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) > > - }; > > + DOMAIN(cpu_coregroup_mask, x86_core_flags, MC), > > #endif > > - /* > > - * When there is NUMA topology inside the package skip the PKG domain > > - * since the NUMA domains will auto-magically create the right spanning > > - * domains based on the SLIT. > > - */ > > - if (!x86_has_numa_in_package) { > > - x86_topology[i++] = (struct sched_domain_topology_level){ > > - cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG) > > - }; > > - } > > + DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG), > > + { NULL }, > > +}; > > > > +static void __init build_sched_topology(void) > > +{ > > /* > > - * There must be one trailing NULL entry left. > > + * When there is NUMA topology inside the package invalidate the > > + * PKG domain since the NUMA domains will auto-magically create the > > + * right spanning domains based on the SLIT. > > */ > > - BUG_ON(i >= ARRAY_SIZE(x86_topology)-1); > > + if (x86_has_numa_in_package) { > > + unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > > > > + memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > > + } > > set_sched_topology(x86_topology); > > } > > Urgh, this patch is doing at least 4 things and nigh on unreadable > because of that. > > - introduces DOMAIN() helper > - drops (the now pointless) SD_INIT_NAME() helper > - drops CONFIG_SCHED_SMT (x86 special) > - moves to static initialize and truncate Thanks for your review, I would split this Thomas's patch to 4 different patches and preserve his signoff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() 2025-06-30 0:21 ` Li Chen @ 2025-06-30 12:56 ` Thomas Gleixner 0 siblings, 0 replies; 12+ messages in thread From: Thomas Gleixner @ 2025-06-30 12:56 UTC (permalink / raw) To: Li Chen, Peter Zijlstra Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen On Mon, Jun 30 2025 at 08:21, Li Chen wrote: > ---- On Wed, 25 Jun 2025 16:28:19 +0800 Peter Zijlstra <peterz@infradead.org> wrote --- > > Urgh, this patch is doing at least 4 things and nigh on unreadable > > because of that. > > > > - introduces DOMAIN() helper > > - drops (the now pointless) SD_INIT_NAME() helper > > - drops CONFIG_SCHED_SMT (x86 special) > > - moves to static initialize and truncate > > Thanks for your review, I would split this Thomas's patch to 4 different patches > and preserve his signoff Nah. Just split it up and add Suggested-by: Thomas .... ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen 2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen @ 2025-06-25 3:45 ` Li Chen 2025-06-25 5:45 ` K Prateek Nayak 2025-06-25 8:29 ` Peter Zijlstra 2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak 2 siblings, 2 replies; 12+ messages in thread From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw) To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen From: Li Chen <chenl311@chinatelecom.cn> Currently, the SMT domain is added into sched_domain_topology by default. If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight is just 1, it will destroy it. On a large machine, such as one with 512 cores, this results in 512 redundant domain attach/destroy operations. Avoid these unnecessary operations by simply checking cpu_smt_num_threads and remove SMT domain if the SMT domain is not enabled, and adjust the PKG index accordingly if NUMA-in-package invalidates that level as well. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> --- changelog: v2: fix wording issue as suggested by Thomas [1] v3: remove pointless memset and adjust PKG index accordingly, as suggested by Thomas [2] [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ arch/x86/kernel/smpboot.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 7d202f9785362..4b6daa1545445 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = { static void __init build_sched_topology(void) { + bool smt_dropped = false; + + if (cpu_smt_num_threads <= 1) { + /* + * SMT level is x86_topology[0]. Shift the array left by one, + */ + memmove(&x86_topology[0], &x86_topology[1], + sizeof(x86_topology) - sizeof(x86_topology[0])); + smt_dropped = true; + } + /* * When there is NUMA topology inside the package invalidate the * PKG domain since the NUMA domains will auto-magically create the * right spanning domains based on the SLIT. */ if (x86_has_numa_in_package) { - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; + unsigned int pkgdom; + + if (smt_dropped) + pkgdom = ARRAY_SIZE(x86_topology) - 3; + else + pkgdom = ARRAY_SIZE(x86_topology) - 2; memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); } -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen @ 2025-06-25 5:45 ` K Prateek Nayak 2025-06-25 8:30 ` Peter Zijlstra 2025-06-30 0:39 ` Li Chen 2025-06-25 8:29 ` Peter Zijlstra 1 sibling, 2 replies; 12+ messages in thread From: K Prateek Nayak @ 2025-06-25 5:45 UTC (permalink / raw) To: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen Hello Li, On 6/25/2025 9:15 AM, Li Chen wrote: > From: Li Chen <chenl311@chinatelecom.cn> > > Currently, the SMT domain is added into sched_domain_topology by default. > > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight > is just 1, it will destroy it. > > On a large machine, such as one with 512 cores, this results in > 512 redundant domain attach/destroy operations. > > Avoid these unnecessary operations by simply checking > cpu_smt_num_threads and remove SMT domain if the SMT domain is not > enabled, and adjust the PKG index accordingly if NUMA-in-package > invalidates that level as well. > > Signed-off-by: Li Chen <chenl311@chinatelecom.cn> > --- > changelog: > v2: fix wording issue as suggested by Thomas [1] > v3: remove pointless memset and adjust PKG index accordingly, > as suggested by Thomas [2] > > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ > > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 7d202f9785362..4b6daa1545445 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = { > > static void __init build_sched_topology(void) > { > + bool smt_dropped = false; > + > + if (cpu_smt_num_threads <= 1) { > + /* > + * SMT level is x86_topology[0]. Shift the array left by one, > + */ > + memmove(&x86_topology[0], &x86_topology[1], > + sizeof(x86_topology) - sizeof(x86_topology[0])); > + smt_dropped = true; > + } > + Instead of this memmove, wouldn't just passing the "&x86_topology[1]" to set_sched_topology() when "cpu_smt_num_threads <= 1" work? Something like below: (Only tested on a QEMU based VM) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 3818b16c19e1..1793248d28cd 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = { static void __init build_sched_topology(void) { + struct sched_domain_topology_level *topology = x86_topology; + /* * When there is NUMA topology inside the package invalidate the * PKG domain since the NUMA domains will auto-magically create the @@ -504,7 +506,15 @@ static void __init build_sched_topology(void) memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); } - set_sched_topology(x86_topology); + + /* + * Drop the SMT domains if there is only one thread per-core + * since it'll get degenerated by the scheduler anyways. + */ + if (cpu_smt_num_threads <= 1) + ++topology; + + set_sched_topology(topology); } void set_cpu_sibling_map(int cpu) > /* > * When there is NUMA topology inside the package invalidate the > * PKG domain since the NUMA domains will auto-magically create the > * right spanning domains based on the SLIT. > */ > if (x86_has_numa_in_package) { > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > + unsigned int pkgdom; > + > + if (smt_dropped) > + pkgdom = ARRAY_SIZE(x86_topology) - 3; > + else > + pkgdom = ARRAY_SIZE(x86_topology) - 2; > > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > } -- Thanks and Regards, Prateek ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 5:45 ` K Prateek Nayak @ 2025-06-25 8:30 ` Peter Zijlstra 2025-06-30 0:39 ` Li Chen 1 sibling, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2025-06-25 8:30 UTC (permalink / raw) To: K Prateek Nayak Cc: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen On Wed, Jun 25, 2025 at 11:15:22AM +0530, K Prateek Nayak wrote: > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 3818b16c19e1..1793248d28cd 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = { > static void __init build_sched_topology(void) > { > + struct sched_domain_topology_level *topology = x86_topology; > + > /* > * When there is NUMA topology inside the package invalidate the > * PKG domain since the NUMA domains will auto-magically create the > @@ -504,7 +506,15 @@ static void __init build_sched_topology(void) > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > } > - set_sched_topology(x86_topology); > + > + /* > + * Drop the SMT domains if there is only one thread per-core > + * since it'll get degenerated by the scheduler anyways. > + */ > + if (cpu_smt_num_threads <= 1) > + ++topology; > + > + set_sched_topology(topology); > } > void set_cpu_sibling_map(int cpu) Yes, that is far saner. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 5:45 ` K Prateek Nayak 2025-06-25 8:30 ` Peter Zijlstra @ 2025-06-30 0:39 ` Li Chen 1 sibling, 0 replies; 12+ messages in thread From: Li Chen @ 2025-06-30 0:39 UTC (permalink / raw) To: K Prateek Nayak Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen Hi K, ---- On Wed, 25 Jun 2025 13:45:22 +0800 K Prateek Nayak <kprateek.nayak@amd.com> wrote --- > Hello Li, > > On 6/25/2025 9:15 AM, Li Chen wrote: > > From: Li Chen <chenl311@chinatelecom.cn> > > > > Currently, the SMT domain is added into sched_domain_topology by default. > > > > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight > > is just 1, it will destroy it. > > > > On a large machine, such as one with 512 cores, this results in > > 512 redundant domain attach/destroy operations. > > > > Avoid these unnecessary operations by simply checking > > cpu_smt_num_threads and remove SMT domain if the SMT domain is not > > enabled, and adjust the PKG index accordingly if NUMA-in-package > > invalidates that level as well. > > > > Signed-off-by: Li Chen <chenl311@chinatelecom.cn> > > --- > > changelog: > > v2: fix wording issue as suggested by Thomas [1] > > v3: remove pointless memset and adjust PKG index accordingly, > > as suggested by Thomas [2] > > > > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ > > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ > > > > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > index 7d202f9785362..4b6daa1545445 100644 > > --- a/arch/x86/kernel/smpboot.c > > +++ b/arch/x86/kernel/smpboot.c > > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = { > > > > static void __init build_sched_topology(void) > > { > > + bool smt_dropped = false; > > + > > + if (cpu_smt_num_threads <= 1) { > > + /* > > + * SMT level is x86_topology[0]. Shift the array left by one, > > + */ > > + memmove(&x86_topology[0], &x86_topology[1], > > + sizeof(x86_topology) - sizeof(x86_topology[0])); > > + smt_dropped = true; > > + } > > + > > Instead of this memmove, wouldn't just passing the "&x86_topology[1]" > to set_sched_topology() when "cpu_smt_num_threads <= 1" work? > > Something like below: > > (Only tested on a QEMU based VM) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 3818b16c19e1..1793248d28cd 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = { > > static void __init build_sched_topology(void) > { > + struct sched_domain_topology_level *topology = x86_topology; > + > /* > * When there is NUMA topology inside the package invalidate the > * PKG domain since the NUMA domains will auto-magically create the > @@ -504,7 +506,15 @@ static void __init build_sched_topology(void) > > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > } > - set_sched_topology(x86_topology); > + > + /* > + * Drop the SMT domains if there is only one thread per-core > + * since it'll get degenerated by the scheduler anyways. > + */ > + if (cpu_smt_num_threads <= 1) > + ++topology; > + > + set_sched_topology(topology); > } > > void set_cpu_sibling_map(int cpu) Yes, I also agree. I would switch to it in the next series and add your signoff. Thanks! > > /* > > * When there is NUMA topology inside the package invalidate the > > * PKG domain since the NUMA domains will auto-magically create the > > * right spanning domains based on the SLIT. > > */ > > if (x86_has_numa_in_package) { > > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > > + unsigned int pkgdom; > > + > > + if (smt_dropped) > > + pkgdom = ARRAY_SIZE(x86_topology) - 3; > > + else > > + pkgdom = ARRAY_SIZE(x86_topology) - 2; > > > > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > > } > > -- > Thanks and Regards, > Prateek > > Regards, Li ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen 2025-06-25 5:45 ` K Prateek Nayak @ 2025-06-25 8:29 ` Peter Zijlstra 2025-06-30 0:29 ` Li Chen 1 sibling, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2025-06-25 8:29 UTC (permalink / raw) To: Li Chen Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen On Wed, Jun 25, 2025 at 11:45:50AM +0800, Li Chen wrote: > From: Li Chen <chenl311@chinatelecom.cn> > > Currently, the SMT domain is added into sched_domain_topology by default. > > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight > is just 1, it will destroy it. > > On a large machine, such as one with 512 cores, this results in > 512 redundant domain attach/destroy operations. > > Avoid these unnecessary operations by simply checking > cpu_smt_num_threads and remove SMT domain if the SMT domain is not > enabled, and adjust the PKG index accordingly if NUMA-in-package > invalidates that level as well. > > Signed-off-by: Li Chen <chenl311@chinatelecom.cn> > --- > changelog: > v2: fix wording issue as suggested by Thomas [1] > v3: remove pointless memset and adjust PKG index accordingly, > as suggested by Thomas [2] > > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ > > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 7d202f9785362..4b6daa1545445 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = { > > static void __init build_sched_topology(void) > { > + bool smt_dropped = false; > + > + if (cpu_smt_num_threads <= 1) { > + /* > + * SMT level is x86_topology[0]. Shift the array left by one, > + */ > + memmove(&x86_topology[0], &x86_topology[1], > + sizeof(x86_topology) - sizeof(x86_topology[0])); > + smt_dropped = true; > + } > + > /* > * When there is NUMA topology inside the package invalidate the > * PKG domain since the NUMA domains will auto-magically create the > * right spanning domains based on the SLIT. > */ > if (x86_has_numa_in_package) { > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > + unsigned int pkgdom; > + > + if (smt_dropped) > + pkgdom = ARRAY_SIZE(x86_topology) - 3; > + else > + pkgdom = ARRAY_SIZE(x86_topology) - 2; > > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > } Oh gawd, and you all really think this is better than dynamically creating that array? This is quite terrible. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled 2025-06-25 8:29 ` Peter Zijlstra @ 2025-06-30 0:29 ` Li Chen 0 siblings, 0 replies; 12+ messages in thread From: Li Chen @ 2025-06-30 0:29 UTC (permalink / raw) To: Peter Zijlstra Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy, Li Chen Hi Peter, ---- On Wed, 25 Jun 2025 16:29:32 +0800 Peter Zijlstra <peterz@infradead.org> wrote --- > On Wed, Jun 25, 2025 at 11:45:50AM +0800, Li Chen wrote: > > From: Li Chen <chenl311@chinatelecom.cn> > > > > Currently, the SMT domain is added into sched_domain_topology by default. > > > > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight > > is just 1, it will destroy it. > > > > On a large machine, such as one with 512 cores, this results in > > 512 redundant domain attach/destroy operations. > > > > Avoid these unnecessary operations by simply checking > > cpu_smt_num_threads and remove SMT domain if the SMT domain is not > > enabled, and adjust the PKG index accordingly if NUMA-in-package > > invalidates that level as well. > > > > Signed-off-by: Li Chen <chenl311@chinatelecom.cn> > > --- > > changelog: > > v2: fix wording issue as suggested by Thomas [1] > > v3: remove pointless memset and adjust PKG index accordingly, > > as suggested by Thomas [2] > > > > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ > > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ > > > > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > index 7d202f9785362..4b6daa1545445 100644 > > --- a/arch/x86/kernel/smpboot.c > > +++ b/arch/x86/kernel/smpboot.c > > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = { > > > > static void __init build_sched_topology(void) > > { > > + bool smt_dropped = false; > > + > > + if (cpu_smt_num_threads <= 1) { > > + /* > > + * SMT level is x86_topology[0]. Shift the array left by one, > > + */ > > + memmove(&x86_topology[0], &x86_topology[1], > > + sizeof(x86_topology) - sizeof(x86_topology[0])); > > + smt_dropped = true; > > + } > > + > > /* > > * When there is NUMA topology inside the package invalidate the > > * PKG domain since the NUMA domains will auto-magically create the > > * right spanning domains based on the SLIT. > > */ > > if (x86_has_numa_in_package) { > > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2; > > + unsigned int pkgdom; > > + > > + if (smt_dropped) > > + pkgdom = ARRAY_SIZE(x86_topology) - 3; > > + else > > + pkgdom = ARRAY_SIZE(x86_topology) - 2; > > > > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom])); > > } > > Oh gawd, and you all really think this is better than dynamically > creating that array? > > This is quite terrible. > Do Thomas and you still need to agree on switching to a static array? I'm unsure of the rule. BTW, initially, I had a patch that didn't include that dynamic/static array change. I don't know if this is ok for you: https://lore.kernel.org/all/1965cae22a0.12ab5a70c833868.7155412488566097801@linux.beauty/ Regards, Li ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level 2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen 2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen 2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen @ 2025-06-25 5:50 ` K Prateek Nayak 2 siblings, 0 replies; 12+ messages in thread From: K Prateek Nayak @ 2025-06-25 5:50 UTC (permalink / raw) To: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel Hello Li, On 6/25/2025 9:15 AM, Li Chen wrote: > From: Li Chen <chenl311@chinatelecom.cn> > > This two–patch series cleans up sched-domain topology handling and > eliminates hundreds of pointless attach/destroy cycles for large > machines when SMT is not available. > > Patch 1 (from Thomas, unchanged) gets rid of the #ifdef maze in > build_sched_topology() by statically initialising the topology array. > > Patch 2 (mine) is a follow-up that simply memmoves the array when > cpu_smt_num_threads <= 1, so the SMT level never gets created and > immediately torn down again. > > Tested on Qemu. I think there is a simpler approach to Patch 2 but other than that I gave this series a spin during my review with different topology on QEMU and didn't find any surprises. I tested with and without the "threads" parameter to "-smp" and also with "nosmt" kernel cmdline and everything seemed to be in order. Feel free to add: Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> > > changelog: > v2: fix wording issue as suggested by Thomas [1] > v3: remove pointless memset and adjust PKG index accordingly, as > suggested by Thomas [2], and refine some other wording issues. > > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/ > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/ > > Li Chen (1): > x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled > > Thomas Gleixner (1): > x86/smpboot: Decrapify build_sched_topology() > > arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++------------------ > 1 file changed, 32 insertions(+), 27 deletions(-) > -- Thanks and Regards, Prateek ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-30 12:56 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen 2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen 2025-06-25 8:28 ` Peter Zijlstra 2025-06-30 0:21 ` Li Chen 2025-06-30 12:56 ` Thomas Gleixner 2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen 2025-06-25 5:45 ` K Prateek Nayak 2025-06-25 8:30 ` Peter Zijlstra 2025-06-30 0:39 ` Li Chen 2025-06-25 8:29 ` Peter Zijlstra 2025-06-30 0:29 ` Li Chen 2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox