* [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level
@ 2025-06-25 3:45 Li Chen
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra,
K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel
From: Li Chen <chenl311@chinatelecom.cn>
This two–patch series cleans up sched-domain topology handling and
eliminates hundreds of pointless attach/destroy cycles for large
machines when SMT is not available.
Patch 1 (from Thomas, unchanged) gets rid of the #ifdef maze in
build_sched_topology() by statically initialising the topology array.
Patch 2 (mine) is a follow-up that simply memmoves the array when
cpu_smt_num_threads <= 1, so the SMT level never gets created and
immediately torn down again.
Tested on Qemu.
changelog:
v2: fix wording issue as suggested by Thomas [1]
v3: remove pointless memset and adjust PKG index accordingly, as
suggested by Thomas [2], and refine some other wording issues.
[1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
[2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
Li Chen (1):
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Thomas Gleixner (1):
x86/smpboot: Decrapify build_sched_topology()
arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 27 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology()
2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen
@ 2025-06-25 3:45 ` Li Chen
2025-06-25 8:28 ` Peter Zijlstra
2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen
2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak
2 siblings, 1 reply; 12+ messages in thread
From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra,
K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel, Li Chen
From: Thomas Gleixner <tglx@linutronix.de>
The #ifdeffery and the initializers in build_sched_topology() are just
disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is
unconditionally enabled when SMP is enabled.
Statically initialize the domain levels in the topology array and let
build_sched_topology() invalidate the package domain level when NUMA in
package is available.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------
1 file changed, 17 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fc78c2325fd29..7d202f9785362 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -478,43 +478,32 @@ static int x86_cluster_flags(void)
*/
static bool x86_has_numa_in_package;
-static struct sched_domain_topology_level x86_topology[6];
+#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname }
-static void __init build_sched_topology(void)
-{
- int i = 0;
-
-#ifdef CONFIG_SCHED_SMT
- x86_topology[i++] = (struct sched_domain_topology_level){
- cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT)
- };
-#endif
+static struct sched_domain_topology_level x86_topology[] = {
+ DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT),
#ifdef CONFIG_SCHED_CLUSTER
- x86_topology[i++] = (struct sched_domain_topology_level){
- cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
- };
+ DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS),
#endif
#ifdef CONFIG_SCHED_MC
- x86_topology[i++] = (struct sched_domain_topology_level){
- cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
- };
+ DOMAIN(cpu_coregroup_mask, x86_core_flags, MC),
#endif
- /*
- * When there is NUMA topology inside the package skip the PKG domain
- * since the NUMA domains will auto-magically create the right spanning
- * domains based on the SLIT.
- */
- if (!x86_has_numa_in_package) {
- x86_topology[i++] = (struct sched_domain_topology_level){
- cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG)
- };
- }
+ DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
+ { NULL },
+};
+static void __init build_sched_topology(void)
+{
/*
- * There must be one trailing NULL entry left.
+ * When there is NUMA topology inside the package invalidate the
+ * PKG domain since the NUMA domains will auto-magically create the
+ * right spanning domains based on the SLIT.
*/
- BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
+ if (x86_has_numa_in_package) {
+ unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
+ memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
+ }
set_sched_topology(x86_topology);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
@ 2025-06-25 3:45 ` Li Chen
2025-06-25 5:45 ` K Prateek Nayak
2025-06-25 8:29 ` Peter Zijlstra
2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak
2 siblings, 2 replies; 12+ messages in thread
From: Li Chen @ 2025-06-25 3:45 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra,
K Prateek Nayak, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel, Gautham R. Shenoy, Li Chen
From: Li Chen <chenl311@chinatelecom.cn>
Currently, the SMT domain is added into sched_domain_topology by default.
If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
is just 1, it will destroy it.
On a large machine, such as one with 512 cores, this results in
512 redundant domain attach/destroy operations.
Avoid these unnecessary operations by simply checking
cpu_smt_num_threads and remove SMT domain if the SMT domain is not
enabled, and adjust the PKG index accordingly if NUMA-in-package
invalidates that level as well.
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
---
changelog:
v2: fix wording issue as suggested by Thomas [1]
v3: remove pointless memset and adjust PKG index accordingly,
as suggested by Thomas [2]
[1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
[2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
arch/x86/kernel/smpboot.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7d202f9785362..4b6daa1545445 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = {
static void __init build_sched_topology(void)
{
+ bool smt_dropped = false;
+
+ if (cpu_smt_num_threads <= 1) {
+ /*
+ * SMT level is x86_topology[0]. Shift the array left by one,
+ */
+ memmove(&x86_topology[0], &x86_topology[1],
+ sizeof(x86_topology) - sizeof(x86_topology[0]));
+ smt_dropped = true;
+ }
+
/*
* When there is NUMA topology inside the package invalidate the
* PKG domain since the NUMA domains will auto-magically create the
* right spanning domains based on the SLIT.
*/
if (x86_has_numa_in_package) {
- unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
+ unsigned int pkgdom;
+
+ if (smt_dropped)
+ pkgdom = ARRAY_SIZE(x86_topology) - 3;
+ else
+ pkgdom = ARRAY_SIZE(x86_topology) - 2;
memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen
@ 2025-06-25 5:45 ` K Prateek Nayak
2025-06-25 8:30 ` Peter Zijlstra
2025-06-30 0:39 ` Li Chen
2025-06-25 8:29 ` Peter Zijlstra
1 sibling, 2 replies; 12+ messages in thread
From: K Prateek Nayak @ 2025-06-25 5:45 UTC (permalink / raw)
To: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki,
Peter Zijlstra, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel, Gautham R. Shenoy, Li Chen
Hello Li,
On 6/25/2025 9:15 AM, Li Chen wrote:
> From: Li Chen <chenl311@chinatelecom.cn>
>
> Currently, the SMT domain is added into sched_domain_topology by default.
>
> If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
> is just 1, it will destroy it.
>
> On a large machine, such as one with 512 cores, this results in
> 512 redundant domain attach/destroy operations.
>
> Avoid these unnecessary operations by simply checking
> cpu_smt_num_threads and remove SMT domain if the SMT domain is not
> enabled, and adjust the PKG index accordingly if NUMA-in-package
> invalidates that level as well.
>
> Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
> ---
> changelog:
> v2: fix wording issue as suggested by Thomas [1]
> v3: remove pointless memset and adjust PKG index accordingly,
> as suggested by Thomas [2]
>
> [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
> [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
>
> arch/x86/kernel/smpboot.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7d202f9785362..4b6daa1545445 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = {
>
> static void __init build_sched_topology(void)
> {
> + bool smt_dropped = false;
> +
> + if (cpu_smt_num_threads <= 1) {
> + /*
> + * SMT level is x86_topology[0]. Shift the array left by one,
> + */
> + memmove(&x86_topology[0], &x86_topology[1],
> + sizeof(x86_topology) - sizeof(x86_topology[0]));
> + smt_dropped = true;
> + }
> +
Instead of this memmove, wouldn't just passing the "&x86_topology[1]"
to set_sched_topology() when "cpu_smt_num_threads <= 1" work?
Something like below:
(Only tested on a QEMU based VM)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3818b16c19e1..1793248d28cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = {
static void __init build_sched_topology(void)
{
+ struct sched_domain_topology_level *topology = x86_topology;
+
/*
* When there is NUMA topology inside the package invalidate the
* PKG domain since the NUMA domains will auto-magically create the
@@ -504,7 +506,15 @@ static void __init build_sched_topology(void)
memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
}
- set_sched_topology(x86_topology);
+
+ /*
+ * Drop the SMT domains if there is only one thread per-core
+ * since it'll get degenerated by the scheduler anyways.
+ */
+ if (cpu_smt_num_threads <= 1)
+ ++topology;
+
+ set_sched_topology(topology);
}
void set_cpu_sibling_map(int cpu)
> /*
> * When there is NUMA topology inside the package invalidate the
> * PKG domain since the NUMA domains will auto-magically create the
> * right spanning domains based on the SLIT.
> */
> if (x86_has_numa_in_package) {
> - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
> + unsigned int pkgdom;
> +
> + if (smt_dropped)
> + pkgdom = ARRAY_SIZE(x86_topology) - 3;
> + else
> + pkgdom = ARRAY_SIZE(x86_topology) - 2;
>
> memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> }
--
Thanks and Regards,
Prateek
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level
2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen
@ 2025-06-25 5:50 ` K Prateek Nayak
2 siblings, 0 replies; 12+ messages in thread
From: K Prateek Nayak @ 2025-06-25 5:50 UTC (permalink / raw)
To: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki,
Peter Zijlstra, Sohil Mehta, Brian Gerst, Patryk Wlazlyn,
linux-kernel
Hello Li,
On 6/25/2025 9:15 AM, Li Chen wrote:
> From: Li Chen <chenl311@chinatelecom.cn>
>
> This two–patch series cleans up sched-domain topology handling and
> eliminates hundreds of pointless attach/destroy cycles for large
> machines when SMT is not available.
>
> Patch 1 (from Thomas, unchanged) gets rid of the #ifdef maze in
> build_sched_topology() by statically initialising the topology array.
>
> Patch 2 (mine) is a follow-up that simply memmoves the array when
> cpu_smt_num_threads <= 1, so the SMT level never gets created and
> immediately torn down again.
>
> Tested on Qemu.
I think there is a simpler approach to Patch 2 but other than that
I gave this series a spin during my review with different topology
on QEMU and didn't find any surprises. I tested with and without
the "threads" parameter to "-smp" and also with "nosmt" kernel
cmdline and everything seemed to be in order. Feel free to add:
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
> changelog:
> v2: fix wording issue as suggested by Thomas [1]
> v3: remove pointless memset and adjust PKG index accordingly, as
> suggested by Thomas [2], and refine some other wording issues.
>
> [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
> [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
>
> Li Chen (1):
> x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
>
> Thomas Gleixner (1):
> x86/smpboot: Decrapify build_sched_topology()
>
> arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++------------------
> 1 file changed, 32 insertions(+), 27 deletions(-)
>
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology()
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
@ 2025-06-25 8:28 ` Peter Zijlstra
2025-06-30 0:21 ` Li Chen
0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2025-06-25 8:28 UTC (permalink / raw)
To: Li Chen
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta,
Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen
On Wed, Jun 25, 2025 at 11:45:49AM +0800, Li Chen wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The #ifdeffery and the initializers in build_sched_topology() are just
> disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is
> unconditionally enabled when SMP is enabled.
On x86, but not across all archs. Yes this is x86 code, but how is one
supposed to keep all that nonsense straight in their head ;-)
> Statically initialize the domain levels in the topology array and let
> build_sched_topology() invalidate the package domain level when NUMA in
> package is available.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------
> 1 file changed, 17 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index fc78c2325fd29..7d202f9785362 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -478,43 +478,32 @@ static int x86_cluster_flags(void)
> */
> static bool x86_has_numa_in_package;
>
> -static struct sched_domain_topology_level x86_topology[6];
> +#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname }
>
> -static void __init build_sched_topology(void)
> -{
> - int i = 0;
> -
> -#ifdef CONFIG_SCHED_SMT
> - x86_topology[i++] = (struct sched_domain_topology_level){
> - cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT)
> - };
> -#endif
> +static struct sched_domain_topology_level x86_topology[] = {
> + DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT),
> #ifdef CONFIG_SCHED_CLUSTER
> - x86_topology[i++] = (struct sched_domain_topology_level){
> - cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
> - };
> + DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS),
> #endif
> #ifdef CONFIG_SCHED_MC
> - x86_topology[i++] = (struct sched_domain_topology_level){
> - cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
> - };
> + DOMAIN(cpu_coregroup_mask, x86_core_flags, MC),
> #endif
> - /*
> - * When there is NUMA topology inside the package skip the PKG domain
> - * since the NUMA domains will auto-magically create the right spanning
> - * domains based on the SLIT.
> - */
> - if (!x86_has_numa_in_package) {
> - x86_topology[i++] = (struct sched_domain_topology_level){
> - cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG)
> - };
> - }
> + DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
> + { NULL },
> +};
>
> +static void __init build_sched_topology(void)
> +{
> /*
> - * There must be one trailing NULL entry left.
> + * When there is NUMA topology inside the package invalidate the
> + * PKG domain since the NUMA domains will auto-magically create the
> + * right spanning domains based on the SLIT.
> */
> - BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
> + if (x86_has_numa_in_package) {
> + unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
>
> + memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> + }
> set_sched_topology(x86_topology);
> }
Urgh, this patch is doing at least 4 things and nigh on unreadable
because of that.
- introduces DOMAIN() helper
- drops (the now pointless) SD_INIT_NAME() helper
- drops CONFIG_SCHED_SMT (x86 special)
- moves to static initialize and truncate
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen
2025-06-25 5:45 ` K Prateek Nayak
@ 2025-06-25 8:29 ` Peter Zijlstra
2025-06-30 0:29 ` Li Chen
1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2025-06-25 8:29 UTC (permalink / raw)
To: Li Chen
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta,
Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy,
Li Chen
On Wed, Jun 25, 2025 at 11:45:50AM +0800, Li Chen wrote:
> From: Li Chen <chenl311@chinatelecom.cn>
>
> Currently, the SMT domain is added into sched_domain_topology by default.
>
> If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
> is just 1, it will destroy it.
>
> On a large machine, such as one with 512 cores, this results in
> 512 redundant domain attach/destroy operations.
>
> Avoid these unnecessary operations by simply checking
> cpu_smt_num_threads and remove SMT domain if the SMT domain is not
> enabled, and adjust the PKG index accordingly if NUMA-in-package
> invalidates that level as well.
>
> Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
> ---
> changelog:
> v2: fix wording issue as suggested by Thomas [1]
> v3: remove pointless memset and adjust PKG index accordingly,
> as suggested by Thomas [2]
>
> [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
> [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
>
> arch/x86/kernel/smpboot.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7d202f9785362..4b6daa1545445 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = {
>
> static void __init build_sched_topology(void)
> {
> + bool smt_dropped = false;
> +
> + if (cpu_smt_num_threads <= 1) {
> + /*
> + * SMT level is x86_topology[0]. Shift the array left by one,
> + */
> + memmove(&x86_topology[0], &x86_topology[1],
> + sizeof(x86_topology) - sizeof(x86_topology[0]));
> + smt_dropped = true;
> + }
> +
> /*
> * When there is NUMA topology inside the package invalidate the
> * PKG domain since the NUMA domains will auto-magically create the
> * right spanning domains based on the SLIT.
> */
> if (x86_has_numa_in_package) {
> - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
> + unsigned int pkgdom;
> +
> + if (smt_dropped)
> + pkgdom = ARRAY_SIZE(x86_topology) - 3;
> + else
> + pkgdom = ARRAY_SIZE(x86_topology) - 2;
>
> memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> }
Oh gawd, and you all really think this is better than dynamically
creating that array?
This is quite terrible.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 5:45 ` K Prateek Nayak
@ 2025-06-25 8:30 ` Peter Zijlstra
2025-06-30 0:39 ` Li Chen
1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2025-06-25 8:30 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Li Chen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H . Peter Anvin, Rafael J . Wysocki,
Sohil Mehta, Brian Gerst, Patryk Wlazlyn, linux-kernel,
Gautham R. Shenoy, Li Chen
On Wed, Jun 25, 2025 at 11:15:22AM +0530, K Prateek Nayak wrote:
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 3818b16c19e1..1793248d28cd 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = {
> static void __init build_sched_topology(void)
> {
> + struct sched_domain_topology_level *topology = x86_topology;
> +
> /*
> * When there is NUMA topology inside the package invalidate the
> * PKG domain since the NUMA domains will auto-magically create the
> @@ -504,7 +506,15 @@ static void __init build_sched_topology(void)
> memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> }
> - set_sched_topology(x86_topology);
> +
> + /*
> + * Drop the SMT domains if there is only one thread per-core
> + * since it'll get degenerated by the scheduler anyways.
> + */
> + if (cpu_smt_num_threads <= 1)
> + ++topology;
> +
> + set_sched_topology(topology);
> }
> void set_cpu_sibling_map(int cpu)
Yes, that is far saner.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology()
2025-06-25 8:28 ` Peter Zijlstra
@ 2025-06-30 0:21 ` Li Chen
2025-06-30 12:56 ` Thomas Gleixner
0 siblings, 1 reply; 12+ messages in thread
From: Li Chen @ 2025-06-30 0:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta,
Brian Gerst, Patryk Wlazlyn, linux-kernel, Li Chen
Hi Peter,
---- On Wed, 25 Jun 2025 16:28:19 +0800 Peter Zijlstra <peterz@infradead.org> wrote ---
> On Wed, Jun 25, 2025 at 11:45:49AM +0800, Li Chen wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > The #ifdeffery and the initializers in build_sched_topology() are just
> > disgusting. The SCHED_SMT #ifdef is also pointless because SCHED_SMT is
> > unconditionally enabled when SMP is enabled.
>
> On x86, but not across all archs. Yes this is x86 code, but how is one
> supposed to keep all that nonsense straight in their head ;-)
>
> > Statically initialize the domain levels in the topology array and let
> > build_sched_topology() invalidate the package domain level when NUMA in
> > package is available.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> > arch/x86/kernel/smpboot.c | 45 +++++++++++++++------------------------
> > 1 file changed, 17 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index fc78c2325fd29..7d202f9785362 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -478,43 +478,32 @@ static int x86_cluster_flags(void)
> > */
> > static bool x86_has_numa_in_package;
> >
> > -static struct sched_domain_topology_level x86_topology[6];
> > +#define DOMAIN(maskfn, flagsfn, dname) { .mask = maskfn, .sd_flags = flagsfn, .name = #dname }
> >
> > -static void __init build_sched_topology(void)
> > -{
> > - int i = 0;
> > -
> > -#ifdef CONFIG_SCHED_SMT
> > - x86_topology[i++] = (struct sched_domain_topology_level){
> > - cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT)
> > - };
> > -#endif
> > +static struct sched_domain_topology_level x86_topology[] = {
> > + DOMAIN(cpu_smt_mask, cpu_smt_flags, SMT),
> > #ifdef CONFIG_SCHED_CLUSTER
> > - x86_topology[i++] = (struct sched_domain_topology_level){
> > - cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
> > - };
> > + DOMAIN(cpu_clustergroup_mask, x86_cluster_flags, CLS),
> > #endif
> > #ifdef CONFIG_SCHED_MC
> > - x86_topology[i++] = (struct sched_domain_topology_level){
> > - cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
> > - };
> > + DOMAIN(cpu_coregroup_mask, x86_core_flags, MC),
> > #endif
> > - /*
> > - * When there is NUMA topology inside the package skip the PKG domain
> > - * since the NUMA domains will auto-magically create the right spanning
> > - * domains based on the SLIT.
> > - */
> > - if (!x86_has_numa_in_package) {
> > - x86_topology[i++] = (struct sched_domain_topology_level){
> > - cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG)
> > - };
> > - }
> > + DOMAIN(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
> > + { NULL },
> > +};
> >
> > +static void __init build_sched_topology(void)
> > +{
> > /*
> > - * There must be one trailing NULL entry left.
> > + * When there is NUMA topology inside the package invalidate the
> > + * PKG domain since the NUMA domains will auto-magically create the
> > + * right spanning domains based on the SLIT.
> > */
> > - BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
> > + if (x86_has_numa_in_package) {
> > + unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
> >
> > + memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> > + }
> > set_sched_topology(x86_topology);
> > }
>
> Urgh, this patch is doing at least 4 things and nigh on unreadable
> because of that.
>
> - introduces DOMAIN() helper
> - drops (the now pointless) SD_INIT_NAME() helper
> - drops CONFIG_SCHED_SMT (x86 special)
> - moves to static initialize and truncate
Thanks for your review, I would split this Thomas's patch to 4 different patches
and preserve his signoff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 8:29 ` Peter Zijlstra
@ 2025-06-30 0:29 ` Li Chen
0 siblings, 0 replies; 12+ messages in thread
From: Li Chen @ 2025-06-30 0:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta,
Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy,
Li Chen
Hi Peter,
---- On Wed, 25 Jun 2025 16:29:32 +0800 Peter Zijlstra <peterz@infradead.org> wrote ---
> On Wed, Jun 25, 2025 at 11:45:50AM +0800, Li Chen wrote:
> > From: Li Chen <chenl311@chinatelecom.cn>
> >
> > Currently, the SMT domain is added into sched_domain_topology by default.
> >
> > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
> > is just 1, it will destroy it.
> >
> > On a large machine, such as one with 512 cores, this results in
> > 512 redundant domain attach/destroy operations.
> >
> > Avoid these unnecessary operations by simply checking
> > cpu_smt_num_threads and remove SMT domain if the SMT domain is not
> > enabled, and adjust the PKG index accordingly if NUMA-in-package
> > invalidates that level as well.
> >
> > Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
> > ---
> > changelog:
> > v2: fix wording issue as suggested by Thomas [1]
> > v3: remove pointless memset and adjust PKG index accordingly,
> > as suggested by Thomas [2]
> >
> > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
> > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
> >
> > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7d202f9785362..4b6daa1545445 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = {
> >
> > static void __init build_sched_topology(void)
> > {
> > + bool smt_dropped = false;
> > +
> > + if (cpu_smt_num_threads <= 1) {
> > + /*
> > + * SMT level is x86_topology[0]. Shift the array left by one,
> > + */
> > + memmove(&x86_topology[0], &x86_topology[1],
> > + sizeof(x86_topology) - sizeof(x86_topology[0]));
> > + smt_dropped = true;
> > + }
> > +
> > /*
> > * When there is NUMA topology inside the package invalidate the
> > * PKG domain since the NUMA domains will auto-magically create the
> > * right spanning domains based on the SLIT.
> > */
> > if (x86_has_numa_in_package) {
> > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
> > + unsigned int pkgdom;
> > +
> > + if (smt_dropped)
> > + pkgdom = ARRAY_SIZE(x86_topology) - 3;
> > + else
> > + pkgdom = ARRAY_SIZE(x86_topology) - 2;
> >
> > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> > }
>
> Oh gawd, and you all really think this is better than dynamically
> creating that array?
>
> This is quite terrible.
>
Do Thomas and you still need to agree on switching to a static array? I'm unsure of the rule.
BTW, initially, I had a patch that didn't include that dynamic/static array change. I don't know if this is ok for you:
https://lore.kernel.org/all/1965cae22a0.12ab5a70c833868.7155412488566097801@linux.beauty/
Regards,
Li
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
2025-06-25 5:45 ` K Prateek Nayak
2025-06-25 8:30 ` Peter Zijlstra
@ 2025-06-30 0:39 ` Li Chen
1 sibling, 0 replies; 12+ messages in thread
From: Li Chen @ 2025-06-30 0:39 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H . Peter Anvin, Rafael J . Wysocki, Peter Zijlstra, Sohil Mehta,
Brian Gerst, Patryk Wlazlyn, linux-kernel, Gautham R. Shenoy,
Li Chen
Hi K,
---- On Wed, 25 Jun 2025 13:45:22 +0800 K Prateek Nayak <kprateek.nayak@amd.com> wrote ---
> Hello Li,
>
> On 6/25/2025 9:15 AM, Li Chen wrote:
> > From: Li Chen <chenl311@chinatelecom.cn>
> >
> > Currently, the SMT domain is added into sched_domain_topology by default.
> >
> > If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
> > is just 1, it will destroy it.
> >
> > On a large machine, such as one with 512 cores, this results in
> > 512 redundant domain attach/destroy operations.
> >
> > Avoid these unnecessary operations by simply checking
> > cpu_smt_num_threads and remove SMT domain if the SMT domain is not
> > enabled, and adjust the PKG index accordingly if NUMA-in-package
> > invalidates that level as well.
> >
> > Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
> > ---
> > changelog:
> > v2: fix wording issue as suggested by Thomas [1]
> > v3: remove pointless memset and adjust PKG index accordingly,
> > as suggested by Thomas [2]
> >
> > [1]: https://lore.kernel.org/all/87msa2r018.ffs@tglx/
> > [2]: https://lore.kernel.org/all/875xglntx1.ffs@tglx/
> >
> > arch/x86/kernel/smpboot.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7d202f9785362..4b6daa1545445 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -494,13 +494,29 @@ static struct sched_domain_topology_level x86_topology[] = {
> >
> > static void __init build_sched_topology(void)
> > {
> > + bool smt_dropped = false;
> > +
> > + if (cpu_smt_num_threads <= 1) {
> > + /*
> > + * SMT level is x86_topology[0]. Shift the array left by one,
> > + */
> > + memmove(&x86_topology[0], &x86_topology[1],
> > + sizeof(x86_topology) - sizeof(x86_topology[0]));
> > + smt_dropped = true;
> > + }
> > +
>
> Instead of this memmove, wouldn't just passing the "&x86_topology[1]"
> to set_sched_topology() when "cpu_smt_num_threads <= 1" work?
>
> Something like below:
>
> (Only tested on a QEMU based VM)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 3818b16c19e1..1793248d28cd 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -494,6 +494,8 @@ static struct sched_domain_topology_level x86_topology[] = {
>
> static void __init build_sched_topology(void)
> {
> + struct sched_domain_topology_level *topology = x86_topology;
> +
> /*
> * When there is NUMA topology inside the package invalidate the
> * PKG domain since the NUMA domains will auto-magically create the
> @@ -504,7 +506,15 @@ static void __init build_sched_topology(void)
>
> memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> }
> - set_sched_topology(x86_topology);
> +
> + /*
> + * Drop the SMT domains if there is only one thread per-core
> + * since it'll get degenerated by the scheduler anyways.
> + */
> + if (cpu_smt_num_threads <= 1)
> + ++topology;
> +
> + set_sched_topology(topology);
> }
>
> void set_cpu_sibling_map(int cpu)
Yes, I also agree. I would switch to it in the next series and add your signoff.
Thanks!
> > /*
> > * When there is NUMA topology inside the package invalidate the
> > * PKG domain since the NUMA domains will auto-magically create the
> > * right spanning domains based on the SLIT.
> > */
> > if (x86_has_numa_in_package) {
> > - unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
> > + unsigned int pkgdom;
> > +
> > + if (smt_dropped)
> > + pkgdom = ARRAY_SIZE(x86_topology) - 3;
> > + else
> > + pkgdom = ARRAY_SIZE(x86_topology) - 2;
> >
> > memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
> > }
>
> --
> Thanks and Regards,
> Prateek
>
>
Regards,
Li
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology()
2025-06-30 0:21 ` Li Chen
@ 2025-06-30 12:56 ` Thomas Gleixner
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2025-06-30 12:56 UTC (permalink / raw)
To: Li Chen, Peter Zijlstra
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Rafael J . Wysocki, K Prateek Nayak, Sohil Mehta, Brian Gerst,
Patryk Wlazlyn, linux-kernel, Li Chen
On Mon, Jun 30 2025 at 08:21, Li Chen wrote:
> ---- On Wed, 25 Jun 2025 16:28:19 +0800 Peter Zijlstra <peterz@infradead.org> wrote ---
> > Urgh, this patch is doing at least 4 things and nigh on unreadable
> > because of that.
> >
> > - introduces DOMAIN() helper
> > - drops (the now pointless) SD_INIT_NAME() helper
> > - drops CONFIG_SCHED_SMT (x86 special)
> > - moves to static initialize and truncate
>
> Thanks for your review, I would split this Thomas's patch to 4 different patches
> and preserve his signoff
Nah. Just split it up and add
Suggested-by: Thomas ....
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-30 12:56 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-25 3:45 [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level Li Chen
2025-06-25 3:45 ` [PATCH v3 1/2] x86/smpboot: Decrapify build_sched_topology() Li Chen
2025-06-25 8:28 ` Peter Zijlstra
2025-06-30 0:21 ` Li Chen
2025-06-30 12:56 ` Thomas Gleixner
2025-06-25 3:45 ` [PATCH v3 2/2] x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled Li Chen
2025-06-25 5:45 ` K Prateek Nayak
2025-06-25 8:30 ` Peter Zijlstra
2025-06-30 0:39 ` Li Chen
2025-06-25 8:29 ` Peter Zijlstra
2025-06-30 0:29 ` Li Chen
2025-06-25 5:50 ` [PATCH v3 0/2] x86/smpboot: tidy sched-topology and drop useless SMT level K Prateek Nayak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).