* [PATCH v5 0/4] Support SMT control on arm64
@ 2024-08-06 8:53 Yicong Yang
2024-08-06 8:53 ` [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread() Yicong Yang
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Yicong Yang @ 2024-08-06 8:53 UTC (permalink / raw)
To: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, yangyicong, xuwei5,
guohanjun
From: Yicong Yang <yangyicong@hisilicon.com>
The kernel's core CPU hotplug framework supports runtime SMT control, which
is not yet supported on arm64. Besides the general vulnerability mitigation
concerns, we want this runtime control on our arm64 servers for:
- better single CPU performance in some cases
- saving overall power consumption
This patchset implements it in the following steps:
- Provide a default topology_is_primary_thread()
- Support retrieving the SMT thread number on OF based systems
- Support retrieving the SMT thread number on ACPI based systems
- Select HOTPLUG_SMT for arm64
Testing has been done on a real ACPI based arm64 server and on
ACPI/OF based QEMU VMs.
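Once applied, runtime SMT control is exercised through the generic
HOTPLUG_SMT sysfs interface; a typical session looks roughly like the
following (output shown is only illustrative):
  $ cat /sys/devices/system/cpu/smt/control
  on
  $ echo off > /sys/devices/system/cpu/smt/control   # offline all secondary threads
  $ cat /sys/devices/system/cpu/smt/active
  0
  $ echo on > /sys/devices/system/cpu/smt/control    # bring the secondary threads back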
Changes since v4:
- Provide a default topology_is_primary_thread() in the framework, per Will
Link: https://lore.kernel.org/linux-arm-kernel/20231121092602.47792-1-yangyicong@huawei.com/
Changes since v3:
- Fix some build and Kconfig errors reported by the kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/linux-arm-kernel/20231114040110.54590-1-yangyicong@huawei.com/
Changes since v2:
- Detect the SMT thread number from ACPI/DT while building the topology, avoiding an extra loop over the CPUs
- Split the patches into ACPI/OF/arch_topology paths and enable the Kconfig option for arm64
Link: https://lore.kernel.org/linux-arm-kernel/20231010115335.13862-1-yangyicong@huawei.com/
Yicong Yang (4):
cpu/SMT: Provide a default topology_is_primary_thread()
arch_topology: Support SMT control for OF based system
arm64: topology: Support SMT control on ACPI based system
arm64: Kconfig: Enable HOTPLUG_SMT
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
arch/powerpc/include/asm/topology.h | 1 +
arch/x86/include/asm/topology.h | 2 +-
drivers/base/arch_topology.c | 13 +++++++++++++
include/linux/topology.h | 14 ++++++++++++++
6 files changed, 54 insertions(+), 1 deletion(-)
--
2.24.0
* [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread()
2024-08-06 8:53 [PATCH v5 0/4] Support SMT control on arm64 Yicong Yang
@ 2024-08-06 8:53 ` Yicong Yang
2024-08-16 15:54 ` Dietmar Eggemann
2024-08-06 8:53 ` [PATCH v5 2/4] arch_topology: Support SMT control for OF based system Yicong Yang
` (2 subsequent siblings)
3 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-06 8:53 UTC (permalink / raw)
To: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, yangyicong, xuwei5,
guohanjun
From: Yicong Yang <yangyicong@hisilicon.com>
Currently, architectures that want to support HOTPLUG_SMT need to
provide a topology_is_primary_thread() telling the framework which
thread of an SMT core cannot be taken offline. However arm64 has no
restriction on which thread can be offlined, so the simplest choice
is to treat the 1st thread as the "primary" thread. Make this the
default implementation in the framework and let architectures like
x86, which do have a special primary thread, override this function.
There's no need to provide a stub function if !CONFIG_SMP or
!CONFIG_HOTPLUG_SMT. In that case the CPU being tested is already
the 1st CPU in its SMT mask, so it's always the primary thread.
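For context, the SMT hotplug core consults this helper when deciding
whether a CPU may be brought up while SMT is disabled. A simplified
sketch of that check (the function name below is only illustrative,
not the literal kernel/cpu.c code):

  static bool smt_cpu_bootable(unsigned int cpu)
  {
          if (cpu_smt_control == CPU_SMT_ENABLED)
                  return true;
          /* With SMT disabled, only the primary thread of each core may boot. */
          return topology_is_primary_thread(cpu);
  }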
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
arch/powerpc/include/asm/topology.h | 1 +
arch/x86/include/asm/topology.h | 2 +-
include/linux/topology.h | 14 ++++++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f4e6f2dd04b7..e0971777f2d9 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -151,6 +151,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
{
return cpu == cpu_first_thread_sibling(cpu);
}
+#define topology_is_primary_thread topology_is_primary_thread
static inline bool topology_smt_thread_allowed(unsigned int cpu)
{
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index abe3a8f22cbd..86c077f8ee99 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -219,11 +219,11 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
{
return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
}
+#define topology_is_primary_thread topology_is_primary_thread
#else /* CONFIG_SMP */
static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
static inline int topology_max_smt_threads(void) { return 1; }
-static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
#endif /* !CONFIG_SMP */
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 52f5850730b3..9c9f6b9f3462 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -240,6 +240,20 @@ static inline const struct cpumask *cpu_smt_mask(int cpu)
}
#endif
+#ifndef topology_is_primary_thread
+#define topology_is_primary_thread topology_is_primary_thread
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+ /*
+ * On SMT hotplug the primary thread of the SMT won't be disabled.
+ * Architectures that have a special primary thread (e.g. x86) need
+ * to override this function. Otherwise the first thread in the
+ * SMT is treated as the primary thread.
+ */
+ return cpu == cpumask_first(topology_sibling_cpumask(cpu));
+}
+#endif
+
static inline const struct cpumask *cpu_cpu_mask(int cpu)
{
return cpumask_of_node(cpu_to_node(cpu));
--
2.24.0
* [PATCH v5 2/4] arch_topology: Support SMT control for OF based system
2024-08-06 8:53 [PATCH v5 0/4] Support SMT control on arm64 Yicong Yang
2024-08-06 8:53 ` [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread() Yicong Yang
@ 2024-08-06 8:53 ` Yicong Yang
2024-08-16 15:55 ` Dietmar Eggemann
2024-08-06 8:53 ` [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI " Yicong Yang
2024-08-06 8:53 ` [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT Yicong Yang
3 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-06 8:53 UTC (permalink / raw)
To: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, yangyicong, xuwei5,
guohanjun
From: Yicong Yang <yangyicong@hisilicon.com>
When building the topology from the devicetree, we already get the
SMT thread number of each core. Report the largest SMT thread number
to the framework to enable SMT control.
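For reference, the thread count comes straight from the cpu-map nodes;
a core with two hardware threads is described along these lines (node
names and phandle labels below are only an example):

  cpu-map {
          socket0 {
                  cluster0 {
                          core0 {
                                  thread0 { cpu = <&cpu0>; };
                                  thread1 { cpu = <&cpu1>; };
                          };
                  };
          };
  };

parse_core() loops over the thread<N> children of each core node, so its
loop counter already holds the per-core thread count reported here.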
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
drivers/base/arch_topology.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 75fcb75d5515..95513abd664f 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -11,6 +11,7 @@
#include <linux/cleanup.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
+#include <linux/cpu_smt.h>
#include <linux/device.h>
#include <linux/of.h>
#include <linux/slab.h>
@@ -531,6 +532,16 @@ static int __init get_cpu_for_node(struct device_node *node)
return cpu;
}
+static void __init update_smt_num_threads(unsigned int num_threads)
+{
+ static unsigned int max_smt_thread_num = 1;
+
+ if (num_threads > max_smt_thread_num) {
+ max_smt_thread_num = num_threads;
+ cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+ }
+}
+
static int __init parse_core(struct device_node *core, int package_id,
int cluster_id, int core_id)
{
@@ -561,6 +572,8 @@ static int __init parse_core(struct device_node *core, int package_id,
i++;
} while (1);
+ update_smt_num_threads(i);
+
cpu = get_cpu_for_node(core);
if (cpu >= 0) {
if (!leaf) {
--
2.24.0
* [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-06 8:53 [PATCH v5 0/4] Support SMT control on arm64 Yicong Yang
2024-08-06 8:53 ` [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread() Yicong Yang
2024-08-06 8:53 ` [PATCH v5 2/4] arch_topology: Support SMT control for OF based system Yicong Yang
@ 2024-08-06 8:53 ` Yicong Yang
2024-08-16 15:55 ` Dietmar Eggemann
2024-08-27 15:40 ` Pierre Gondois
2024-08-06 8:53 ` [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT Yicong Yang
3 siblings, 2 replies; 26+ messages in thread
From: Yicong Yang @ 2024-08-06 8:53 UTC (permalink / raw)
To: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, yangyicong, xuwei5,
guohanjun
From: Yicong Yang <yangyicong@hisilicon.com>
For ACPI we build the topology from the PPTT, where the SMT thread
number of each core cannot be read directly. Instead, use a temporary
xarray to record the thread count of each core while building the
topology, so the largest SMT thread number in the system is known once
the scan finishes. Then we can enable support for SMT control.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..f72e1e55b05e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -15,8 +15,10 @@
#include <linux/arch_topology.h>
#include <linux/cacheinfo.h>
#include <linux/cpufreq.h>
+#include <linux/cpu_smt.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/xarray.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
*/
int __init parse_acpi_topology(void)
{
+ int thread_num, max_smt_thread_num = 1;
+ struct xarray core_threads;
int cpu, topology_id;
+ void *entry;
if (acpi_disabled)
return 0;
+ xa_init(&core_threads);
+
for_each_possible_cpu(cpu) {
topology_id = find_acpi_cpu_topology(cpu, 0);
if (topology_id < 0)
@@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id = topology_id;
+
+ entry = xa_load(&core_threads, topology_id);
+ if (!entry) {
+ xa_store(&core_threads, topology_id,
+ xa_mk_value(1), GFP_KERNEL);
+ } else {
+ thread_num = xa_to_value(entry);
+ thread_num++;
+ xa_store(&core_threads, topology_id,
+ xa_mk_value(thread_num), GFP_KERNEL);
+
+ if (thread_num > max_smt_thread_num)
+ max_smt_thread_num = thread_num;
+ }
} else {
cpu_topology[cpu].thread_id = -1;
cpu_topology[cpu].core_id = topology_id;
@@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].package_id = topology_id;
}
+ cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+
+ xa_destroy(&core_threads);
return 0;
}
#endif
--
2.24.0
* [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT
2024-08-06 8:53 [PATCH v5 0/4] Support SMT control on arm64 Yicong Yang
` (2 preceding siblings ...)
2024-08-06 8:53 ` [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI " Yicong Yang
@ 2024-08-06 8:53 ` Yicong Yang
2024-08-27 15:40 ` Pierre Gondois
3 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-06 8:53 UTC (permalink / raw)
To: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, yangyicong, xuwei5,
guohanjun
From: Yicong Yang <yangyicong@hisilicon.com>
Enable HOTPLUG_SMT for SMT control.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2f8ff354ca6..bd3bc2f5e0ec 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -238,6 +238,7 @@ config ARM64
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
+ select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
--
2.24.0
* Re: [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread()
2024-08-06 8:53 ` [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread() Yicong Yang
@ 2024-08-16 15:54 ` Dietmar Eggemann
2024-08-19 7:25 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Dietmar Eggemann @ 2024-08-16 15:54 UTC (permalink / raw)
To: Yicong Yang, catalin.marinas, will, sudeep.holla, tglx, peterz,
mpe, linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, yangyicong, xuwei5, guohanjun
On 06/08/2024 10:53, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> Currently if architectures want to support HOTPLUG_SMT they need to
> provide a topology_is_primary_thread() telling the framework which
> thread in the SMT cannot offline. However arm64 doesn't have a
> restriction on which thread in the SMT cannot offline, a simplest
> choice is that just make 1st thread as the "primary" thread. So
> just make this as the default implementation in the framework and
> let architectures like x86 that have special primary thread to
> override this function.
IMHO, ARM64 (/ARM), RISCV and PARISC (SMP) use GENERIC_ARCH_TOPOLOGY
(and drivers/base/arch_topology.c), so they could share
topology_is_primary_thread() there as well?
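Something like the below (just to illustrate the placement, not a tested
patch), e.g. in include/linux/arch_topology.h:

  #define topology_is_primary_thread topology_is_primary_thread
  static inline bool topology_is_primary_thread(unsigned int cpu)
  {
          return cpu == cpumask_first(topology_sibling_cpumask(cpu));
  }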
[...]
* Re: [PATCH v5 2/4] arch_topology: Support SMT control for OF based system
2024-08-06 8:53 ` [PATCH v5 2/4] arch_topology: Support SMT control for OF based system Yicong Yang
@ 2024-08-16 15:55 ` Dietmar Eggemann
2024-08-19 7:18 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Dietmar Eggemann @ 2024-08-16 15:55 UTC (permalink / raw)
To: Yicong Yang, catalin.marinas, will, sudeep.holla, tglx, peterz,
mpe, linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, yangyicong, xuwei5, guohanjun
On 06/08/2024 10:53, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> On building the topology from the devicetree, we've already
> gotten the SMT thread number of each core. Update the largest
> SMT thread number to enable the SMT control.
Do we have SMT Device Tree (DT) systems out there? But you're right that DT
at least supports SMT.
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> drivers/base/arch_topology.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 75fcb75d5515..95513abd664f 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -11,6 +11,7 @@
> #include <linux/cleanup.h>
> #include <linux/cpu.h>
> #include <linux/cpufreq.h>
> +#include <linux/cpu_smt.h>
> #include <linux/device.h>
> #include <linux/of.h>
> #include <linux/slab.h>
> @@ -531,6 +532,16 @@ static int __init get_cpu_for_node(struct device_node *node)
> return cpu;
> }
>
> +static void __init update_smt_num_threads(unsigned int num_threads)
> +{
> + static unsigned int max_smt_thread_num = 1;
> +
> + if (num_threads > max_smt_thread_num) {
> + max_smt_thread_num = num_threads;
> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> + }
This could theoretically (unlikely though) call
cpu_smt_set_num_threads() multiple times (on heterogeneous systems with
different numbers of SMT threads).
> +}
> +
> static int __init parse_core(struct device_node *core, int package_id,
> int cluster_id, int core_id)
> {
> @@ -561,6 +572,8 @@ static int __init parse_core(struct device_node *core, int package_id,
> i++;
> } while (1);
>
> + update_smt_num_threads(i);
> +
> cpu = get_cpu_for_node(core);
> if (cpu >= 0) {
> if (!leaf) {
Why not simply do this:
-->8--
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 75fcb75d5515..806537419715 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -30,6 +30,7 @@ static struct cpumask scale_freq_counters_mask;
static bool scale_freq_invariant;
DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 1;
EXPORT_PER_CPU_SYMBOL_GPL(capacity_freq_ref);
+static unsigned int max_smt_thread_num = 1;
static bool supports_scale_freq_counters(const struct cpumask *cpus)
{
@@ -577,6 +578,9 @@ static int __init parse_core(struct device_node *core, int package_id,
return -EINVAL;
}
+ if (max_smt_thread_num < i)
+ max_smt_thread_num = i;
+
return 0;
}
@@ -673,6 +677,9 @@ static int __init parse_socket(struct device_node *socket)
if (!has_socket)
ret = parse_cluster(socket, 0, -1, 0);
+ if (max_smt_thread_num > 1)
+ cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+
return ret;
}
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-06 8:53 ` [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI " Yicong Yang
@ 2024-08-16 15:55 ` Dietmar Eggemann
2024-08-19 7:03 ` Yicong Yang
2024-08-27 15:40 ` Pierre Gondois
1 sibling, 1 reply; 26+ messages in thread
From: Dietmar Eggemann @ 2024-08-16 15:55 UTC (permalink / raw)
To: Yicong Yang, catalin.marinas, will, sudeep.holla, tglx, peterz,
mpe, linux-arm-kernel, mingo, bp, dave.hansen
Cc: linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, yangyicong, xuwei5, guohanjun
On 06/08/2024 10:53, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> For ACPI we'll build the topology from PPTT and we cannot directly
> get the SMT number of each core. Instead using a temporary xarray
> to record the SMT number of each core when building the topology
> and we can know the largest SMT number in the system. Then we can
> enable the support of SMT control.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..f72e1e55b05e 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -15,8 +15,10 @@
> #include <linux/arch_topology.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpufreq.h>
> +#include <linux/cpu_smt.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/xarray.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
> */
> int __init parse_acpi_topology(void)
> {
> + int thread_num, max_smt_thread_num = 1;
> + struct xarray core_threads;
> int cpu, topology_id;
> + void *entry;
>
> if (acpi_disabled)
> return 0;
>
> + xa_init(&core_threads);
> +
> for_each_possible_cpu(cpu) {
> topology_id = find_acpi_cpu_topology(cpu, 0);
> if (topology_id < 0)
> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].thread_id = topology_id;
> topology_id = find_acpi_cpu_topology(cpu, 1);
> cpu_topology[cpu].core_id = topology_id;
> +
> + entry = xa_load(&core_threads, topology_id);
> + if (!entry) {
> + xa_store(&core_threads, topology_id,
> + xa_mk_value(1), GFP_KERNEL);
> + } else {
> + thread_num = xa_to_value(entry);
> + thread_num++;
> + xa_store(&core_threads, topology_id,
> + xa_mk_value(thread_num), GFP_KERNEL);
> +
> + if (thread_num > max_smt_thread_num)
> + max_smt_thread_num = thread_num;
> + }
So the xarray contains one element for each core_id with the information
how often the core_id occurs? I assume you have to iterate over all
possible CPUs since you don't know which logical CPUs belong to the same
core_id.
> } else {
> cpu_topology[cpu].thread_id = -1;
> cpu_topology[cpu].core_id = topology_id;
> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].package_id = topology_id;
> }
>
> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> +
> + xa_destroy(&core_threads);
> return 0;
> }
> #endif
Tested on ThunderX2:
$ cat /proc/schedstat | head -6 | tail -4 | awk '{ print $1, $2 }'
cpu0 0
domain0 00000000,00000000,00000000,00000000,00000001,00000001,00000001,00000001
^ ^ ^ ^
domain1 00000000,00000000,00000000,00000000,ffffffff,ffffffff,ffffffff,ffffffff
domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
detecting 'max_smt_thread_num = 4' correctly.
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-16 15:55 ` Dietmar Eggemann
@ 2024-08-19 7:03 ` Yicong Yang
2024-08-22 7:19 ` Dietmar Eggemann
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-19 7:03 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen, yangyicong,
linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/8/16 23:55, Dietmar Eggemann wrote:
> On 06/08/2024 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the SMT number of each core when building the topology
>> and we can know the largest SMT number in the system. Then we can
>> enable the support of SMT control.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>> #include <linux/arch_topology.h>
>> #include <linux/cacheinfo.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>> #include <linux/init.h>
>> #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>>
>> #include <asm/cpu.h>
>> #include <asm/cputype.h>
>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>> */
>> int __init parse_acpi_topology(void)
>> {
>> + int thread_num, max_smt_thread_num = 1;
>> + struct xarray core_threads;
>> int cpu, topology_id;
>> + void *entry;
>>
>> if (acpi_disabled)
>> return 0;
>>
>> + xa_init(&core_threads);
>> +
>> for_each_possible_cpu(cpu) {
>> topology_id = find_acpi_cpu_topology(cpu, 0);
>> if (topology_id < 0)
>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].thread_id = topology_id;
>> topology_id = find_acpi_cpu_topology(cpu, 1);
>> cpu_topology[cpu].core_id = topology_id;
>> +
>> + entry = xa_load(&core_threads, topology_id);
>> + if (!entry) {
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(1), GFP_KERNEL);
>> + } else {
>> + thread_num = xa_to_value(entry);
>> + thread_num++;
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(thread_num), GFP_KERNEL);
>> +
>> + if (thread_num > max_smt_thread_num)
>> + max_smt_thread_num = thread_num;
>> + }
>
> So the xarray contains one element for each core_id with the information
> how often the core_id occurs? I assume you have to iterate over all
> possible CPUs since you don't know which logical CPUs belong to the same
> core_id.
>
Each xarray element counts the thread number of a certain core id, so the logic is as below:
1. If the "core id" entry doesn't exist, we're visiting this core for the 1st time: create
the entry and set the thread number to 1.
2. Otherwise increment the thread number of the "core id" this CPU belongs to (the PPTT
already told us which core that is). Update max_smt_thread_num if necessary.
This way max_smt_thread_num is known while iterating the PPTT table and building the
topology for all the possible CPUs.
Otherwise we'd need a second scan for the max thread number after the topology is built.
That approach was implemented in v1 and drew complaints about the overhead on large
scale systems, since we'd need to loop over the CPUs twice.
>> } else {
>> cpu_topology[cpu].thread_id = -1;
>> cpu_topology[cpu].core_id = topology_id;
>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].package_id = topology_id;
>> }
>>
>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +
>> + xa_destroy(&core_threads);
>> return 0;
>> }
>> #endif
>
> Tested on ThunderX2:
>
> $ cat /proc/schedstat | head -6 | tail -4 | awk '{ print $1, $2 }'
> cpu0 0
> domain0 00000000,00000000,00000000,00000000,00000001,00000001,00000001,00000001
> ^ ^ ^ ^
> domain1 00000000,00000000,00000000,00000000,ffffffff,ffffffff,ffffffff,ffffffff
> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>
> detecting 'max_smt_thread_num = 4' correctly.
>
Thanks for the testing. ok for a tag?
Thanks.
* Re: [PATCH v5 2/4] arch_topology: Support SMT control for OF based system
2024-08-16 15:55 ` Dietmar Eggemann
@ 2024-08-19 7:18 ` Yicong Yang
0 siblings, 0 replies; 26+ messages in thread
From: Yicong Yang @ 2024-08-19 7:18 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen, yangyicong,
linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/8/16 23:55, Dietmar Eggemann wrote:
> On 06/08/2024 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> On building the topology from the devicetree, we've already
>> gotten the SMT thread number of each core. Update the largest
>> SMT thread number to enable the SMT control.
>
> Do we have SMT Device Tree (DT) systems out there? But you right that DT
> at least supports SMT.
>
My system is ACPI based. The DT part is emulated and tested on a QEMU VM.
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> drivers/base/arch_topology.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 75fcb75d5515..95513abd664f 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -11,6 +11,7 @@
>> #include <linux/cleanup.h>
>> #include <linux/cpu.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>> #include <linux/device.h>
>> #include <linux/of.h>
>> #include <linux/slab.h>
>> @@ -531,6 +532,16 @@ static int __init get_cpu_for_node(struct device_node *node)
>> return cpu;
>> }
>>
>> +static void __init update_smt_num_threads(unsigned int num_threads)
>> +{
>> + static unsigned int max_smt_thread_num = 1;
>> +
>> + if (num_threads > max_smt_thread_num) {
>> + max_smt_thread_num = num_threads;
>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> + }
>
> This could theoretically (unlikely though) call
> cpu_smt_set_num_threads() multiple times (on heterogeneous systems with
> different numbers of SMT threads).
Yes indeed. I was doing this on purpose since it does nothing unexpected and
only updates the max threads recorded in the framework.
>> +}
>> +
>> static int __init parse_core(struct device_node *core, int package_id,
>> int cluster_id, int core_id)
>> {
>> @@ -561,6 +572,8 @@ static int __init parse_core(struct device_node *core, int package_id,
>> i++;
>> } while (1);
>>
>> + update_smt_num_threads(i);
>> +
>> cpu = get_cpu_for_node(core);
>> if (cpu >= 0) {
>> if (!leaf) {
>
> Why not simply do this:
>
> -->8--
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 75fcb75d5515..806537419715 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -30,6 +30,7 @@ static struct cpumask scale_freq_counters_mask;
> static bool scale_freq_invariant;
> DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 1;
> EXPORT_PER_CPU_SYMBOL_GPL(capacity_freq_ref);
> +static unsigned int max_smt_thread_num = 1;
>
This is fine with me and avoids calling cpu_smt_set_num_threads() multiple
times. We can switch to this implementation.
Thanks.
> static bool supports_scale_freq_counters(const struct cpumask *cpus)
> {
> @@ -577,6 +578,9 @@ static int __init parse_core(struct device_node *core, int package_id,
> return -EINVAL;
> }
>
> + if (max_smt_thread_num < i)
> + max_smt_thread_num = i;
> +
> return 0;
> }
>
> @@ -673,6 +677,9 @@ static int __init parse_socket(struct device_node *socket)
> if (!has_socket)
> ret = parse_cluster(socket, 0, -1, 0);
>
> + if (max_smt_thread_num > 1)
> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> +
> return ret;
> }
>
>
> .
>
* Re: [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread()
2024-08-16 15:54 ` Dietmar Eggemann
@ 2024-08-19 7:25 ` Yicong Yang
0 siblings, 0 replies; 26+ messages in thread
From: Yicong Yang @ 2024-08-19 7:25 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen, yangyicong,
linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/8/16 23:54, Dietmar Eggemann wrote:
> On 06/08/2024 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> Currently if architectures want to support HOTPLUG_SMT they need to
>> provide a topology_is_primary_thread() telling the framework which
>> thread in the SMT cannot offline. However arm64 doesn't have a
>> restriction on which thread in the SMT cannot offline, a simplest
>> choice is that just make 1st thread as the "primary" thread. So
>> just make this as the default implementation in the framework and
>> let architectures like x86 that have special primary thread to
>> override this function.
>
> IMHO, ARM64 (/ARM), RISCV and PARISC (SMP) use GENERIC_ARCH_TOPOLOGY
> (and drivers/base/arch_topology.c) so they could share
> topology_is_primary_thread() also there?
>
I think yes, if they want to implement HOTPLUG_SMT and don't have a restriction
on the primary thread.
Thanks.
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-19 7:03 ` Yicong Yang
@ 2024-08-22 7:19 ` Dietmar Eggemann
0 siblings, 0 replies; 26+ messages in thread
From: Dietmar Eggemann @ 2024-08-22 7:19 UTC (permalink / raw)
To: Yicong Yang
Cc: catalin.marinas, will, sudeep.holla, tglx, peterz, mpe,
linux-arm-kernel, mingo, bp, dave.hansen, yangyicong,
linuxppc-dev, x86, linux-kernel, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, xuwei5, guohanjun
On 19/08/2024 09:03, Yicong Yang wrote:
> On 2024/8/16 23:55, Dietmar Eggemann wrote:
>> On 06/08/2024 10:53, Yicong Yang wrote:
>>> From: Yicong Yang <yangyicong@hisilicon.com>
[...]
>> So the xarray contains one element for each core_id with the information
>> how often the core_id occurs? I assume you have to iterate over all
>> possible CPUs since you don't know which logical CPUs belong to the same
>> core_id.
>>
>
> Each xarray element counts the thread number of a certain core id. so the logic is like below:
> 1. if the "core id" entry doesn't exists, then we're accessing this core for the 1st time. create
> one and make the thread number to 1
> 2. otherwise increment the thread number of "core id" this cpu belongs (PPTT already
> told us which core this CPU belongs to). Update the max_smt_thread_num if necessary.
>
> Then we can know max_smt_thread_num by meanwhile iterating the PPTT table and
> build the topology for all the possible CPUs.
>
> Otherwise we need to do a second scan for the max thread number after built the
> topology. This way is implemented in v1 and it's complained about the overhead on large
> scale systems since we need to loop the CPUs twice.
OK.
[...]
>> Tested on ThunderX2:
>>
>> $ cat /proc/schedstat | head -6 | tail -4 | awk '{ print $1, $2 }'
>> cpu0 0
>> domain0 00000000,00000000,00000000,00000000,00000001,00000001,00000001,00000001
>> ^ ^ ^ ^
>> domain1 00000000,00000000,00000000,00000000,ffffffff,ffffffff,ffffffff,ffffffff
>> domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
>>
>> detecting 'max_smt_thread_num = 4' correctly.
>>
>
> Thanks for the testing. ok for a tag?
Yes, please go ahead.
[...]
* Re: [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT
2024-08-06 8:53 ` [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT Yicong Yang
@ 2024-08-27 15:40 ` Pierre Gondois
2024-08-29 6:50 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Pierre Gondois @ 2024-08-27 15:40 UTC (permalink / raw)
To: Yicong Yang
Cc: linuxppc-dev, bp, dave.hansen, x86, mingo, linux-arm-kernel, mpe,
peterz, tglx, sudeep.holla, will, catalin.marinas, linux-kernel,
dietmar.eggemann, gregkh, rafael, jonathan.cameron, prime.zeng,
linuxarm, yangyicong, xuwei5, guohanjun
Hello Yicong,
Is it necessary to have an explicit dependency on SMP for arm64? Cf.
commit 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant #ifdefs")
Regards,
Pierre
On 8/6/24 10:53, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> Enable HOTPLUG_SMT for SMT control.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..bd3bc2f5e0ec 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -238,6 +238,7 @@ config ARM64
> select HAVE_KRETPROBES
> select HAVE_GENERIC_VDSO
> select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
> + select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
> select IRQ_DOMAIN
> select IRQ_FORCED_THREADING
> select KASAN_VMALLOC if KASAN
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-06 8:53 ` [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI " Yicong Yang
2024-08-16 15:55 ` Dietmar Eggemann
@ 2024-08-27 15:40 ` Pierre Gondois
2024-08-29 7:40 ` Yicong Yang
1 sibling, 1 reply; 26+ messages in thread
From: Pierre Gondois @ 2024-08-27 15:40 UTC (permalink / raw)
To: Yicong Yang
Cc: linuxppc-dev, bp, dave.hansen, mingo, linux-arm-kernel, mpe,
peterz, tglx, sudeep.holla, will, catalin.marinas, x86,
linux-kernel, dietmar.eggemann, gregkh, rafael, jonathan.cameron,
prime.zeng, linuxarm, yangyicong, xuwei5, guohanjun
Hello Yicong,
IIUC we are looking for the maximum number of threads a CPU can have
in the platform. But is it actually possible to have a platform with
CPUs not all having the same number of threads?
For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
that the number of threads is either 1 or cpu_smt_max_threads (as
CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
a (hypothetical) platform with:
- CPU0: 2 threads
- CPU1: 4 threads
should not be able to set the number of threads to 2, as
1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
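For reference, the check is roughly the following (quoting from memory,
the exact kernel/cpu.c code may differ in detail):

  static inline bool cpu_smt_num_threads_valid(unsigned int threads)
  {
          if (IS_ENABLED(CONFIG_SMT_NUM_THREADS_DYNAMIC))
                  return threads >= 1 && threads <= cpu_smt_max_threads;
          return threads == 1 || threads == cpu_smt_max_threads;
  }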
So if there is an assumption that all the CPUs have the same number of
threads, it should be sufficient to count the number of threads for one
CPU right ?
In the other (unlikely) case (i.e. CPUs can have varying numbers of threads),
I think we should either:
- use your method and check that all the CPUs have the same number of threads
- get the number of threads for one CPU and check that all the CPUs are
identical using the ACPI_PPTT_ACPI_IDENTICAL tag
- have a per-cpu cpu_smt_max_threads value ?
Same comment for the DT patch. If there is an assumption that all CPUs have
the same number of threads, then update_smt_num_threads() could only be called
once I suppose,
Regards,
Pierre
On 8/6/24 10:53, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> For ACPI we'll build the topology from PPTT and we cannot directly
> get the SMT number of each core. Instead using a temporary xarray
> to record the SMT number of each core when building the topology
> and we can know the largest SMT number in the system. Then we can
> enable the support of SMT control.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..f72e1e55b05e 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -15,8 +15,10 @@
> #include <linux/arch_topology.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpufreq.h>
> +#include <linux/cpu_smt.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/xarray.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
> */
> int __init parse_acpi_topology(void)
> {
> + int thread_num, max_smt_thread_num = 1;
> + struct xarray core_threads;
> int cpu, topology_id;
> + void *entry;
>
> if (acpi_disabled)
> return 0;
>
> + xa_init(&core_threads);
> +
> for_each_possible_cpu(cpu) {
> topology_id = find_acpi_cpu_topology(cpu, 0);
> if (topology_id < 0)
> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].thread_id = topology_id;
> topology_id = find_acpi_cpu_topology(cpu, 1);
> cpu_topology[cpu].core_id = topology_id;
> +
> + entry = xa_load(&core_threads, topology_id);
> + if (!entry) {
> + xa_store(&core_threads, topology_id,
> + xa_mk_value(1), GFP_KERNEL);
> + } else {
> + thread_num = xa_to_value(entry);
> + thread_num++;
> + xa_store(&core_threads, topology_id,
> + xa_mk_value(thread_num), GFP_KERNEL);
> +
> + if (thread_num > max_smt_thread_num)
> + max_smt_thread_num = thread_num;
> + }
> } else {
> cpu_topology[cpu].thread_id = -1;
> cpu_topology[cpu].core_id = topology_id;
> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].package_id = topology_id;
> }
>
> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> +
> + xa_destroy(&core_threads);
> return 0;
> }
> #endif
* Re: [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT
2024-08-27 15:40 ` Pierre Gondois
@ 2024-08-29 6:50 ` Yicong Yang
0 siblings, 0 replies; 26+ messages in thread
From: Yicong Yang @ 2024-08-29 6:50 UTC (permalink / raw)
To: Pierre Gondois
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, x86, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, linux-kernel, dietmar.eggemann, gregkh, rafael,
jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/8/27 23:40, Pierre Gondois wrote:
> Hello Yicong,
>
> Is it necessary to have an explicit dependency over SMP for arm64 ? Cf.
> commit 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant #ifdefs")
Thanks for the information. Then it's redundant to depend on CONFIG_SMP. Will drop it.
>
> Regards,
> Pierre
>
> On 8/6/24 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> Enable HOTPLUG_SMT for SMT control.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> arch/arm64/Kconfig | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index a2f8ff354ca6..bd3bc2f5e0ec 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -238,6 +238,7 @@ config ARM64
>> select HAVE_KRETPROBES
>> select HAVE_GENERIC_VDSO
>> select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>> + select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
>> select IRQ_DOMAIN
>> select IRQ_FORCED_THREADING
>> select KASAN_VMALLOC if KASAN
>
> .
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-27 15:40 ` Pierre Gondois
@ 2024-08-29 7:40 ` Yicong Yang
2024-08-29 12:46 ` Pierre Gondois
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-29 7:40 UTC (permalink / raw)
To: Pierre Gondois
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
Hi Pierre,
On 2024/8/27 23:40, Pierre Gondois wrote:
> Hello Yicong,
> IIUC we are looking for the maximum number of threads a CPU can have
> in the platform. But is is actually possible to have a platform with
> CPUs not having the same number of threads ?
>
I was thinking of heterogeneous platforms, for example where some of the
cores have SMT and others don't (I don't have one though, but there
should be some such platforms for arm64).
> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
> that the number of threads is either 1 or cpu_smt_max_threads (as
> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
> a (hypothetical) platform with:
> - CPU0: 2 threads
> - CPU1: 4 threads
> should not be able to set the number of threads to 2, as
> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>
It's a bit more complex. If we enable/disable SMT using the on/off control
then we won't have this problem: on enabling we online all the CPUs, and on
disabling we offline every CPU which is not a primary thread, without caring
about the thread number of each core.
Control by thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
and is only enabled on powerpc. I think that interface is not enough to handle
the hypothetical platform we assumed, since it assumes a homogeneous platform
where all the CPUs have the same thread number. But it should be fine for
platforms mixing SMT cores (with the same thread number) and non-SMT cores,
since it does indicate the real thread number of the SMT cores.
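For reference, the framework's disable path boils down to something like
this (simplified sketch of cpuhp_smt_disable() in kernel/cpu.c, error
handling omitted):

  for_each_online_cpu(cpu) {
          if (topology_is_primary_thread(cpu))
                  continue;
          /* take the secondary thread offline */
          cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
  }

so only the per-core primary threads stay online, regardless of the
per-core thread count.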
> So if there is an assumption that all the CPUs have the same number of
> threads, it should be sufficient to count the number of threads for one
> CPU right ?
>
Naturally and conveniently, such a solution would use the thread number of CPU0
(the boot CPU). But this will be wrong if CPU0 is non-SMT on a heterogeneous
platform :(
> In the other (unlikely) case (i.e. CPUs can have various number of threads),
> I think we should either:
> - use your method and check that all the CPUs have the same number of threads
> - get the number of threads for one CPU and check that all the CPUs are
> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
I think this won't be simpler since we'd still need to iterate over all the CPUs
to see whether they have the same hetero_id (I checked how we use this ACPI tag in
arm_acpi_register_pmu_device()). We can already tell whether they're identical by
comparing the thread numbers, and do the update if necessary.
> - have a per-cpu cpu_smt_max_threads value ?
This should be unnecessary at the current stage as long as there is no platform
with several kinds of cores having different thread numbers, as in your assumption,
that also enables CONFIG_SMT_NUM_THREADS_DYNAMIC. Otherwise using
a global cpu_smt_max_threads to enable SMT control should be enough.
Does that make sense?
Thanks,
Yicong
>
> Same comment for the DT patch. If there is an assumption that all CPUs have
> the same number of threads, then update_smt_num_threads() could only be called
> once I suppose,
>
> Regards,
> Pierre
>
>
> On 8/6/24 10:53, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the SMT number of each core when building the topology
>> and we can know the largest SMT number in the system. Then we can
>> enable the support of SMT control.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>> #include <linux/arch_topology.h>
>> #include <linux/cacheinfo.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>> #include <linux/init.h>
>> #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>> #include <asm/cpu.h>
>> #include <asm/cputype.h>
>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>> */
>> int __init parse_acpi_topology(void)
>> {
>> + int thread_num, max_smt_thread_num = 1;
>> + struct xarray core_threads;
>> int cpu, topology_id;
>> + void *entry;
>> if (acpi_disabled)
>> return 0;
>> + xa_init(&core_threads);
>> +
>> for_each_possible_cpu(cpu) {
>> topology_id = find_acpi_cpu_topology(cpu, 0);
>> if (topology_id < 0)
>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].thread_id = topology_id;
>> topology_id = find_acpi_cpu_topology(cpu, 1);
>> cpu_topology[cpu].core_id = topology_id;
>> +
>> + entry = xa_load(&core_threads, topology_id);
>> + if (!entry) {
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(1), GFP_KERNEL);
>> + } else {
>> + thread_num = xa_to_value(entry);
>> + thread_num++;
>> + xa_store(&core_threads, topology_id,
>> + xa_mk_value(thread_num), GFP_KERNEL);
>> +
>> + if (thread_num > max_smt_thread_num)
>> + max_smt_thread_num = thread_num;
>> + }
>> } else {
>> cpu_topology[cpu].thread_id = -1;
>> cpu_topology[cpu].core_id = topology_id;
>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].package_id = topology_id;
>> }
>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +
>> + xa_destroy(&core_threads);
>> return 0;
>> }
>> #endif
>
> .
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-29 7:40 ` Yicong Yang
@ 2024-08-29 12:46 ` Pierre Gondois
2024-08-30 9:35 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Pierre Gondois @ 2024-08-29 12:46 UTC (permalink / raw)
To: Yicong Yang
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
[-- Attachment #1: Type: text/plain, Size: 13143 bytes --]
Hello Yicong,
On 8/29/24 09:40, Yicong Yang wrote:
> Hi Pierre,
>
> On 2024/8/27 23:40, Pierre Gondois wrote:
>> Hello Yicong,
>> IIUC we are looking for the maximum number of threads a CPU can have
>> in the platform. But is is actually possible to have a platform with
>> CPUs not having the same number of threads ?
>>
>
> I was thinking of the heterogenous platforms, for example part of the
> cores have SMT and others don't (I don't have any though, but there
> should be some such platforms for arm64).
>
>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>> that the number of threads is either 1 or cpu_smt_max_threads (as
>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>> a (hypothetical) platform with:
>> - CPU0: 2 threads
>> - CPU1: 4 threads
>> should not be able to set the number of threads to 2, as
>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>
>
> It's a bit more complex. If we enable/disable the SMT using on/off control
> then we won't have this problem. We'll online all the CPUs on enabling and
> offline CPUs which is not primary thread and don't care about the thread
> number of each core.
>
> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
> and only enabled on powerpc. I think this interface is not enough to handle
> the hypothetical we assumed, since it assumes the homogenous platform that
> all the CPUs have the same thread number. But this should be fine for
> the platforms with SMT(with same thread number) and non-SMT cores, since it do
> indicates the real thread number of the SMT cores.
>
>> So if there is an assumption that all the CPUs have the same number of
>> threads, it should be sufficient to count the number of threads for one
>> CPU right ?
>>
>
> Naturally and conveniently I may think use of the threads number of CPU0 (boot
> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
> platform :(
>
>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>> I think we should either:
>> - use your method and check that all the CPUs have the same number of threads
>> - get the number of threads for one CPU and check that all the CPUs are
>> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>
> I think this won't be simpler since we still need to iterate all the CPUs to see
> if they have the same hetero_id (checked how we're using this ACPI tag in
> arm_acpi_register_pmu_device()). We could already know if they're identical by
> comparing the thread number and do the update if necessary.
>
>> - have a per-cpu cpu_smt_max_threads value ?
>
> This should be unnecessary in currently stage if there's no platforms
> with several kind cores have different thread number like in your assumption
> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
> a global cpu_smt_max_threads to enable the SMT control should be enough.
> Does it make sense?
Yes, I agree with all the things you said:
- the current smt/control interface cannot handle asymmetric SMT platforms
- I am not aware of any such platform existing so far
I think it would still be good to check all the CPUs have the same number of
threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
patch attached (and to check CPUs have the same number of threads). Feel free
to take what you like/ignore what is inside the attached patch, or let me know
if I should submit a part in a separate patchset,
----------------------------
[RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
- On arm64 ACPI systems, change the thread_id assignment to have
increasing values starting from 0. This is already the case for DT
based systems. Doing so allows to uniformly identify the n-th
thread of a given CPU.
- Check that all CPUs have the same number of threads (for DT/ACPI)
- Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
for socket0 and 128 + (0,32,64,96) for socket1:
$ cd /sys/devices/system/cpu/smt/
$ cat ../online
0-255
$ echo 2 > control
$ cat ../online
0-63,128-191
$ echo 3 > control
$ cat ../online
0-95,128-223
$ echo on > control
$ cat ../online
0-255
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bd3bc2f5e0ec..1d8521483065 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -239,6 +239,7 @@ config ARM64
select HAVE_GENERIC_VDSO
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
+ select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 0f6ef432fb84..7dd211f81687 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
#define arch_scale_hw_pressure topology_get_hw_pressure
#define arch_update_hw_pressure topology_update_hw_pressure
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+#include <linux/cpu_smt.h>
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+ return topology_thread_id(cpu) < cpu_smt_num_threads;
+}
+#endif
+
#include <asm-generic/topology.h>
#endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index f72e1e55b05e..a83babe19972 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
{
int thread_num, max_smt_thread_num = 1;
struct xarray core_threads;
+ bool have_thread = false;
int cpu, topology_id;
+ unsigned long i;
void *entry;
if (acpi_disabled)
@@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
return topology_id;
if (acpi_cpu_is_threaded(cpu)) {
+ have_thread = true;
+
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id = topology_id;
@@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
if (!entry) {
xa_store(&core_threads, topology_id,
xa_mk_value(1), GFP_KERNEL);
+ cpu_topology[cpu].thread_id = 0;
} else {
thread_num = xa_to_value(entry);
- thread_num++;
+ cpu_topology[cpu].thread_id = thread_num++;
xa_store(&core_threads, topology_id,
xa_mk_value(thread_num), GFP_KERNEL);
@@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].cluster_id = topology_id;
topology_id = find_acpi_cpu_topology_package(cpu);
cpu_topology[cpu].package_id = topology_id;
+
+ pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
+ cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
+ cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
}
+ if (have_thread)
+ xa_for_each(&core_threads, i, entry)
+ if (xa_to_value(entry) != max_smt_thread_num)
+ pr_warn("Heterogeneous SMT topology not handled");
+
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
xa_destroy(&core_threads);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 95513abd664f..20d7f5b72ddd 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
return cpu;
}
-static void __init update_smt_num_threads(unsigned int num_threads)
+static void __init update_smt_num_threads(int num_threads)
{
- static unsigned int max_smt_thread_num = 1;
+ static int max_smt_thread_num = -1;
- if (num_threads > max_smt_thread_num) {
+ if (max_smt_thread_num < 0) {
max_smt_thread_num = num_threads;
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+ } else if (num_threads != max_smt_thread_num) {
+ pr_warn("Heterogeneous SMT topology not handled");
}
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index b721f360d759..afdfdc64a0a1 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
#define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
#define topology_core_id(cpu) (cpu_topology[cpu].core_id)
+#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
#define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
----------------------------
Regards,
Pierre
>
> Thanks,
> Yicong
>
>>
>> Same comment for the DT patch. If there is an assumption that all CPUs have
>> the same number of threads, then update_smt_num_threads() could only be called
>> once I suppose,
>>
>> Regards,
>> Pierre
>>
>>
>> On 8/6/24 10:53, Yicong Yang wrote:
>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>
>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>> get the SMT number of each core. Instead using a temporary xarray
>>> to record the SMT number of each core when building the topology
>>> and we can know the largest SMT number in the system. Then we can
>>> enable the support of SMT control.
>>>
>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>> ---
>>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -15,8 +15,10 @@
>>> #include <linux/arch_topology.h>
>>> #include <linux/cacheinfo.h>
>>> #include <linux/cpufreq.h>
>>> +#include <linux/cpu_smt.h>
>>> #include <linux/init.h>
>>> #include <linux/percpu.h>
>>> +#include <linux/xarray.h>
>>> #include <asm/cpu.h>
>>> #include <asm/cputype.h>
>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>> */
>>> int __init parse_acpi_topology(void)
>>> {
>>> + int thread_num, max_smt_thread_num = 1;
>>> + struct xarray core_threads;
>>> int cpu, topology_id;
>>> + void *entry;
>>> if (acpi_disabled)
>>> return 0;
>>> + xa_init(&core_threads);
>>> +
>>> for_each_possible_cpu(cpu) {
>>> topology_id = find_acpi_cpu_topology(cpu, 0);
>>> if (topology_id < 0)
>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>> cpu_topology[cpu].thread_id = topology_id;
>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>> cpu_topology[cpu].core_id = topology_id;
>>> +
>>> + entry = xa_load(&core_threads, topology_id);
>>> + if (!entry) {
>>> + xa_store(&core_threads, topology_id,
>>> + xa_mk_value(1), GFP_KERNEL);
>>> + } else {
>>> + thread_num = xa_to_value(entry);
>>> + thread_num++;
>>> + xa_store(&core_threads, topology_id,
>>> + xa_mk_value(thread_num), GFP_KERNEL);
>>> +
>>> + if (thread_num > max_smt_thread_num)
>>> + max_smt_thread_num = thread_num;
>>> + }
>>> } else {
>>> cpu_topology[cpu].thread_id = -1;
>>> cpu_topology[cpu].core_id = topology_id;
>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>> cpu_topology[cpu].package_id = topology_id;
>>> }
>>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>> +
>>> + xa_destroy(&core_threads);
>>> return 0;
>>> }
>>> #endif
>>
>> .
[-- Attachment #2: 0001-RFC-arm64-topology-Enable-CONFIG_SMT_NUM_THREADS_DYN.patch --]
[-- Type: text/x-patch, Size: 5575 bytes --]
From 6047e7700eb47a01f7ddb794677a72513fd2ff2f Mon Sep 17 00:00:00 2001
From: Pierre Gondois <pierre.gondois@arm.com>
Date: Thu, 29 Aug 2024 11:19:08 +0200
Subject: [PATCH] [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
- On arm64 ACPI systems, change the thread_id assignment to have
increasing values starting from 0. This is already the case for DT
based systems. Doing so allows to uniformly identify the n-th
thread of a given CPU.
- Check that all CPUs have the same number of threads (for DT/ACPI)
- Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
for socket0 and 128 + (0,32,64,96) for socket1:
$ cd /sys/devices/system/cpu/smt/
$ cat ../online
0-255
$ echo 2 > control
$ cat ../online
0-63,128-191
$ echo 3 > control
$ cat ../online
0-95,128-223
$ echo on > control
$ cat ../online
0-255
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/topology.h | 8 ++++++++
arch/arm64/kernel/topology.c | 16 +++++++++++++++-
drivers/base/arch_topology.c | 8 +++++---
include/linux/arch_topology.h | 1 +
5 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bd3bc2f5e0ec..1d8521483065 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -239,6 +239,7 @@ config ARM64
select HAVE_GENERIC_VDSO
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
+ select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 0f6ef432fb84..7dd211f81687 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
#define arch_scale_hw_pressure topology_get_hw_pressure
#define arch_update_hw_pressure topology_update_hw_pressure
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+#include <linux/cpu_smt.h>
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+ return topology_thread_id(cpu) < cpu_smt_num_threads;
+}
+#endif
+
#include <asm-generic/topology.h>
#endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index f72e1e55b05e..a83babe19972 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
{
int thread_num, max_smt_thread_num = 1;
struct xarray core_threads;
+ bool have_thread = false;
int cpu, topology_id;
+ unsigned long i;
void *entry;
if (acpi_disabled)
@@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
return topology_id;
if (acpi_cpu_is_threaded(cpu)) {
+ have_thread = true;
+
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id = topology_id;
@@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
if (!entry) {
xa_store(&core_threads, topology_id,
xa_mk_value(1), GFP_KERNEL);
+ cpu_topology[cpu].thread_id = 0;
} else {
thread_num = xa_to_value(entry);
- thread_num++;
+ cpu_topology[cpu].thread_id = thread_num++;
xa_store(&core_threads, topology_id,
xa_mk_value(thread_num), GFP_KERNEL);
@@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].cluster_id = topology_id;
topology_id = find_acpi_cpu_topology_package(cpu);
cpu_topology[cpu].package_id = topology_id;
+
+ pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
+ cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
+ cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
}
+ if (have_thread)
+ xa_for_each(&core_threads, i, entry)
+ if (xa_to_value(entry) != max_smt_thread_num)
+ pr_warn("Heterogeneous SMT topology not handled");
+
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
xa_destroy(&core_threads);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 95513abd664f..20d7f5b72ddd 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
return cpu;
}
-static void __init update_smt_num_threads(unsigned int num_threads)
+static void __init update_smt_num_threads(int num_threads)
{
- static unsigned int max_smt_thread_num = 1;
+ static int max_smt_thread_num = -1;
- if (num_threads > max_smt_thread_num) {
+ if (max_smt_thread_num < 0) {
max_smt_thread_num = num_threads;
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+ } else if (num_threads != max_smt_thread_num) {
+ pr_warn("Heterogeneous SMT topology not handled");
}
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index b721f360d759..afdfdc64a0a1 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
#define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
#define topology_core_id(cpu) (cpu_topology[cpu].core_id)
+#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
#define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-29 12:46 ` Pierre Gondois
@ 2024-08-30 9:35 ` Yicong Yang
2024-09-02 7:43 ` Pierre Gondois
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-08-30 9:35 UTC (permalink / raw)
To: Pierre Gondois
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/8/29 20:46, Pierre Gondois wrote:
> Hello Yicong,
>
> On 8/29/24 09:40, Yicong Yang wrote:
>> Hi Pierre,
>>
>> On 2024/8/27 23:40, Pierre Gondois wrote:
>>> Hello Yicong,
>>> IIUC we are looking for the maximum number of threads a CPU can have
>>> in the platform. But is is actually possible to have a platform with
>>> CPUs not having the same number of threads ?
>>>
>>
>> I was thinking of the heterogenous platforms, for example part of the
>> cores have SMT and others don't (I don't have any though, but there
>> should be some such platforms for arm64).
>>
>>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>>> that the number of threads is either 1 or cpu_smt_max_threads (as
>>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>>> a (hypothetical) platform with:
>>> - CPU0: 2 threads
>>> - CPU1: 4 threads
>>> should not be able to set the number of threads to 2, as
>>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>>
>>
>> It's a bit more complex. If we enable/disable the SMT using on/off control
>> then we won't have this problem. We'll online all the CPUs on enabling and
>> offline CPUs which is not primary thread and don't care about the thread
>> number of each core.
>>
>> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
>> and only enabled on powerpc. I think this interface is not enough to handle
>> the hypothetical we assumed, since it assumes the homogenous platform that
>> all the CPUs have the same thread number. But this should be fine for
>> the platforms with SMT(with same thread number) and non-SMT cores, since it do
>> indicates the real thread number of the SMT cores.
>>
>>> So if there is an assumption that all the CPUs have the same number of
>>> threads, it should be sufficient to count the number of threads for one
>>> CPU right ?
>>>
>>
>> Naturally and conveniently I may think use of the threads number of CPU0 (boot
>> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
>> platform :(
>>
>>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>>> I think we should either:
>>> - use your method and check that all the CPUs have the same number of threads
>>> - get the number of threads for one CPU and check that all the CPUs are
>>> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>>
>> I think this won't be simpler since we still need to iterate all the CPUs to see
>> if they have the same hetero_id (checked how we're using this ACPI tag in
>> arm_acpi_register_pmu_device()). We could already know if they're identical by
>> comparing the thread number and do the update if necessary.
>>
>>> - have a per-cpu cpu_smt_max_threads value ?
>>
>> This should be unnecessary in currently stage if there's no platforms
>> with several kind cores have different thread number like in your assumption
>> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
>> a global cpu_smt_max_threads to enable the SMT control should be enough.
>> Does it make sense?
>
> Yes, I agree with all the things you said:
> - the current smt/control interface cannot handle asymmetric SMT platforms
> - I am not aware if such platform exist so far
>
> I think it would still be good to check all the CPUs have the same number of
> threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
> patch attached (and to check CPUs have the same number of threads). Feel free
> to take what you like/ignore what is inside the attached patch, or let me know
> if I should submit a part in a separate patchset,
>
Checked the changes. We can make this series the basic support, and a separate series of yours can add further SMT control support on arm64:
- support thread_id on ACPI-based arm64 platforms
- support partial SMT control by enabling CONFIG_SMT_NUM_THREADS_DYNAMIC
Some minor comments below.
> ----------------------------
>
> [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
> - On arm64 ACPI systems, change the thread_id assignment to have
> increasing values starting from 0. This is already the case for DT
> based systems. Doing so allows to uniformly identify the n-th
> thread of a given CPU.
> - Check that all CPUs have the same number of threads (for DT/ACPI)
> - Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
> On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
> for socket0 and 128 + (0,32,64,96) for socket1:
> $ cd /sys/devices/system/cpu/smt/
> $ cat ../online
> 0-255
> $ echo 2 > control
> $ cat ../online
> 0-63,128-191
> $ echo 3 > control
> $ cat ../online
> 0-95,128-223
> $ echo on > control
> $ cat ../online
> 0-255
>
So it's a real SMT4 system, it does make sense to have this partial SMT control
support.
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index bd3bc2f5e0ec..1d8521483065 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -239,6 +239,7 @@ config ARM64
> select HAVE_GENERIC_VDSO
> select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
> select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
> + select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
> select IRQ_DOMAIN
> select IRQ_FORCED_THREADING
> select KASAN_VMALLOC if KASAN
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 0f6ef432fb84..7dd211f81687 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
> #define arch_scale_hw_pressure topology_get_hw_pressure
> #define arch_update_hw_pressure topology_update_hw_pressure
>
> +#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
> +#include <linux/cpu_smt.h>
> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
> +{
> + return topology_thread_id(cpu) < cpu_smt_num_threads;
> +}
> +#endif
> +
> #include <asm-generic/topology.h>
>
> #endif /* _ASM_ARM_TOPOLOGY_H */
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index f72e1e55b05e..a83babe19972 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
> {
> int thread_num, max_smt_thread_num = 1;
> struct xarray core_threads;
> + bool have_thread = false;
> int cpu, topology_id;
> + unsigned long i;
> void *entry;
>
> if (acpi_disabled)
> @@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
> return topology_id;
>
> if (acpi_cpu_is_threaded(cpu)) {
> + have_thread = true;
> +
> cpu_topology[cpu].thread_id = topology_id;
> topology_id = find_acpi_cpu_topology(cpu, 1);
> cpu_topology[cpu].core_id = topology_id;
> @@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
> if (!entry) {
> xa_store(&core_threads, topology_id,
> xa_mk_value(1), GFP_KERNEL);
> + cpu_topology[cpu].thread_id = 0;
> } else {
> thread_num = xa_to_value(entry);
> - thread_num++;
> + cpu_topology[cpu].thread_id = thread_num++;
> xa_store(&core_threads, topology_id,
> xa_mk_value(thread_num), GFP_KERNEL);
>
> @@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].cluster_id = topology_id;
> topology_id = find_acpi_cpu_topology_package(cpu);
> cpu_topology[cpu].package_id = topology_id;
> +
> + pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
> + cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
> + cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
> }
>
> + if (have_thread)
We could know this from max_smt_thread_num, so have_thread may be unnecessary.
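e.g. something like below (just a sketch of the idea, untested). Since the xarray only
has entries for threaded cores, max_smt_thread_num > 1 already tells us whether any
SMT sibling was found:

	if (max_smt_thread_num > 1)
		xa_for_each(&core_threads, i, entry)
			if (xa_to_value(entry) != max_smt_thread_num)
				pr_warn("Heterogeneous SMT topology not handled");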
> + xa_for_each(&core_threads, i, entry)
> + if (xa_to_value(entry) != max_smt_thread_num)
> + pr_warn("Heterogeneous SMT topology not handled");\
Wondering if we can avoid this 2nd loop. Greg expressed worries about looping twice on large-scale
systems in v1. Maybe we could use the hetero_id and get the necessary information in one loop; I need
to think about it further.
Thanks.
> +
> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>
> xa_destroy(&core_threads);
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 95513abd664f..20d7f5b72ddd 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
> return cpu;
> }
>
> -static void __init update_smt_num_threads(unsigned int num_threads)
> +static void __init update_smt_num_threads(int num_threads)
> {
> - static unsigned int max_smt_thread_num = 1;
> + static int max_smt_thread_num = -1;
>
> - if (num_threads > max_smt_thread_num) {
> + if (max_smt_thread_num < 0) {
> max_smt_thread_num = num_threads;
> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> + } else if (num_threads != max_smt_thread_num) {
> + pr_warn("Heterogeneous SMT topology not handled");
> }
> }
>
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index b721f360d759..afdfdc64a0a1 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
> #define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
> #define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
> #define topology_core_id(cpu) (cpu_topology[cpu].core_id)
> +#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
> #define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
> #define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
> #define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
>
> ----------------------------
>
>
> Regards,
> Pierre
>
>>
>> Thanks,
>> Yicong
>>
>>>
>>> Same comment for the DT patch. If there is an assumption that all CPUs have
>>> the same number of threads, then update_smt_num_threads() could only be called
>>> once I suppose,
>>>
>>> Regards,
>>> Pierre
>>>
>>>
>>> On 8/6/24 10:53, Yicong Yang wrote:
>>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>>
>>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>>> get the SMT number of each core. Instead using a temporary xarray
>>>> to record the SMT number of each core when building the topology
>>>> and we can know the largest SMT number in the system. Then we can
>>>> enable the support of SMT control.
>>>>
>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>>> ---
>>>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>>> 1 file changed, 24 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -15,8 +15,10 @@
>>>> #include <linux/arch_topology.h>
>>>> #include <linux/cacheinfo.h>
>>>> #include <linux/cpufreq.h>
>>>> +#include <linux/cpu_smt.h>
>>>> #include <linux/init.h>
>>>> #include <linux/percpu.h>
>>>> +#include <linux/xarray.h>
>>>> #include <asm/cpu.h>
>>>> #include <asm/cputype.h>
>>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>>> */
>>>> int __init parse_acpi_topology(void)
>>>> {
>>>> + int thread_num, max_smt_thread_num = 1;
>>>> + struct xarray core_threads;
>>>> int cpu, topology_id;
>>>> + void *entry;
>>>> if (acpi_disabled)
>>>> return 0;
>>>> + xa_init(&core_threads);
>>>> +
>>>> for_each_possible_cpu(cpu) {
>>>> topology_id = find_acpi_cpu_topology(cpu, 0);
>>>> if (topology_id < 0)
>>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>>> cpu_topology[cpu].thread_id = topology_id;
>>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>>> cpu_topology[cpu].core_id = topology_id;
>>>> +
>>>> + entry = xa_load(&core_threads, topology_id);
>>>> + if (!entry) {
>>>> + xa_store(&core_threads, topology_id,
>>>> + xa_mk_value(1), GFP_KERNEL);
>>>> + } else {
>>>> + thread_num = xa_to_value(entry);
>>>> + thread_num++;
>>>> + xa_store(&core_threads, topology_id,
>>>> + xa_mk_value(thread_num), GFP_KERNEL);
>>>> +
>>>> + if (thread_num > max_smt_thread_num)
>>>> + max_smt_thread_num = thread_num;
>>>> + }
>>>> } else {
>>>> cpu_topology[cpu].thread_id = -1;
>>>> cpu_topology[cpu].core_id = topology_id;
>>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>>> cpu_topology[cpu].package_id = topology_id;
>>>> }
>>>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>>> +
>>>> + xa_destroy(&core_threads);
>>>> return 0;
>>>> }
>>>> #endif
>>>
>>> .
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-08-30 9:35 ` Yicong Yang
@ 2024-09-02 7:43 ` Pierre Gondois
2024-09-03 12:44 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Pierre Gondois @ 2024-09-02 7:43 UTC (permalink / raw)
To: Yicong Yang
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
Hello Yicong
On 8/30/24 11:35, Yicong Yang wrote:
> On 2024/8/29 20:46, Pierre Gondois wrote:
>> Hello Yicong,
>>
>> On 8/29/24 09:40, Yicong Yang wrote:
>>> Hi Pierre,
>>>
>>> On 2024/8/27 23:40, Pierre Gondois wrote:
>>>> Hello Yicong,
>>>> IIUC we are looking for the maximum number of threads a CPU can have
>>>> in the platform. But is is actually possible to have a platform with
>>>> CPUs not having the same number of threads ?
>>>>
>>>
>>> I was thinking of the heterogenous platforms, for example part of the
>>> cores have SMT and others don't (I don't have any though, but there
>>> should be some such platforms for arm64).
>>>
>>>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>>>> that the number of threads is either 1 or cpu_smt_max_threads (as
>>>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>>>> a (hypothetical) platform with:
>>>> - CPU0: 2 threads
>>>> - CPU1: 4 threads
>>>> should not be able to set the number of threads to 2, as
>>>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>>>
>>>
>>> It's a bit more complex. If we enable/disable the SMT using on/off control
>>> then we won't have this problem. We'll online all the CPUs on enabling and
>>> offline CPUs which is not primary thread and don't care about the thread
>>> number of each core.
>>>
>>> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
>>> and only enabled on powerpc. I think this interface is not enough to handle
>>> the hypothetical we assumed, since it assumes the homogenous platform that
>>> all the CPUs have the same thread number. But this should be fine for
>>> the platforms with SMT(with same thread number) and non-SMT cores, since it do
>>> indicates the real thread number of the SMT cores.
>>>
>>>> So if there is an assumption that all the CPUs have the same number of
>>>> threads, it should be sufficient to count the number of threads for one
>>>> CPU right ?
>>>>
>>>
>>> Naturally and conveniently I may think use of the threads number of CPU0 (boot
>>> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
>>> platform :(
>>>
>>>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>>>> I think we should either:
>>>> - use your method and check that all the CPUs have the same number of threads
>>>> - get the number of threads for one CPU and check that all the CPUs are
>>>> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>>>
>>> I think this won't be simpler since we still need to iterate all the CPUs to see
>>> if they have the same hetero_id (checked how we're using this ACPI tag in
>>> arm_acpi_register_pmu_device()). We could already know if they're identical by
>>> comparing the thread number and do the update if necessary.
>>>
>>>> - have a per-cpu cpu_smt_max_threads value ?
>>>
>>> This should be unnecessary in currently stage if there's no platforms
>>> with several kind cores have different thread number like in your assumption
>>> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
>>> a global cpu_smt_max_threads to enable the SMT control should be enough.
>>> Does it make sense?
>>
>> Yes, I agree with all the things you said:
>> - the current smt/control interface cannot handle asymmetric SMT platforms
>> - I am not aware if such platform exist so far
>>
>> I think it would still be good to check all the CPUs have the same number of
>> threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
>> patch attached (and to check CPUs have the same number of threads). Feel free
>> to take what you like/ignore what is inside the attached patch, or let me know
>> if I should submit a part in a separate patchset,
>>
>
> Checked the changes, we can make this series as the basic support and a separate
> series of yours as a further support of SMT control on arm64:
> - support thread_id on ACPI based arm64 platform
> - support partial SMT control by enable CONFIG_SMT_NUM_THREADS_DYNAMIC
I'm not sure I fully understand what you mean. IIUC I can send patches to
enable SMT_NUM_THREADS_DYNAMIC on top of a v6 of yours. I'll let you pick
the changes that you think make more sense in your series (if that makes
sense)? Please let me know if that works for you (or not).
>
> some minor comments below.
>
>> ----------------------------
>>
>> [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>> - On arm64 ACPI systems, change the thread_id assignment to have
>> increasing values starting from 0. This is already the case for DT
>> based systems. Doing so allows to uniformly identify the n-th
>> thread of a given CPU.
>> - Check that all CPUs have the same number of threads (for DT/ACPI)
>> - Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>> On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
>> for socket0 and 128 + (0,32,64,96) for socket1:
>> $ cd /sys/devices/system/cpu/smt/
>> $ cat ../online
>> 0-255
>> $ echo 2 > control
>> $ cat ../online
>> 0-63,128-191
>> $ echo 3 > control
>> $ cat ../online
>> 0-95,128-223
>> $ echo on > control
>> $ cat ../online
>> 0-255
>>
>
> So it's a real SMT4 system, it does make sense to have this partial SMT control
> support.
Yes right
>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index bd3bc2f5e0ec..1d8521483065 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -239,6 +239,7 @@ config ARM64
>> select HAVE_GENERIC_VDSO
>> select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>> select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
>> + select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
>> select IRQ_DOMAIN
>> select IRQ_FORCED_THREADING
>> select KASAN_VMALLOC if KASAN
>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>> index 0f6ef432fb84..7dd211f81687 100644
>> --- a/arch/arm64/include/asm/topology.h
>> +++ b/arch/arm64/include/asm/topology.h
>> @@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
>> #define arch_scale_hw_pressure topology_get_hw_pressure
>> #define arch_update_hw_pressure topology_update_hw_pressure
>>
>> +#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
>> +#include <linux/cpu_smt.h>
>> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
>> +{
>> + return topology_thread_id(cpu) < cpu_smt_num_threads;
>> +}
>> +#endif
>> +
>> #include <asm-generic/topology.h>
>>
>> #endif /* _ASM_ARM_TOPOLOGY_H */
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index f72e1e55b05e..a83babe19972 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
>> {
>> int thread_num, max_smt_thread_num = 1;
>> struct xarray core_threads;
>> + bool have_thread = false;
>> int cpu, topology_id;
>> + unsigned long i;
>> void *entry;
>>
>> if (acpi_disabled)
>> @@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
>> return topology_id;
>>
>> if (acpi_cpu_is_threaded(cpu)) {
>> + have_thread = true;
>> +
>> cpu_topology[cpu].thread_id = topology_id;
>> topology_id = find_acpi_cpu_topology(cpu, 1);
>> cpu_topology[cpu].core_id = topology_id;
>> @@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
>> if (!entry) {
>> xa_store(&core_threads, topology_id,
>> xa_mk_value(1), GFP_KERNEL);
>> + cpu_topology[cpu].thread_id = 0;
>> } else {
>> thread_num = xa_to_value(entry);
>> - thread_num++;
>> + cpu_topology[cpu].thread_id = thread_num++;
>> xa_store(&core_threads, topology_id,
>> xa_mk_value(thread_num), GFP_KERNEL);
>>
>> @@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].cluster_id = topology_id;
>> topology_id = find_acpi_cpu_topology_package(cpu);
>> cpu_topology[cpu].package_id = topology_id;
>> +
>> + pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
>> + cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
>> + cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
>> }
>>
>> + if (have_thread)
>
> we could know this from max_smt_thread_num so have_thread maybe unnecessary.
Yes right, I will change that.
>
>> + xa_for_each(&core_threads, i, entry)
>> + if (xa_to_value(entry) != max_smt_thread_num)
>> + pr_warn("Heterogeneous SMT topology not handled");\
>
> Wondering if we can avoid this 2nd loop. Greg express the worries of looping twice on large scale
> system in v1. Maybe we could use the hetero_id and get the necessary information in one loop, I need
> further think.
I found these comments (not sure this is what you are referring to):
- https://lore.kernel.org/linux-arm-kernel/20231011103303.00002d8f@Huawei.com/
- https://lore.kernel.org/all/20230921150333.c2zqigs3xxwcg4ln@bogus/T/#m406c4c16871ca7ae431beb20feccfb5e14498452
I don't see another way to do it right now. Also, I think the complexity is
O(2n), which should be better than the original O(n**2),
Regards,
Pierre
>
> Thanks.
>
>> +
>> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>
>> xa_destroy(&core_threads);
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 95513abd664f..20d7f5b72ddd 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
>> return cpu;
>> }
>>
>> -static void __init update_smt_num_threads(unsigned int num_threads)
>> +static void __init update_smt_num_threads(int num_threads)
>> {
>> - static unsigned int max_smt_thread_num = 1;
>> + static int max_smt_thread_num = -1;
>>
>> - if (num_threads > max_smt_thread_num) {
>> + if (max_smt_thread_num < 0) {
>> max_smt_thread_num = num_threads;
>> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> + } else if (num_threads != max_smt_thread_num) {
>> + pr_warn("Heterogeneous SMT topology not handled");
>> }
>> }
>>
>> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
>> index b721f360d759..afdfdc64a0a1 100644
>> --- a/include/linux/arch_topology.h
>> +++ b/include/linux/arch_topology.h
>> @@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
>> #define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
>> #define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
>> #define topology_core_id(cpu) (cpu_topology[cpu].core_id)
>> +#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
>> #define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
>> #define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
>> #define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
>>
>> ----------------------------
>>
>>
>> Regards,
>> Pierre
>>
>>>
>>> Thanks,
>>> Yicong
>>>
>>>>
>>>> Same comment for the DT patch. If there is an assumption that all CPUs have
>>>> the same number of threads, then update_smt_num_threads() could only be called
>>>> once I suppose,
>>>>
>>>> Regards,
>>>> Pierre
>>>>
>>>>
>>>> On 8/6/24 10:53, Yicong Yang wrote:
>>>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>>>
>>>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>>>> get the SMT number of each core. Instead using a temporary xarray
>>>>> to record the SMT number of each core when building the topology
>>>>> and we can know the largest SMT number in the system. Then we can
>>>>> enable the support of SMT control.
>>>>>
>>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>>>> ---
>>>>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>>>> 1 file changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>>>> --- a/arch/arm64/kernel/topology.c
>>>>> +++ b/arch/arm64/kernel/topology.c
>>>>> @@ -15,8 +15,10 @@
>>>>> #include <linux/arch_topology.h>
>>>>> #include <linux/cacheinfo.h>
>>>>> #include <linux/cpufreq.h>
>>>>> +#include <linux/cpu_smt.h>
>>>>> #include <linux/init.h>
>>>>> #include <linux/percpu.h>
>>>>> +#include <linux/xarray.h>
>>>>> #include <asm/cpu.h>
>>>>> #include <asm/cputype.h>
>>>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>>>> */
>>>>> int __init parse_acpi_topology(void)
>>>>> {
>>>>> + int thread_num, max_smt_thread_num = 1;
>>>>> + struct xarray core_threads;
>>>>> int cpu, topology_id;
>>>>> + void *entry;
>>>>> if (acpi_disabled)
>>>>> return 0;
>>>>> + xa_init(&core_threads);
>>>>> +
>>>>> for_each_possible_cpu(cpu) {
>>>>> topology_id = find_acpi_cpu_topology(cpu, 0);
>>>>> if (topology_id < 0)
>>>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>>>> cpu_topology[cpu].thread_id = topology_id;
>>>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>>>> cpu_topology[cpu].core_id = topology_id;
>>>>> +
>>>>> + entry = xa_load(&core_threads, topology_id);
>>>>> + if (!entry) {
>>>>> + xa_store(&core_threads, topology_id,
>>>>> + xa_mk_value(1), GFP_KERNEL);
>>>>> + } else {
>>>>> + thread_num = xa_to_value(entry);
>>>>> + thread_num++;
>>>>> + xa_store(&core_threads, topology_id,
>>>>> + xa_mk_value(thread_num), GFP_KERNEL);
>>>>> +
>>>>> + if (thread_num > max_smt_thread_num)
>>>>> + max_smt_thread_num = thread_num;
>>>>> + }
>>>>> } else {
>>>>> cpu_topology[cpu].thread_id = -1;
>>>>> cpu_topology[cpu].core_id = topology_id;
>>>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>>>> cpu_topology[cpu].package_id = topology_id;
>>>>> }
>>>>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>>>> +
>>>>> + xa_destroy(&core_threads);
>>>>> return 0;
>>>>> }
>>>>> #endif
>>>>
>>>> .
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-02 7:43 ` Pierre Gondois
@ 2024-09-03 12:44 ` Yicong Yang
2024-09-05 8:34 ` Pierre Gondois
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-09-03 12:44 UTC (permalink / raw)
To: Pierre Gondois
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/9/2 15:43, Pierre Gondois wrote:
> Hello Yicong
>
> On 8/30/24 11:35, Yicong Yang wrote:
>> On 2024/8/29 20:46, Pierre Gondois wrote:
>>> Hello Yicong,
>>>
>>> On 8/29/24 09:40, Yicong Yang wrote:
>>>> Hi Pierre,
>>>>
>>>> On 2024/8/27 23:40, Pierre Gondois wrote:
>>>>> Hello Yicong,
>>>>> IIUC we are looking for the maximum number of threads a CPU can have
>>>>> in the platform. But is is actually possible to have a platform with
>>>>> CPUs not having the same number of threads ?
>>>>>
>>>>
>>>> I was thinking of the heterogenous platforms, for example part of the
>>>> cores have SMT and others don't (I don't have any though, but there
>>>> should be some such platforms for arm64).
>>>>
>>>>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>>>>> that the number of threads is either 1 or cpu_smt_max_threads (as
>>>>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>>>>> a (hypothetical) platform with:
>>>>> - CPU0: 2 threads
>>>>> - CPU1: 4 threads
>>>>> should not be able to set the number of threads to 2, as
>>>>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>>>>
>>>>
>>>> It's a bit more complex. If we enable/disable the SMT using on/off control
>>>> then we won't have this problem. We'll online all the CPUs on enabling and
>>>> offline CPUs which is not primary thread and don't care about the thread
>>>> number of each core.
>>>>
>>>> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
>>>> and only enabled on powerpc. I think this interface is not enough to handle
>>>> the hypothetical we assumed, since it assumes the homogenous platform that
>>>> all the CPUs have the same thread number. But this should be fine for
>>>> the platforms with SMT(with same thread number) and non-SMT cores, since it do
>>>> indicates the real thread number of the SMT cores.
>>>>
>>>>> So if there is an assumption that all the CPUs have the same number of
>>>>> threads, it should be sufficient to count the number of threads for one
>>>>> CPU right ?
>>>>>
>>>>
>>>> Naturally and conveniently I may think use of the threads number of CPU0 (boot
>>>> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
>>>> platform :(
>>>>
>>>>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>>>>> I think we should either:
>>>>> - use your method and check that all the CPUs have the same number of threads
>>>>> - get the number of threads for one CPU and check that all the CPUs are
>>>>> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>>>>
>>>> I think this won't be simpler since we still need to iterate all the CPUs to see
>>>> if they have the same hetero_id (checked how we're using this ACPI tag in
>>>> arm_acpi_register_pmu_device()). We could already know if they're identical by
>>>> comparing the thread number and do the update if necessary.
>>>>
>>>>> - have a per-cpu cpu_smt_max_threads value ?
>>>>
>>>> This should be unnecessary in currently stage if there's no platforms
>>>> with several kind cores have different thread number like in your assumption
>>>> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
>>>> a global cpu_smt_max_threads to enable the SMT control should be enough.
>>>> Does it make sense?
>>>
>>> Yes, I agree with all the things you said:
>>> - the current smt/control interface cannot handle asymmetric SMT platforms
>>> - I am not aware if such platform exist so far
>>>
>>> I think it would still be good to check all the CPUs have the same number of
>>> threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
>>> patch attached (and to check CPUs have the same number of threads). Feel free
>>> to take what you like/ignore what is inside the attached patch, or let me know
>>> if I should submit a part in a separate patchset,
>>>
>>
>> Checked the changes, we can make this series as the basic support and a separate
>> series of yours as a further support of SMT control on arm64:
>> - support thread_id on ACPI based arm64 platform
>> - support partial SMT control by enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>
> I'm not sure I fully understand what you mean. I can send patches to
> enable SMT_NUM_THREADS_DYNAMIC on top of a v6 of yours IIUC. I let you pick
> the changes that you estimate to make more sense in your serie (if that makes
> sense) ? Please let met know if that works for you (or not).
>
yes it works for me :)
>>
>> some minor comments below.
>>
>>> ----------------------------
>>>
>>> [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>>> - On arm64 ACPI systems, change the thread_id assignment to have
>>> increasing values starting from 0. This is already the case for DT
>>> based systems. Doing so allows to uniformly identify the n-th
>>> thread of a given CPU.
>>> - Check that all CPUs have the same number of threads (for DT/ACPI)
>>> - Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>>> On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
>>> for socket0 and 128 + (0,32,64,96) for socket1:
>>> $ cd /sys/devices/system/cpu/smt/
>>> $ cat ../online
>>> 0-255
>>> $ echo 2 > control
>>> $ cat ../online
>>> 0-63,128-191
>>> $ echo 3 > control
>>> $ cat ../online
>>> 0-95,128-223
>>> $ echo on > control
>>> $ cat ../online
>>> 0-255
>>>
>>
>> So it's a real SMT4 system, it does make sense to have this partial SMT control
>> support.
>
> Yes right
>
>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index bd3bc2f5e0ec..1d8521483065 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -239,6 +239,7 @@ config ARM64
>>> select HAVE_GENERIC_VDSO
>>> select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>>> select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
>>> + select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
>>> select IRQ_DOMAIN
>>> select IRQ_FORCED_THREADING
>>> select KASAN_VMALLOC if KASAN
>>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>>> index 0f6ef432fb84..7dd211f81687 100644
>>> --- a/arch/arm64/include/asm/topology.h
>>> +++ b/arch/arm64/include/asm/topology.h
>>> @@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
>>> #define arch_scale_hw_pressure topology_get_hw_pressure
>>> #define arch_update_hw_pressure topology_update_hw_pressure
>>> +#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
>>> +#include <linux/cpu_smt.h>
>>> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
>>> +{
>>> + return topology_thread_id(cpu) < cpu_smt_num_threads;
>>> +}
>>> +#endif
>>> +
>>> #include <asm-generic/topology.h>
>>> #endif /* _ASM_ARM_TOPOLOGY_H */
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index f72e1e55b05e..a83babe19972 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
>>> {
>>> int thread_num, max_smt_thread_num = 1;
>>> struct xarray core_threads;
>>> + bool have_thread = false;
>>> int cpu, topology_id;
>>> + unsigned long i;
>>> void *entry;
>>> if (acpi_disabled)
>>> @@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
>>> return topology_id;
>>> if (acpi_cpu_is_threaded(cpu)) {
>>> + have_thread = true;
>>> +
>>> cpu_topology[cpu].thread_id = topology_id;
>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>> cpu_topology[cpu].core_id = topology_id;
>>> @@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
>>> if (!entry) {
>>> xa_store(&core_threads, topology_id,
>>> xa_mk_value(1), GFP_KERNEL);
>>> + cpu_topology[cpu].thread_id = 0;
>>> } else {
>>> thread_num = xa_to_value(entry);
>>> - thread_num++;
>>> + cpu_topology[cpu].thread_id = thread_num++;
>>> xa_store(&core_threads, topology_id,
>>> xa_mk_value(thread_num), GFP_KERNEL);
>>> @@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
>>> cpu_topology[cpu].cluster_id = topology_id;
>>> topology_id = find_acpi_cpu_topology_package(cpu);
>>> cpu_topology[cpu].package_id = topology_id;
>>> +
>>> + pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
>>> + cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
>>> + cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
>>> }
>>> + if (have_thread)
>>
>> we could know this from max_smt_thread_num so have_thread maybe unnecessary.
>
> Yes right, I will change that.
>
>>
>>> + xa_for_each(&core_threads, i, entry)
>>> + if (xa_to_value(entry) != max_smt_thread_num)
>>> + pr_warn("Heterogeneous SMT topology not handled");\
>>
>> Wondering if we can avoid this 2nd loop. Greg express the worries of looping twice on large scale
>> system in v1. Maybe we could use the hetero_id and get the necessary information in one loop, I need
>> further think.
>
> I found this comments (not sure this is what you are refering to):
> - https://lore.kernel.org/linux-arm-kernel/20231011103303.00002d8f@Huawei.com/
> - https://lore.kernel.org/all/20230921150333.c2zqigs3xxwcg4ln@bogus/T/#m406c4c16871ca7ae431beb20feccfb5e14498452
>
> I don't see another way to do it right now. Also, I thing the complexity is in
> O(2n), which should be better than the original O(n**2),
>
Yes, it's less complex. I'm wondering if we could build up the xarray in another way so that we can avoid the
long loops. What about the below:
From 5ff5d0100435982764cd85566a6fe006e60ee98e Mon Sep 17 00:00:00 2001
From: Yicong Yang <yangyicong@hisilicon.com>
Date: Fri, 20 Oct 2023 15:38:38 +0800
Subject: [PATCH] arm64: topology: Support SMT control on ACPI based system
For ACPI we'll build the topology from PPTT and we cannot directly
get the SMT number of each core. Instead using a temporary xarray
to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
and SMT information of the first core in its heterogeneous CPU cluster
when building the topology. Then we can know the largest SMT number
in the system. Warn if heterogeneous SMT topology exists (multiple
heterogeneous CPU clusters with different SMT thread number) since the
SMT control cannot handle this well. Then enable the support of SMT
control.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
arch/arm64/kernel/topology.c | 60 ++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..f6ec30fae70e 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -15,8 +15,10 @@
#include <linux/arch_topology.h>
#include <linux/cacheinfo.h>
#include <linux/cpufreq.h>
+#include <linux/cpu_smt.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/xarray.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
return !!is_threaded;
}
+struct cpu_smt_info {
+ int thread_num;
+ int core_id;
+ int cpu;
+};
+
/*
* Propagate the topology information of the processor_topology_node tree to the
* cpu_topology array.
*/
int __init parse_acpi_topology(void)
{
+ int max_smt_thread_num = 1;
+ struct cpu_smt_info *entry;
+ struct xarray hetero_cpu;
+ unsigned long hetero_id;
int cpu, topology_id;
if (acpi_disabled)
return 0;
+ xa_init(&hetero_cpu);
+
for_each_possible_cpu(cpu) {
topology_id = find_acpi_cpu_topology(cpu, 0);
if (topology_id < 0)
@@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id = topology_id;
+
+ /*
+ * Build up the XArray using the heterogeneous ID of
+ * the CPU cluster. Store the CPU and SMT information
+ * of the first appeared CPU in the CPU cluster of this
+ * heterogeneous ID since the SMT information should be
+ * the same in this CPU cluster. Then we can know the
+ * SMT information of each heterogeneous CPUs in the
+ * system.
+ */
+ hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
+ entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
+ if (!entry) {
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ WARN_ON(!entry);
+
+ entry->cpu = cpu;
+ entry->core_id = topology_id;
+ entry->thread_num = 1;
+ xa_store(&hetero_cpu, hetero_id,
+ entry, GFP_KERNEL);
+ } else if (entry->core_id == topology_id) {
+ entry->thread_num++;
+ }
} else {
cpu_topology[cpu].thread_id = -1;
cpu_topology[cpu].core_id = topology_id;
@@ -67,6 +105,28 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].package_id = topology_id;
}
+ /*
+ * This should be a short loop depending on the number of heterogeneous
+ * CPU clusters. Typically on a homogeneous system there's only one
+ * entry in the XArray.
+ */
+ xa_for_each(&hetero_cpu, hetero_id, entry) {
+ if (entry->thread_num == 1)
+ continue;
+
+ if (entry->thread_num != max_smt_thread_num &&
+ max_smt_thread_num != 1)
+ pr_warn("Heterogeneous SMT topology not handled");
+
+ if (entry->thread_num > max_smt_thread_num)
+ max_smt_thread_num = entry->thread_num;
+
+ xa_erase(&hetero_cpu, hetero_id);
+ kfree(entry);
+ }
+
+ cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+ xa_destroy(&hetero_cpu);
return 0;
}
#endif
--
2.24.0
> Regards,
> Pierre
>
>>
>> Thanks.
>>
>>> +
>>> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>> xa_destroy(&core_threads);
>>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>>> index 95513abd664f..20d7f5b72ddd 100644
>>> --- a/drivers/base/arch_topology.c
>>> +++ b/drivers/base/arch_topology.c
>>> @@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
>>> return cpu;
>>> }
>>> -static void __init update_smt_num_threads(unsigned int num_threads)
>>> +static void __init update_smt_num_threads(int num_threads)
>>> {
>>> - static unsigned int max_smt_thread_num = 1;
>>> + static int max_smt_thread_num = -1;
>>> - if (num_threads > max_smt_thread_num) {
>>> + if (max_smt_thread_num < 0) {
>>> max_smt_thread_num = num_threads;
>>> cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>> + } else if (num_threads != max_smt_thread_num) {
>>> + pr_warn("Heterogeneous SMT topology not handled");
>>> }
>>> }
>>> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
>>> index b721f360d759..afdfdc64a0a1 100644
>>> --- a/include/linux/arch_topology.h
>>> +++ b/include/linux/arch_topology.h
>>> @@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
>>> #define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
>>> #define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
>>> #define topology_core_id(cpu) (cpu_topology[cpu].core_id)
>>> +#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
>>> #define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
>>> #define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
>>> #define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
>>>
>>> ----------------------------
>>>
>>>
>>> Regards,
>>> Pierre
>>>
>>>>
>>>> Thanks,
>>>> Yicong
>>>>
>>>>>
>>>>> Same comment for the DT patch. If there is an assumption that all CPUs have
>>>>> the same number of threads, then update_smt_num_threads() could only be called
>>>>> once I suppose,
>>>>>
>>>>> Regards,
>>>>> Pierre
>>>>>
>>>>>
>>>>> On 8/6/24 10:53, Yicong Yang wrote:
>>>>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>>>>
>>>>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>>>>> get the SMT number of each core. Instead using a temporary xarray
>>>>>> to record the SMT number of each core when building the topology
>>>>>> and we can know the largest SMT number in the system. Then we can
>>>>>> enable the support of SMT control.
>>>>>>
>>>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>>>>> ---
>>>>>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>>>>> 1 file changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>>>>> --- a/arch/arm64/kernel/topology.c
>>>>>> +++ b/arch/arm64/kernel/topology.c
>>>>>> @@ -15,8 +15,10 @@
>>>>>> #include <linux/arch_topology.h>
>>>>>> #include <linux/cacheinfo.h>
>>>>>> #include <linux/cpufreq.h>
>>>>>> +#include <linux/cpu_smt.h>
>>>>>> #include <linux/init.h>
>>>>>> #include <linux/percpu.h>
>>>>>> +#include <linux/xarray.h>
>>>>>> #include <asm/cpu.h>
>>>>>> #include <asm/cputype.h>
>>>>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>>>>> */
>>>>>> int __init parse_acpi_topology(void)
>>>>>> {
>>>>>> + int thread_num, max_smt_thread_num = 1;
>>>>>> + struct xarray core_threads;
>>>>>> int cpu, topology_id;
>>>>>> + void *entry;
>>>>>> if (acpi_disabled)
>>>>>> return 0;
>>>>>> + xa_init(&core_threads);
>>>>>> +
>>>>>> for_each_possible_cpu(cpu) {
>>>>>> topology_id = find_acpi_cpu_topology(cpu, 0);
>>>>>> if (topology_id < 0)
>>>>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>>>>> cpu_topology[cpu].thread_id = topology_id;
>>>>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>>>>> cpu_topology[cpu].core_id = topology_id;
>>>>>> +
>>>>>> + entry = xa_load(&core_threads, topology_id);
>>>>>> + if (!entry) {
>>>>>> + xa_store(&core_threads, topology_id,
>>>>>> + xa_mk_value(1), GFP_KERNEL);
>>>>>> + } else {
>>>>>> + thread_num = xa_to_value(entry);
>>>>>> + thread_num++;
>>>>>> + xa_store(&core_threads, topology_id,
>>>>>> + xa_mk_value(thread_num), GFP_KERNEL);
>>>>>> +
>>>>>> + if (thread_num > max_smt_thread_num)
>>>>>> + max_smt_thread_num = thread_num;
>>>>>> + }
>>>>>> } else {
>>>>>> cpu_topology[cpu].thread_id = -1;
>>>>>> cpu_topology[cpu].core_id = topology_id;
>>>>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>>>>> cpu_topology[cpu].package_id = topology_id;
>>>>>> }
>>>>>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>>>>> +
>>>>>> + xa_destroy(&core_threads);
>>>>>> return 0;
>>>>>> }
>>>>>> #endif
>>>>>
>>>>> .
>
> .
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-03 12:44 ` Yicong Yang
@ 2024-09-05 8:34 ` Pierre Gondois
2024-09-05 12:02 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Pierre Gondois @ 2024-09-05 8:34 UTC (permalink / raw)
To: Yicong Yang
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
Hello Yicong,
>>> Wondering if we can avoid this 2nd loop. Greg express the worries of looping twice on large scale
>>> system in v1. Maybe we could use the hetero_id and get the necessary information in one loop, I need
>>> further think.
>>
>> I found this comments (not sure this is what you are refering to):
>> - https://lore.kernel.org/linux-arm-kernel/20231011103303.00002d8f@Huawei.com/
>> - https://lore.kernel.org/all/20230921150333.c2zqigs3xxwcg4ln@bogus/T/#m406c4c16871ca7ae431beb20feccfb5e14498452
>>
>> I don't see another way to do it right now. Also, I thing the complexity is in
>> O(2n), which should be better than the original O(n**2),
>>
>
> yes it's less complex. I'm wondering build up the xarray in another way then we can avoid the
> long loops. What about below:
I tried the patch on a ThunderX2 with 4 threads per CPU. PPTT topology describes
2 clusters of 128 cores each. Each cluster is described as independent (i.e. there
is no root node in the PPTT). The PPTT is of revision 1, so the IDENTICAL
flag is not available on the platform.
I also tried it on a faked SMT asymmetric platform and there might be a small
correction required.
>
> From 5ff5d0100435982764cd85566a6fe006e60ee98e Mon Sep 17 00:00:00 2001
> From: Yicong Yang <yangyicong@hisilicon.com>
> Date: Fri, 20 Oct 2023 15:38:38 +0800
> Subject: [PATCH] arm64: topology: Support SMT control on ACPI based system
>
> For ACPI we'll build the topology from PPTT and we cannot directly
> get the SMT number of each core. Instead using a temporary xarray
> to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
> and SMT information of the first core in its heterogeneous CPU cluster
> when building the topology. Then we can know the largest SMT number
> in the system. Warn if heterogeneous SMT topology exists (multiple
> heterogeneous CPU clusters with different SMT thread number) since the
> SMT control cannot handle this well. Then enable the support of SMT
> control.
>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> arch/arm64/kernel/topology.c | 60 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..f6ec30fae70e 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -15,8 +15,10 @@
> #include <linux/arch_topology.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpufreq.h>
> +#include <linux/cpu_smt.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/xarray.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
> return !!is_threaded;
> }
>
> +struct cpu_smt_info {
> + int thread_num;
> + int core_id;
> + int cpu;
> +};
> +
> /*
> * Propagate the topology information of the processor_topology_node tree to the
> * cpu_topology array.
> */
> int __init parse_acpi_topology(void)
> {
> + int max_smt_thread_num = 1;
> + struct cpu_smt_info *entry;
> + struct xarray hetero_cpu;
> + unsigned long hetero_id;
> int cpu, topology_id;
>
> if (acpi_disabled)
> return 0;
>
> + xa_init(&hetero_cpu);
> +
> for_each_possible_cpu(cpu) {
> topology_id = find_acpi_cpu_topology(cpu, 0);
> if (topology_id < 0)
> @@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].thread_id = topology_id;
> topology_id = find_acpi_cpu_topology(cpu, 1);
> cpu_topology[cpu].core_id = topology_id;
> +
> + /*
> + * Build up the XArray using the heterogeneous ID of
> + * the CPU cluster. Store the CPU and SMT information
> + * of the first appeared CPU in the CPU cluster of this
> + * heterogeneous ID since the SMT information should be
> + * the same in this CPU cluster. Then we can know the
> + * SMT information of each heterogeneous CPUs in the
> + * system.
> + */
> + hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
> + entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
> + if (!entry) {
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + WARN_ON(!entry);
> +
> + entry->cpu = cpu;
> + entry->core_id = topology_id;
> + entry->thread_num = 1;
> + xa_store(&hetero_cpu, hetero_id,
> + entry, GFP_KERNEL);
> + } else if (entry->core_id == topology_id) {
> + entry->thread_num++;
> + }
> } else {
> cpu_topology[cpu].thread_id = -1;
> cpu_topology[cpu].core_id = topology_id;
> @@ -67,6 +105,28 @@ int __init parse_acpi_topology(void)
> cpu_topology[cpu].package_id = topology_id;
> }
>
> + /*
> + * This should be a short loop depending on the number of heterogeneous
> + * CPU clusters. Typically on a homogeneous system there's only one
> + * entry in the XArray.
> + */
> + xa_for_each(&hetero_cpu, hetero_id, entry) {
> + if (entry->thread_num == 1)
> + continue;
If a platform has CPUs with:
- 1 thread
- X (!= 1) threads
Then I think that the asymmetry is not detected
> +
> + if (entry->thread_num != max_smt_thread_num &&
> + max_smt_thread_num != 1)
> + pr_warn("Heterogeneous SMT topology not handled");
> +
> + if (entry->thread_num > max_smt_thread_num)
> + max_smt_thread_num = entry->thread_num;
> +
> + xa_erase(&hetero_cpu, hetero_id);
> + kfree(entry);
> + }
> +
> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> + xa_destroy(&hetero_cpu);
> return 0;
> }
> #endif
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-05 8:34 ` Pierre Gondois
@ 2024-09-05 12:02 ` Yicong Yang
2024-09-06 7:06 ` Morten Rasmussen
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-09-05 12:02 UTC (permalink / raw)
To: Pierre Gondois
Cc: yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/9/5 16:34, Pierre Gondois wrote:
> Hello Yicong,
>
>>>> Wondering if we can avoid this 2nd loop. Greg expressed worries about looping twice on a large-scale
>>>> system in v1. Maybe we could use the hetero_id and get the necessary information in one loop; I need
>>>> to think about it further.
>>>
>>> I found these comments (not sure whether this is what you are referring to):
>>> - https://lore.kernel.org/linux-arm-kernel/20231011103303.00002d8f@Huawei.com/
>>> - https://lore.kernel.org/all/20230921150333.c2zqigs3xxwcg4ln@bogus/T/#m406c4c16871ca7ae431beb20feccfb5e14498452
>>>
>>> I don't see another way to do it right now. Also, I think the complexity is
>>> O(2n), which should be better than the original O(n**2).
>>>
>>
>> Yes, it's less complex. I'm wondering whether we could build up the xarray in another way so that we can avoid
>> the long loops. What about the below:
>
> I tried the patch on a ThunderX2 with 4 threads per CPU. PPTT topology describes
> 2 clusters of 128 cores each. Each cluster is described as independent (i.e. there
> is no root node in the PPTT). The PPTT is of revision 1, so the IDENTICAL
> flag is not available on the platform.
>
I think the SMT detection should still work in this case, shouldn't it? There should be
2 entries in the hetero_cpu XArray, one for each cluster. Since their thread numbers
are the same, we'll detect a symmetric SMT topology.
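If it helps to confirm, here is a tiny debug sketch (illustration only, not part of the patch) that
could be dropped into parse_acpi_topology() just before the second loop to dump what has been
collected; on the ThunderX2 layout you describe it should print two entries, both with
thread_num == 4:

	/* Debug only: dump the per-hetero-ID SMT info gathered above. */
	xa_for_each(&hetero_cpu, hetero_id, entry)
		pr_info("hetero_id %lu: first cpu %d, core_id %d, thread_num %d\n",
			hetero_id, entry->cpu, entry->core_id, entry->thread_num);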
> I also tried it on a faked SMT asymmetric platform and there might be a small
> correction required.
>
>>
>> From 5ff5d0100435982764cd85566a6fe006e60ee98e Mon Sep 17 00:00:00 2001
>> From: Yicong Yang <yangyicong@hisilicon.com>
>> Date: Fri, 20 Oct 2023 15:38:38 +0800
>> Subject: [PATCH] arm64: topology: Support SMT control on ACPI based system
>>
>> For ACPI we'll build the topology from PPTT and we cannot directly
>> get the SMT number of each core. Instead using a temporary xarray
>> to record the heterogeneous information (from ACPI_PPTT_ACPI_IDENTICAL)
>> and SMT information of the first core in its heterogeneous CPU cluster
>> when building the topology. Then we can know the largest SMT number
>> in the system. Warn if heterogeneous SMT topology exists (multiple
>> heterogeneous CPU clusters with different SMT thread number) since the
>> SMT control cannot handle this well. Then enable the support of SMT
>> control.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>> arch/arm64/kernel/topology.c | 60 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 60 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 1a2c72f3e7f8..f6ec30fae70e 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -15,8 +15,10 @@
>> #include <linux/arch_topology.h>
>> #include <linux/cacheinfo.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/cpu_smt.h>
>> #include <linux/init.h>
>> #include <linux/percpu.h>
>> +#include <linux/xarray.h>
>>
>> #include <asm/cpu.h>
>> #include <asm/cputype.h>
>> @@ -37,17 +39,29 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>> return !!is_threaded;
>> }
>>
>> +struct cpu_smt_info {
>> + int thread_num;
>> + int core_id;
>> + int cpu;
>> +};
>> +
>> /*
>> * Propagate the topology information of the processor_topology_node tree to the
>> * cpu_topology array.
>> */
>> int __init parse_acpi_topology(void)
>> {
>> + int max_smt_thread_num = 1;
>> + struct cpu_smt_info *entry;
>> + struct xarray hetero_cpu;
>> + unsigned long hetero_id;
>> int cpu, topology_id;
>>
>> if (acpi_disabled)
>> return 0;
>>
>> + xa_init(&hetero_cpu);
>> +
>> for_each_possible_cpu(cpu) {
>> topology_id = find_acpi_cpu_topology(cpu, 0);
>> if (topology_id < 0)
>> @@ -57,6 +71,30 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].thread_id = topology_id;
>> topology_id = find_acpi_cpu_topology(cpu, 1);
>> cpu_topology[cpu].core_id = topology_id;
>> +
>> + /*
>> + * Build up the XArray using the heterogeneous ID of
>> + * the CPU cluster. Store the CPU and SMT information
>> + * of the first appeared CPU in the CPU cluster of this
>> + * heterogeneous ID since the SMT information should be
>> + * the same in this CPU cluster. Then we can know the
>> + * SMT information of each heterogeneous CPUs in the
>> + * system.
>> + */
>> + hetero_id = find_acpi_cpu_topology_hetero_id(cpu);
>> + entry = (struct cpu_smt_info *)xa_load(&hetero_cpu, hetero_id);
>> + if (!entry) {
>> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> + WARN_ON(!entry);
>> +
>> + entry->cpu = cpu;
>> + entry->core_id = topology_id;
>> + entry->thread_num = 1;
>> + xa_store(&hetero_cpu, hetero_id,
>> + entry, GFP_KERNEL);
>> + } else if (entry->core_id == topology_id) {
>> + entry->thread_num++;
>> + }
>> } else {
>> cpu_topology[cpu].thread_id = -1;
>> cpu_topology[cpu].core_id = topology_id;
>> @@ -67,6 +105,28 @@ int __init parse_acpi_topology(void)
>> cpu_topology[cpu].package_id = topology_id;
>> }
>>
>> + /*
>> + * This should be a short loop depending on the number of heterogeneous
>> + * CPU clusters. Typically on a homogeneous system there's only one
>> + * entry in the XArray.
>> + */
>> + xa_for_each(&hetero_cpu, hetero_id, entry) {
>> + if (entry->thread_num == 1)
>> + continue;
>
> If a platform has CPUs with:
> - 1 thread
> - X (!= 1) threads
> Then I think that the asymmetry is not detected
Ah ok, I only handled the case where several different thread numbers exist, ignoring the
non-SMT CPUs in the system. I was thinking we don't need to handle that case since there's
only one kind of SMT core in the system: control should work fine for the SMT CPU clusters
and we may not care about the CPUs without SMT.
The code below should handle that case if we initialize max_smt_thread_num to 0. I also
reworded the warning message to match the facts. For a heterogeneous SMT topology we still
support controlling SMT via the on/off toggle, but we do not fully support setting the thread number.
int max_smt_thread_num = 0;
[...]
/*
* This should be a short loop depending on the number of heterogeneous
* CPU clusters. Typically on a homogeneous system there's only one
* entry in the XArray.
*/
xa_for_each(&hetero_cpu, hetero_id, entry) {
/*
* If max_smt_thread_num has been initialized and doesn't match
* the thread number of this entry, then the system has
* heterogeneous SMT topology.
*/
if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
if (entry->thread_num > max_smt_thread_num)
max_smt_thread_num = entry->thread_num;
xa_erase(&hetero_cpu, hetero_id);
kfree(entry);
}
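To double check the corner case Pierre raised, here is a small standalone sketch (plain userspace C,
my illustration rather than kernel code) that replays the updated check on a hypothetical system
with one 1-thread cluster and one 4-thread cluster:

#include <stdio.h>

int main(void)
{
	int thread_nums[] = { 1, 4 };	/* hypothetical hetero_cpu entries */
	int max_smt_thread_num = 0;

	for (int i = 0; i < 2; i++) {
		/* same condition as in the xa_for_each() body above */
		if (thread_nums[i] != max_smt_thread_num && max_smt_thread_num)
			printf("Heterogeneous SMT topology is partly supported by SMT control\n");
		if (thread_nums[i] > max_smt_thread_num)
			max_smt_thread_num = thread_nums[i];
	}
	/* the warning fires once, whatever the entry order, and
	 * max_smt_thread_num ends up as 4 */
	return 0;
}

So with max_smt_thread_num starting from 0, the 1-thread + X-thread case is reported as well.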
Thanks.
>
>> +
>> + if (entry->thread_num != max_smt_thread_num &&
>> + max_smt_thread_num != 1)
>> + pr_warn("Heterogeneous SMT topology not handled");
>> +
>> + if (entry->thread_num > max_smt_thread_num)
>> + max_smt_thread_num = entry->thread_num;
>> +
>> + xa_erase(&hetero_cpu, hetero_id);
>> + kfree(entry);
>> + }
>> +
>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> + xa_destroy(&hetero_cpu);
>> return 0;
>> }
>> #endif
>
> .
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-05 12:02 ` Yicong Yang
@ 2024-09-06 7:06 ` Morten Rasmussen
2024-09-06 8:36 ` Yicong Yang
0 siblings, 1 reply; 26+ messages in thread
From: Morten Rasmussen @ 2024-09-06 7:06 UTC (permalink / raw)
To: Yicong Yang
Cc: Pierre Gondois, yangyicong, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
Hi Yicong,
On Thu, Sep 05, 2024 at 08:02:20PM +0800, Yicong Yang wrote:
> On 2024/9/5 16:34, Pierre Gondois wrote:
> > Hello Yicong,
> >
> > If a platform has CPUs with:
> > - 1 thread
> > - X (!= 1) threads
> > Then I think that the asymmetry is not detected
>
> Ah ok, I only handle the case where there are several thread numbers except no SMT CPUs in the
> system. For this case I was thinking we don't need to handle this since there's only one kind
> of SMT core in the system, control should works fine for the SMT CPU clusters and we may not
> care about the CPUs with no SMT.
>
> Below code should handle the case if we initialize the max_smt_thread_num to 0. I also
> reword the warning messages to match the fact. For heterogeneous SMT topology we still
> support control SMT by on/off toggle but not fully support setting the thread number.
>
> int max_smt_thread_num = 0;
> [...]
> /*
> * This should be a short loop depending on the number of heterogeneous
> * CPU clusters. Typically on a homogeneous system there's only one
> * entry in the XArray.
> */
> xa_for_each(&hetero_cpu, hetero_id, entry) {
> /*
> * If max_smt_thread_num has been initialized and doesn't match
> * the thread number of this entry, then the system has
> * heterogeneous SMT topology.
> */
> if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
> pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
What does 'partly supported' mean here?
If the SMT control doesn't work as intended for this topology, I don't
think it should be enabled for it.
Morten
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-06 7:06 ` Morten Rasmussen
@ 2024-09-06 8:36 ` Yicong Yang
2024-09-12 11:59 ` Morten Rasmussen
0 siblings, 1 reply; 26+ messages in thread
From: Yicong Yang @ 2024-09-06 8:36 UTC (permalink / raw)
To: Morten Rasmussen
Cc: yangyicong, Pierre Gondois, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On 2024/9/6 15:06, Morten Rasmussen wrote:
> Hi Yicong,
>
> On Thu, Sep 05, 2024 at 08:02:20PM +0800, Yicong Yang wrote:
>> On 2024/9/5 16:34, Pierre Gondois wrote:
>>> Hello Yicong,
>>>
>>> If a platform has CPUs with:
>>> - 1 thread
>>> - X (!= 1) threads
>>> Then I think that the asymmetry is not detected
>>
>> Ah ok, I only handle the case where there are several thread numbers except no SMT CPUs in the
>> system. For this case I was thinking we don't need to handle this since there's only one kind
>> of SMT core in the system, control should works fine for the SMT CPU clusters and we may not
>> care about the CPUs with no SMT.
>>
>> Below code should handle the case if we initialize the max_smt_thread_num to 0. I also
>> reword the warning messages to match the fact. For heterogeneous SMT topology we still
>> support control SMT by on/off toggle but not fully support setting the thread number.
>>
>> int max_smt_thread_num = 0;
>> [...]
>> /*
>> * This should be a short loop depending on the number of heterogeneous
>> * CPU clusters. Typically on a homogeneous system there's only one
>> * entry in the XArray.
>> */
>> xa_for_each(&hetero_cpu, hetero_id, entry) {
>> /*
>> * If max_smt_thread_num has been initialized and doesn't match
>> * the thread number of this entry, then the system has
>> * heterogeneous SMT topology.
>> */
>> if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
>> pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
>
> What does 'partly supported' mean here?
>
> If the SMT control doesn't work as intended for this topology, I don't
> think it should be enabled for it.
>
The /sys/devices/system/cpu/smt/control file supports 2 kinds of control [1]:
(a) enable/disable SMT entirely by writing on/off
(b) enable/disable SMT partially by writing a valid thread number (CONFIG_SMT_NUM_THREADS_DYNAMIC)
We assume 3 kinds of SMT topology:
1. homogeneous SMT topology: all the CPUs support SMT, or none do
2.1 heterogeneous SMT topology: some CPU clusters have SMT and others don't. The clusters that
support SMT have the same SMT thread number
2.2 heterogeneous SMT topology: some CPU clusters have SMT and the thread number may differ
(e.g. cluster 1 is SMT2 and cluster 2 is SMT4; not sure such a real system exists)
For any of the above SMT topologies, control (a) should work as expected. When enabling SMT by writing "on",
SMT is enabled for those CPU clusters that have SMT. The same holds for disabling SMT by writing "off".
But control (b) may not work for case 2.2 since the SMT thread number is not the same across
the system.
For this series alone we won't meet this, since CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled.
So control (b) only supports writing 1 or max_threads, which behaves the same as writing off/on and will
work as intended for all 3 cases. As discussed, Pierre will add support for
CONFIG_SMT_NUM_THREADS_DYNAMIC since ThunderX2 is a symmetric SMT4 machine where
CONFIG_SMT_NUM_THREADS_DYNAMIC would be useful. We thought a warning would be useful
when running on a system of case 2.2.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-devices-system-cpu#n542
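As a concrete illustration of (a) and (b), here is a minimal userspace sketch (error handling
trimmed; the values written are the documented on/off strings and a plain thread count):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a value to the SMT control file; "on"/"off" is control (a),
 * a thread count such as "2" is control (b) and only takes effect
 * once CONFIG_SMT_NUM_THREADS_DYNAMIC is supported.
 */
static int smt_control_write(const char *val)
{
	int fd = open("/sys/devices/system/cpu/smt/control", O_WRONLY);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) == (ssize_t)strlen(val))
		ret = 0;
	close(fd);
	return ret;
}

int main(void)
{
	smt_control_write("off");	/* (a) disable SMT entirely */
	smt_control_write("on");	/* (a) re-enable SMT */
	smt_control_write("2");		/* (b) cap each core at 2 threads */
	return 0;
}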
Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-06 8:36 ` Yicong Yang
@ 2024-09-12 11:59 ` Morten Rasmussen
2024-09-12 12:53 ` Michal Suchánek
0 siblings, 1 reply; 26+ messages in thread
From: Morten Rasmussen @ 2024-09-12 11:59 UTC (permalink / raw)
To: Yicong Yang
Cc: yangyicong, Pierre Gondois, linuxppc-dev, bp, dave.hansen, mingo,
linux-arm-kernel, mpe, peterz, tglx, sudeep.holla, will,
catalin.marinas, x86, linux-kernel, dietmar.eggemann, gregkh,
rafael, jonathan.cameron, prime.zeng, linuxarm, xuwei5, guohanjun
On Fri, Sep 06, 2024 at 04:36:30PM +0800, Yicong Yang wrote:
> On 2024/9/6 15:06, Morten Rasmussen wrote:
> > Hi Yicong,
> >
> > On Thu, Sep 05, 2024 at 08:02:20PM +0800, Yicong Yang wrote:
> >> On 2024/9/5 16:34, Pierre Gondois wrote:
> >>> Hello Yicong,
> >>>
> >>> If a platform has CPUs with:
> >>> - 1 thread
> >>> - X (!= 1) threads
> >>> Then I think that the asymmetry is not detected
> >>
> >> Ah ok, I only handle the case where there are several thread numbers except no SMT CPUs in the
> >> system. For this case I was thinking we don't need to handle this since there's only one kind
> >> of SMT core in the system, control should works fine for the SMT CPU clusters and we may not
> >> care about the CPUs with no SMT.
> >>
> >> Below code should handle the case if we initialize the max_smt_thread_num to 0. I also
> >> reword the warning messages to match the fact. For heterogeneous SMT topology we still
> >> support control SMT by on/off toggle but not fully support setting the thread number.
> >>
> >> int max_smt_thread_num = 0;
> >> [...]
> >> /*
> >> * This should be a short loop depending on the number of heterogeneous
> >> * CPU clusters. Typically on a homogeneous system there's only one
> >> * entry in the XArray.
> >> */
> >> xa_for_each(&hetero_cpu, hetero_id, entry) {
> >> /*
> >> * If max_smt_thread_num has been initialized and doesn't match
> >> * the thread number of this entry, then the system has
> >> * heterogeneous SMT topology.
> >> */
> >> if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
> >> pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
> >
> > What does 'partly supported' mean here?
> >
> > If the SMT control doesn't work as intended for this topology, I don't
> > think it should be enabled for it.
> >
>
> The /sys/devices/system/cpu/smt/control supports 2 kind of controls [1]
> (a) enable/disable SMT entirely by writing on/off
> (b) enable/disable SMT partially by writing a valid thread number (CONFIG_SMT_NUM_THREADS_DYNAMIC)
>
> We assume 3 kind of SMT topology:
> 1. homogeneous SMT topology, all the CPUs support SMT or not
> 2.1 heterogeneous SMT topology, part of CPU clusters have SMT and others not. Clusters support SMT
> have the same SMT thread number
> 2.2 heterogeneous SMT topology, part of CPU clusters have SMT and the thread number may differs
> (e.g. cluster 1 is of SMT 2 and cluster 2 is of SMT 4. not sure there's such a real system)
>
> For any of above SMT topology, control (a) should work as expected. When enabling SMT by writing "on"
> the SMT is disabled for those CPU clusters who have SMT. Same for disabling SMT by writing "off".
> But control (b) may not work for case 2.2 since the SMT thread number is not the same across
> the system.
>
> For this series alone we won't met this since CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled.
> So control (b) only supports write 1/max_thread which behaves same as write off/on and will
> work as intended for all the 3 cases. As discussed Pierre will add support for
> CONFIG_SMT_NUM_THREADS_DYNAMIC since thunderX2 is a symmetric SMT4 machine and
> CONFIG_SMT_NUM_THREADS_DYNAMIC would be useful. We thought a warning should be useful
> if running on a system of case 2.2.
Thanks for explaining the situation.
So IIUC, for case 2.2 there will be _no_ failures if someone writes a
value different from 1 or max_threads?
The SMT control code can handle that max_threads isn't the correct
number of threads for all cores in the system?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
2024-09-12 11:59 ` Morten Rasmussen
@ 2024-09-12 12:53 ` Michal Suchánek
0 siblings, 0 replies; 26+ messages in thread
From: Michal Suchánek @ 2024-09-12 12:53 UTC (permalink / raw)
To: Morten Rasmussen
Cc: Yicong Yang, yangyicong, Pierre Gondois, linuxppc-dev, bp,
dave.hansen, mingo, linux-arm-kernel, mpe, peterz, tglx,
sudeep.holla, will, catalin.marinas, x86, linux-kernel,
dietmar.eggemann, gregkh, rafael, jonathan.cameron, prime.zeng,
linuxarm, xuwei5, guohanjun
Hello,
On Thu, Sep 12, 2024 at 01:59:16PM +0200, Morten Rasmussen wrote:
> On Fri, Sep 06, 2024 at 04:36:30PM +0800, Yicong Yang wrote:
> > On 2024/9/6 15:06, Morten Rasmussen wrote:
> > > Hi Yicong,
> > >
> > > On Thu, Sep 05, 2024 at 08:02:20PM +0800, Yicong Yang wrote:
> > >> On 2024/9/5 16:34, Pierre Gondois wrote:
> > >>> Hello Yicong,
> > >>>
> > >>> If a platform has CPUs with:
> > >>> - 1 thread
> > >>> - X (!= 1) threads
> > >>> Then I think that the asymmetry is not detected
> > >>
> > >> Ah ok, I only handle the case where there are several thread numbers except no SMT CPUs in the
> > >> system. For this case I was thinking we don't need to handle this since there's only one kind
> > >> of SMT core in the system, control should works fine for the SMT CPU clusters and we may not
> > >> care about the CPUs with no SMT.
> > >>
> > >> Below code should handle the case if we initialize the max_smt_thread_num to 0. I also
> > >> reword the warning messages to match the fact. For heterogeneous SMT topology we still
> > >> support control SMT by on/off toggle but not fully support setting the thread number.
> > >>
> > >> int max_smt_thread_num = 0;
> > >> [...]
> > >> /*
> > >> * This should be a short loop depending on the number of heterogeneous
> > >> * CPU clusters. Typically on a homogeneous system there's only one
> > >> * entry in the XArray.
> > >> */
> > >> xa_for_each(&hetero_cpu, hetero_id, entry) {
> > >> /*
> > >> * If max_smt_thread_num has been initialized and doesn't match
> > >> * the thread number of this entry, then the system has
> > >> * heterogeneous SMT topology.
> > >> */
> > >> if (entry->thread_num != max_smt_thread_num && max_smt_thread_num)
> > >> pr_warn_once("Heterogeneous SMT topology is partly supported by SMT control\n");
> > >
> > > What does 'partly supported' mean here?
> > >
> > > If the SMT control doesn't work as intended for this topology, I don't
> > > think it should be enabled for it.
> > >
> >
> > The /sys/devices/system/cpu/smt/control supports 2 kind of controls [1]
> > (a) enable/disable SMT entirely by writing on/off
> > (b) enable/disable SMT partially by writing a valid thread number (CONFIG_SMT_NUM_THREADS_DYNAMIC)
> >
> > We assume 3 kind of SMT topology:
> > 1. homogeneous SMT topology, all the CPUs support SMT or not
> > 2.1 heterogeneous SMT topology, part of CPU clusters have SMT and others not. Clusters support SMT
> > have the same SMT thread number
> > 2.2 heterogeneous SMT topology, part of CPU clusters have SMT and the thread number may differs
> > (e.g. cluster 1 is of SMT 2 and cluster 2 is of SMT 4. not sure there's such a real system)
> >
> > For any of above SMT topology, control (a) should work as expected. When enabling SMT by writing "on"
> > the SMT is disabled for those CPU clusters who have SMT. Same for disabling SMT by writing "off".
> > But control (b) may not work for case 2.2 since the SMT thread number is not the same across
> > the system.
> >
> > For this series alone we won't met this since CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled.
> > So control (b) only supports write 1/max_thread which behaves same as write off/on and will
> > work as intended for all the 3 cases. As discussed Pierre will add support for
> > CONFIG_SMT_NUM_THREADS_DYNAMIC since thunderX2 is a symmetric SMT4 machine and
> > CONFIG_SMT_NUM_THREADS_DYNAMIC would be useful. We thought a warning should be useful
> > if running on a system of case 2.2.
>
> Thanks for explaining the situation.
>
> So IIUC, for case 2.2 there will be _no_ failures if someone writes a
> value different from 1 or max_threads?
>
> The SMT control code can handle that max_threads isn't the correct
> number of threads for all cores in the system?
Hello,
I suppose a number can be interpreted as 'up to this number'. Whether that's
what the user wanted is another question; on a hypothetical heterogeneous
system with different numbers of threads in different CPUs it is
questionable what this achieves.
Arguably, once such a system is found and the desired configurations are
understood, this can be expanded to handle them.
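A rough sketch of that reading (hypothetical names; this is not the actual kernel/cpu.c logic):

/* Hypothetical: a thread is brought up only if its index within the
 * core is below the requested limit, i.e. the limit means "up to N".
 */
static bool smt_thread_allowed(unsigned int thread_idx, unsigned int limit)
{
	return thread_idx < limit;
}

With a limit of 2 on a mixed SMT2/SMT4 system, the SMT2 cores would be unaffected while the
SMT4 cores lose threads 2 and 3, which may or may not be what the user asked for.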
Thanks
Michal
^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2024-08-06 8:53 [PATCH v5 0/4] Support SMT control on arm64 Yicong Yang
2024-08-06 8:53 ` [PATCH v5 1/4] cpu/SMT: Provide a default topology_is_primary_thread() Yicong Yang
2024-08-16 15:54 ` Dietmar Eggemann
2024-08-19 7:25 ` Yicong Yang
2024-08-06 8:53 ` [PATCH v5 2/4] arch_topology: Support SMT control for OF based system Yicong Yang
2024-08-16 15:55 ` Dietmar Eggemann
2024-08-19 7:18 ` Yicong Yang
2024-08-06 8:53 ` [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI " Yicong Yang
2024-08-16 15:55 ` Dietmar Eggemann
2024-08-19 7:03 ` Yicong Yang
2024-08-22 7:19 ` Dietmar Eggemann
2024-08-27 15:40 ` Pierre Gondois
2024-08-29 7:40 ` Yicong Yang
2024-08-29 12:46 ` Pierre Gondois
2024-08-30 9:35 ` Yicong Yang
2024-09-02 7:43 ` Pierre Gondois
2024-09-03 12:44 ` Yicong Yang
2024-09-05 8:34 ` Pierre Gondois
2024-09-05 12:02 ` Yicong Yang
2024-09-06 7:06 ` Morten Rasmussen
2024-09-06 8:36 ` Yicong Yang
2024-09-12 11:59 ` Morten Rasmussen
2024-09-12 12:53 ` Michal Suchánek
2024-08-06 8:53 ` [PATCH v5 4/4] arm64: Kconfig: Enable HOTPLUG_SMT Yicong Yang
2024-08-27 15:40 ` Pierre Gondois
2024-08-29 6:50 ` Yicong Yang