public inbox for linux-kernel@vger.kernel.org
* [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
@ 2026-03-25 18:13 Christian Loehle
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-25 18:13 UTC (permalink / raw)
  To: arighi
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis, Christian Loehle

The scheduler currently handles CPU performance asymmetry via either:

- SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
- SD_ASYM_CPUCAPACITY: capacity-aware scheduling

On arm64, capacity-aware scheduling is used for any detected capacity
differences.

Some systems expose small per-CPU performance differences via CPPC
highest_perf (e.g. due to chip binning), resulting in slightly different
capacities (<~5%). These differences are sufficient to trigger
SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
symmetric.

For such small deltas, capacity-aware scheduling is unnecessarily
complex. A simpler priority-based approach, similar to x86 ITMT, is
sufficient.

This series introduces support for using asymmetric packing in that case:

- derive per-CPU priorities from CPPC highest_perf
- detect when CPUs differ but not enough to form distinct capacity classes
- suppress SD_ASYM_CPUCAPACITY for such domains
- enable SD_ASYM_PACKING and use CPPC-based priority ordering instead
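
A rough user-space model of the first two steps follows. It is an
illustration, not the kernel code. The per-CPU highest_perf values are
copied into a priority array, and asympacking is only considered when
the values differ but stay within the ~5% capacity_greater() margin.
The array-based interface and the name detect_minor_asym() are
hypothetical. In the series, the values come from cppc_get_perf_caps()
and are stored in a per-CPU variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same ~5% margin as the scheduler's capacity_greater() macro. */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/*
 * Hypothetical model: highest_perf[] stands in for the CPPC highest_perf
 * of each CPU, prio[] for the per-CPU asympacking priority.
 * Returns true if asympacking should be used instead of
 * SD_ASYM_CPUCAPACITY.
 */
static bool detect_minor_asym(const uint32_t *highest_perf, int *prio,
			      int nr_cpus)
{
	uint64_t max_perf = 0, min_perf = UINT32_MAX;

	if (nr_cpus <= 0)
		return false;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		/* Higher highest_perf means higher packing priority. */
		prio[cpu] = (int)highest_perf[cpu];
		if (highest_perf[cpu] > max_perf)
			max_perf = highest_perf[cpu];
		if (highest_perf[cpu] < min_perf)
			min_perf = highest_perf[cpu];
	}

	/* Asymmetric, but not enough to form distinct capacity classes. */
	return max_perf != min_perf && !capacity_greater(max_perf, min_perf);
}
```

For example, highest_perf values {1000, 1020, 1040, 1000} (a 4% spread)
yield true. Values {1000, 1200} yield false, so SD_ASYM_CPUCAPACITY
would be kept.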

The asympacking flag is exposed at all topology levels; domains with
equal priorities are unaffected, while domains spanning CPUs with
different priorities can honor the ordering.
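
The way a domain "honors the ordering" can be sketched as a pairwise
priority comparison. The sketch assumes that the comparison works like
the scheduler's sched_asym_prefer() helper: CPU a is preferred over b
only if its priority is strictly higher. It also mirrors the fallback in
patch 3, which returns -cpu while CPPC asympacking is inactive. The
arrays and example_* names are hypothetical. CPUs with equal
highest_perf get equal priorities, so neither one is preferred.

```c
#include <stdbool.h>

#define EXAMPLE_NR_CPUS 4

/* Hypothetical stand-ins for sched_cppc_asym_active and sched_cppc_priority. */
static bool example_asym_active = true;
static const int example_prio[EXAMPLE_NR_CPUS] = { 1000, 1040, 1040, 1000 };

static int example_asym_cpu_priority(int cpu)
{
	/* Default ordering: the lower-numbered CPU has higher priority. */
	if (!example_asym_active)
		return -cpu;
	return example_prio[cpu];
}

/* Should work be packed onto CPU a rather than CPU b? */
static bool example_asym_prefer(int a, int b)
{
	return example_asym_cpu_priority(a) > example_asym_cpu_priority(b);
}
```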

RFC:
I'm not entirely sure if this is the best way to implement this.
Currently this is baked into CPPC and arm64, although neither is
strictly necessary. We could also derive the ordering from cpu_capacity
directly and enable this for non-CPPC and/or non-arm64 systems.
RFT:
Andrea, please give this a try. This should perform better, in
particular for single-threaded workloads and workloads that do not
utilize all cores (at least not all the time).
Capacity-aware wakeup works very differently from the SMP wakeup path
used now, so some workloads will benefit and some will regress. It would
be nice to get some test results for these.
We already discussed that DCPerf MediaWiki seems to benefit from the
capacity-aware wakeup behavior, but others (most?) should benefit from
this series.

I don't know if we can also be clever about ordering amongst SMT siblings.
That would depend on the uarch, and I don't have a platform to
experiment with, so consider this series orthogonal to the idle-core
SMT considerations.
On platforms with SMT, though, asympacking makes a lot more sense than
capacity-aware scheduling, because reasoning about capacity without
considering the utilization of the sibling(s) (and the resulting
potentially 'stolen' capacity we perceive) isn't theoretically sound.

Christian Loehle (3):
  sched/topology: Introduce arch hooks for asympacking
  arch_topology: Export CPPC-based asympacking prios
  arm64/sched: Enable CPPC-based asympacking

 arch/arm64/include/asm/topology.h |  6 +++++
 arch/arm64/kernel/topology.c      | 34 ++++++++++++++++++++++++++
 drivers/base/arch_topology.c      | 40 +++++++++++++++++++++++++++++++
 include/linux/arch_topology.h     | 24 +++++++++++++++++++
 include/linux/sched/topology.h    |  9 +++++++
 kernel/sched/fair.c               | 16 -------------
 kernel/sched/topology.c           | 34 ++++++++++++++++++++------
 7 files changed, 140 insertions(+), 23 deletions(-)

-- 
2.34.1



* [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
  2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
@ 2026-03-25 18:13 ` Christian Loehle
  2026-03-26 13:23   ` kernel test robot
                     ` (2 more replies)
  2026-03-25 18:13 ` [PATCH 2/3] arch_topology: Export CPPC-based asympacking prios Christian Loehle
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-25 18:13 UTC (permalink / raw)
  To: arighi
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis, Christian Loehle

Prepare for arch-specific asympacking logic.

No functional impact intended.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 include/linux/arch_topology.h  | 24 ++++++++++++++++++++++++
 include/linux/sched/topology.h |  9 +++++++++
 kernel/sched/fair.c            | 16 ----------------
 kernel/sched/topology.c        |  8 ++++++++
 4 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index ebd7f8935f96..3ab571b287ef 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -94,6 +94,11 @@ void remove_cpu_topology(unsigned int cpuid);
 void reset_cpu_topology(void);
 int parse_acpi_topology(void);
 void freq_inv_set_max_ratio(int cpu, u64 max_rate);
+void arch_topology_init_cppc_asym(void);
+
+#ifdef CONFIG_ACPI_CPPC_LIB
+bool topology_init_cppc_asym_packing(int __percpu *priority_var);
+#endif
 
 /*
  * Architectures like ARM64 don't have reliable architectural way to get SMT
@@ -105,10 +110,29 @@ static inline bool topology_core_has_smt(int cpu)
 	return cpu_topology[cpu].thread_id != -1;
 }
 
+#ifdef CONFIG_ARM64
+#undef arch_sched_asym_flags
+#define arch_sched_asym_flags arm64_arch_sched_asym_flags
+int arm64_arch_asym_cpu_priority(int cpu);
+int arm64_arch_sched_asym_flags(void);
+#endif
+
 #else
 
 static inline bool topology_core_has_smt(int cpu) { return false; }
 
 #endif /* CONFIG_GENERIC_ARCH_TOPOLOGY */
 
+/*
+ * Architectures may override this to provide a custom CPU priority for
+ * asymmetric packing.
+ */
+#ifndef arch_asym_cpu_priority
+#define arch_asym_cpu_priority topology_arch_asym_cpu_priority
+static inline int topology_arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+#endif
+
 #endif /* _LINUX_ARCH_TOPOLOGY_H_ */
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 45c0022b91ce..48cfa89df0fc 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -50,6 +50,15 @@ extern const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl,
 extern const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu);
 
 extern int arch_asym_cpu_priority(int cpu);
+extern int arch_sched_asym_flags(void);
+
+/*
+ * The margin used when comparing CPU capacities.
+ * is 'cap1' noticeably greater than 'cap2'
+ *
+ * (default: ~5%)
+ */
+#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)
 
 struct sched_domain_attr {
 	int relax_domain_level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..c5f8aa3ad535 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -88,14 +88,6 @@ static int __init setup_sched_thermal_decay_shift(char *str)
 }
 __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
 
-/*
- * For asym packing, by default the lower numbered CPU has higher priority.
- */
-int __weak arch_asym_cpu_priority(int cpu)
-{
-	return -cpu;
-}
-
 /*
  * The margin used when comparing utilization with CPU capacity.
  *
@@ -103,14 +95,6 @@ int __weak arch_asym_cpu_priority(int cpu)
  */
 #define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)
 
-/*
- * The margin used when comparing CPU capacities.
- * is 'cap1' noticeably greater than 'cap2'
- *
- * (default: ~5%)
- */
-#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)
-
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 32dcddaead82..b0c590dfdb01 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1742,6 +1742,14 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
+#ifndef arch_sched_asym_flags
+#define arch_sched_asym_flags topology_arch_sched_asym_flags
+static inline int topology_arch_sched_asym_flags(void)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SCHED_SMT
 int cpu_smt_flags(void)
 {
-- 
2.34.1
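
Patch 1/3 relies on a common preprocessor idiom. An architecture header
defines the hook name as a macro that points at its own function, and
the generic code supplies a default only inside #ifndef. A stand-alone
sketch of that idiom follows. The flag values and example_* names are
made up for illustration and are not the kernel's SD_* values.

```c
/* Hypothetical flag values, not the kernel's SD_* definitions. */
#define EXAMPLE_SD_SHARE_LLC    0x1
#define EXAMPLE_SD_ASYM_PACKING 0x2

/* "Arch header": opt in by defining the hook name first. */
#define arch_sched_asym_flags example_arch_sched_asym_flags
static int example_arch_sched_asym_flags(void)
{
	return EXAMPLE_SD_ASYM_PACKING;
}

/* "Generic code": default used only when no arch override exists. */
#ifndef arch_sched_asym_flags
#define arch_sched_asym_flags example_generic_sched_asym_flags
static inline int example_generic_sched_asym_flags(void)
{
	return 0;
}
#endif

/* Topology-level flag callback, shaped like cpu_core_flags(). */
static int example_core_flags(void)
{
	return EXAMPLE_SD_SHARE_LLC | arch_sched_asym_flags();
}
```

If the "arch header" block is removed, example_core_flags() returns
EXAMPLE_SD_SHARE_LLC alone.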



* [PATCH 2/3] arch_topology: Export CPPC-based asympacking prios
  2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
@ 2026-03-25 18:13 ` Christian Loehle
  2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-25 18:13 UTC (permalink / raw)
  To: arighi
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis, Christian Loehle

Determine asymmetric packing priorities by reading CPPC's
highest_perf, so that arch code can later use them to enable
asympacking if the difference between the minimum and maximum
highest_perf is small (<5%).

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 drivers/base/arch_topology.c | 40 ++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 8c5e47c28d9a..c80c782e5eb2 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -321,8 +321,47 @@ void __weak freq_inv_set_max_ratio(int cpu, u64 max_rate)
 {
 }
 
+void __weak arch_topology_init_cppc_asym(void)
+{
+}
+
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
+#include <linux/limits.h>
+
+/**
+ * topology_init_cppc_asym_packing() - Detect CPPC-based asymmetric packing
+ * @priority_var: Per-CPU variable to store CPU priorities
+ *
+ * Query CPPC highest_perf for all CPUs and determine if asymmetric packing
+ * should be enabled based on minor performance differences (~5% threshold).
+ *
+ * Return: true if asympacking should be enabled, false otherwise
+ */
+bool topology_init_cppc_asym_packing(int __percpu *priority_var)
+{
+	struct cppc_perf_caps perf_caps;
+	u32 max_perf = 0, min_perf = U32_MAX;
+	int cpu;
+
+	if (!acpi_cpc_valid())
+		return false;
+
+	for_each_possible_cpu(cpu) {
+		if (cppc_get_perf_caps(cpu, &perf_caps))
+			return false;
+		if (perf_caps.highest_perf < perf_caps.nominal_perf ||
+		    perf_caps.highest_perf < perf_caps.lowest_perf)
+			return false;
+
+		*per_cpu_ptr(priority_var, cpu) = perf_caps.highest_perf;
+		max_perf = max(max_perf, perf_caps.highest_perf);
+		min_perf = min(min_perf, perf_caps.highest_perf);
+	}
+
+	return (max_perf != min_perf) && !capacity_greater(max_perf, min_perf);
+}
+EXPORT_SYMBOL_GPL(topology_init_cppc_asym_packing);
 
 static inline void topology_init_cpu_capacity_cppc(void)
 {
@@ -369,6 +408,7 @@ static inline void topology_init_cpu_capacity_cppc(void)
 			cpu, topology_get_cpu_scale(cpu));
 	}
 
+	arch_topology_init_cppc_asym();
 	schedule_work(&update_topology_flags_work);
 	pr_debug("cpu_capacity: cpu_capacity initialization done\n");
 
-- 
2.34.1
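
The "<5%" figure can be checked directly. capacity_greater(max, min) is
true when max * 1024 > min * 1078, which means max exceeds min by more
than 1078/1024 - 1, or about 5.3%. The hypothetical helper below, for
illustration only, computes the largest highest_perf that still keeps
the system within the asympacking range for a given minimum.

```c
#include <stdint.h>

#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/*
 * Largest max_perf for which !capacity_greater(max_perf, min_perf) holds,
 * i.e. floor(min_perf * 1078 / 1024).
 */
static uint64_t example_max_packing_perf(uint64_t min_perf)
{
	return min_perf * 1078 / 1024;
}
```

With a minimum highest_perf of 1000, a maximum of 1052 still selects
asympacking, while 1053 keeps SD_ASYM_CPUCAPACITY. With a minimum of
255, the limit is 268.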



* [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
  2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
  2026-03-25 18:13 ` [PATCH 2/3] arch_topology: Export CPPC-based asympacking prios Christian Loehle
@ 2026-03-25 18:13 ` Christian Loehle
  2026-03-26 15:47   ` kernel test robot
                     ` (2 more replies)
  2026-03-26  7:53 ` [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Vincent Guittot
  2026-03-26  8:11 ` Andrea Righi
  4 siblings, 3 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-25 18:13 UTC (permalink / raw)
  To: arighi
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis, Christian Loehle

To handle minor capacity differences (<5%), use asym packing to steer
tasks towards higher-performance CPUs. This replaces capacity-aware
scheduling in those cases, and SD_ASYM_CPUCAPACITY is not set.
This is implemented by using the highest_perf values as the priorities
to steer towards.
The highest_perf-based asympacking is a global ordering that is, for
now, applied at all levels of the hierarchy.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 arch/arm64/include/asm/topology.h |  6 ++++++
 arch/arm64/kernel/topology.c      | 34 +++++++++++++++++++++++++++++++
 kernel/sched/topology.c           | 26 ++++++++++++++++-------
 3 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index b9eaf4ad7085..e0b039e1a5bb 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -39,6 +39,12 @@ void update_freq_counters_refs(void);
 #undef arch_cpu_is_threaded
 #define arch_cpu_is_threaded() (read_cpuid_mpidr() & MPIDR_MT_BITMASK)
 
+#undef arch_asym_cpu_priority
+#define arch_asym_cpu_priority arm64_arch_asym_cpu_priority
+#define arch_sched_asym_flags arm64_arch_sched_asym_flags
+extern int arm64_arch_asym_cpu_priority(int cpu);
+extern int arm64_arch_sched_asym_flags(void);
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index b32f13358fbb..4e3582d44a26 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -19,6 +19,7 @@
 #include <linux/init.h>
 #include <linux/percpu.h>
 #include <linux/sched/isolation.h>
+#include <linux/sched/topology.h>
 #include <linux/xarray.h>
 
 #include <asm/cpu.h>
@@ -373,6 +374,26 @@ core_initcall(init_amu_fie);
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
 
+static bool __read_mostly sched_cppc_asym_active;
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_cppc_priority);
+
+int arm64_arch_asym_cpu_priority(int cpu)
+{
+	if (!READ_ONCE(sched_cppc_asym_active))
+		return -cpu;
+	return per_cpu(sched_cppc_priority, cpu);
+}
+
+int arm64_arch_sched_asym_flags(void)
+{
+	return READ_ONCE(sched_cppc_asym_active) ? SD_ASYM_PACKING : 0;
+}
+
+void arch_topology_init_cppc_asym(void)
+{
+	WRITE_ONCE(sched_cppc_asym_active, topology_init_cppc_asym_packing(&sched_cppc_priority));
+}
+
 static void cpu_read_corecnt(void *val)
 {
 	/*
@@ -473,4 +494,17 @@ int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
 {
 	return -EOPNOTSUPP;
 }
+
+#else
+int arm64_arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+
+int arm64_arch_sched_asym_flags(void)
+{
+	return 0;
+}
+
+void arch_topology_init_cppc_asym(void) { }
 #endif /* CONFIG_ACPI_CPPC_LIB */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b0c590dfdb01..758b8796b62d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1396,6 +1396,8 @@ asym_cpu_capacity_classify(const struct cpumask *sd_span,
 			   const struct cpumask *cpu_map)
 {
 	struct asym_cap_data *entry;
+	unsigned long max_cap = 0, min_cap = ULONG_MAX;
+	bool has_cap = false;
 	int count = 0, miss = 0;
 
 	/*
@@ -1405,9 +1407,12 @@ asym_cpu_capacity_classify(const struct cpumask *sd_span,
 	 * skip those.
 	 */
 	list_for_each_entry(entry, &asym_cap_list, link) {
-		if (cpumask_intersects(sd_span, cpu_capacity_span(entry)))
+		if (cpumask_intersects(sd_span, cpu_capacity_span(entry))) {
 			++count;
-		else if (cpumask_intersects(cpu_map, cpu_capacity_span(entry)))
+			max_cap = max(max_cap, entry->capacity);
+			min_cap = min(min_cap, entry->capacity);
+			has_cap = true;
+		} else if (cpumask_intersects(cpu_map, cpu_capacity_span(entry)))
 			++miss;
 	}
 
@@ -1419,10 +1424,12 @@ asym_cpu_capacity_classify(const struct cpumask *sd_span,
 	/* Some of the available CPU capacity values have not been detected */
 	if (miss)
 		return SD_ASYM_CPUCAPACITY;
+	/* When asym packing is active, ignore small capacity differences. */
+	if (arch_sched_asym_flags() && has_cap && !capacity_greater(max_cap, min_cap))
+		return 0;
 
 	/* Full asymmetry */
 	return SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
-
 }
 
 static void free_asym_cap_entry(struct rcu_head *head)
@@ -1753,7 +1760,7 @@ static inline int topology_arch_sched_asym_flags(void)
 #ifdef CONFIG_SCHED_SMT
 int cpu_smt_flags(void)
 {
-	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
+	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC | arch_sched_asym_flags();
 }
 
 const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
@@ -1765,7 +1772,7 @@ const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cp
 #ifdef CONFIG_SCHED_CLUSTER
 int cpu_cluster_flags(void)
 {
-	return SD_CLUSTER | SD_SHARE_LLC;
+	return SD_CLUSTER | SD_SHARE_LLC | arch_sched_asym_flags();
 }
 
 const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
@@ -1777,7 +1784,7 @@ const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cp
 #ifdef CONFIG_SCHED_MC
 int cpu_core_flags(void)
 {
-	return SD_SHARE_LLC;
+	return SD_SHARE_LLC | arch_sched_asym_flags();
 }
 
 const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
@@ -1791,6 +1798,11 @@ const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cp
 	return cpu_node_mask(cpu);
 }
 
+static int cpu_pkg_flags(void)
+{
+	return arch_sched_asym_flags();
+}
+
 /*
  * Topology list, bottom-up.
  */
@@ -1806,7 +1818,7 @@ static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_MC
 	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 #endif
-	SDTL_INIT(tl_pkg_mask, NULL, PKG),
+	SDTL_INIT(tl_pkg_mask, cpu_pkg_flags, PKG),
 	{ NULL, },
 };
 
-- 
2.34.1
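
The effect of the asym_cpu_capacity_classify() change can be modeled
outside the kernel with a plain array of the distinct capacity values
seen in a domain's span. This is a simplified sketch. The flag values
are hypothetical, and count/miss are passed in rather than computed
from cpumasks, which makes the patch's has_cap flag redundant here. As
in the patch, the new check comes after the miss test, so it only
affects domains that see every capacity value present in cpu_map.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical flag values, not the kernel's definitions. */
#define EX_SD_ASYM_CPUCAPACITY      0x1
#define EX_SD_ASYM_CPUCAPACITY_FULL 0x2

#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/*
 * caps[]: the count distinct capacity values present in the domain span
 * miss:   capacity values present in cpu_map but not in this span
 */
static int example_classify(const unsigned long *caps, int count, int miss,
			    bool asym_packing)
{
	unsigned long max_cap = 0, min_cap = ULONG_MAX;

	for (int i = 0; i < count; i++) {
		if (caps[i] > max_cap)
			max_cap = caps[i];
		if (caps[i] < min_cap)
			min_cap = caps[i];
	}

	/* No asymmetry detected */
	if (count < 2)
		return 0;
	/* Some of the available CPU capacity values have not been detected */
	if (miss)
		return EX_SD_ASYM_CPUCAPACITY;
	/* With asym packing active, ignore small capacity differences. */
	if (asym_packing && !capacity_greater(max_cap, min_cap))
		return 0;
	/* Full asymmetry */
	return EX_SD_ASYM_CPUCAPACITY | EX_SD_ASYM_CPUCAPACITY_FULL;
}
```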



* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
                   ` (2 preceding siblings ...)
  2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
@ 2026-03-26  7:53 ` Vincent Guittot
  2026-03-26  8:16   ` Christian Loehle
  2026-03-26  8:20   ` Christian Loehle
  2026-03-26  8:11 ` Andrea Righi
  4 siblings, 2 replies; 23+ messages in thread
From: Vincent Guittot @ 2026-03-26  7:53 UTC (permalink / raw)
  To: Christian Loehle
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
>
> The scheduler currently handles CPU performance asymmetry via either:
>
> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
>
> On arm64, capacity-aware scheduling is used for any detected capacity
> differences.
>
> Some systems expose small per-CPU performance differences via CPPC
> highest_perf (e.g. due to chip binning), resulting in slightly different
> capacities (<~5%). These differences are sufficient to trigger
> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
> symmetric.
>
> For such small deltas, capacity-aware scheduling is unnecessarily
> complex. A simpler priority-based approach, similar to x86 ITMT, is
> sufficient.

I'm not convinced that moving to SD_ASYM_PACKING is the right way to
move forward.

First of all, are you targeting all kinds of systems or only SMT? It's
not clear from your cover letter.

Moving to asym packing for !SMT doesn't make sense to me. If you don't
want EAS enabled, you can disable it with
/proc/sys/kernel/sched_energy_aware

For SMT systems with small capacity differences, I would prefer that we
look at supporting SMT in SD_ASYM_CPUCAPACITY, starting with
select_idle_capacity.


>
> This series introduces support for using asymmetric packing in that case:
>
> - derive per-CPU priorities from CPPC highest_perf
> - detect when CPUs differ but not enough to form distinct capacity classes
> - suppress SD_ASYM_CPUCAPACITY for such domains
> - enable SD_ASYM_PACKING and use CPPC-based priority ordering instead
>
> The asympacking flag is exposed at all topology levels; domains with
> equal priorities are unaffected, while domains spanning CPUs with
> different priorities can honor the ordering.
>
> RFC:
> I'm not entirely sure if this is the best way to implement this.
> Currently this is baked into CPPC and arm64, while neither are strictly
> necessary, we could also use cpu_capacity directly to derive the
> ordering and enable this for non-CPPC and/or non-arm64.
> RFT:
> Andrea, please give this a try. This should perform better in particular
> for single-threaded workloads and workloads that do not utilize all
> cores (all the time anyway).
> Capacity-aware scheduling wakeup works very different to the SMP path
> used now, some workloads will benefit, some regress, it would be nice
> to get some test results for these.
> We already discussed DCPerf MediaWiki seems to benefit from
> capacity-aware scheduling wakeup behavior, but others (most?) should
> benefit from this series.
>
> I don't know if we can also be clever about ordering amongst SMT siblings.
> That would be dependent on the uarch and I don't have a platform to
> experiment with this though, so consider this series orthogonal to the
> idle-core SMT considerations.
> On platforms with SMT though asympacking makes a lot more sense than
> capacity-aware scheduling, because arguing about capacity without
> considering utilization of the sibling(s) (and the resulting potential
> 'stolen' capacity we perceive) isn't theoretically sound.
>
> Christian Loehle (3):
>   sched/topology: Introduce arch hooks for asympacking
>   arch_topology: Export CPPC-based asympacking prios
>   arm64/sched: Enable CPPC-based asympacking
>
>  arch/arm64/include/asm/topology.h |  6 +++++
>  arch/arm64/kernel/topology.c      | 34 ++++++++++++++++++++++++++
>  drivers/base/arch_topology.c      | 40 +++++++++++++++++++++++++++++++
>  include/linux/arch_topology.h     | 24 +++++++++++++++++++
>  include/linux/sched/topology.h    |  9 +++++++
>  kernel/sched/fair.c               | 16 -------------
>  kernel/sched/topology.c           | 34 ++++++++++++++++++++------
>  7 files changed, 140 insertions(+), 23 deletions(-)
>
> --
> 2.34.1
>


* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
                   ` (3 preceding siblings ...)
  2026-03-26  7:53 ` [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Vincent Guittot
@ 2026-03-26  8:11 ` Andrea Righi
  2026-03-26  8:20   ` Vincent Guittot
  4 siblings, 1 reply; 23+ messages in thread
From: Andrea Righi @ 2026-03-26  8:11 UTC (permalink / raw)
  To: Christian Loehle
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

Hi Christian,

On Wed, Mar 25, 2026 at 06:13:11PM +0000, Christian Loehle wrote:
...
> RFT:
> Andrea, please give this a try. This should perform better in particular
> for single-threaded workloads and workloads that do not utilize all
> cores (all the time anyway).
> Capacity-aware scheduling wakeup works very different to the SMP path
> used now, some workloads will benefit, some regress, it would be nice
> to get some test results for these.
> We already discussed DCPerf MediaWiki seems to benefit from
> capacity-aware scheduling wakeup behavior, but others (most?) should
> benefit from this series.
> 
> I don't know if we can also be clever about ordering amongst SMT siblings.
> That would be dependent on the uarch and I don't have a platform to
> experiment with this though, so consider this series orthogonal to the
> idle-core SMT considerations.
> On platforms with SMT though asympacking makes a lot more sense than
> capacity-aware scheduling, because arguing about capacity without
> considering utilization of the sibling(s) (and the resulting potential
> 'stolen' capacity we perceive) isn't theoretically sound.

I did some early testing with this patch set. On Vera I'm getting much
better performance than with SD_ASYM_CPUCAPACITY, of course (~1.5x avg
speedup), mostly because we avoid using both SMT siblings. It's still
not the same improvement that I get by equalizing the capacity using the
5% threshold (~1.8x speedup).

Of course I need to test with more workloads, and I haven't tested it on
Grace yet to check whether we're regressing anything, but in general it
seems functional.

Now it depends on whether SD_ASYM_PACKING is the route we want to take or
whether we should start addressing SMT in SD_ASYM_CPUCAPACITY, as pointed
out by Vincent. In general I agree with Vincent: independently of this
particular case, it'd be nice to start improving SD_ASYM_CPUCAPACITY to
support SMT.

Thanks,
-Andrea


* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  7:53 ` [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Vincent Guittot
@ 2026-03-26  8:16   ` Christian Loehle
  2026-03-26  8:24     ` Vincent Guittot
  2026-03-26  8:20   ` Christian Loehle
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Loehle @ 2026-03-26  8:16 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On 3/26/26 07:53, Vincent Guittot wrote:
> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
>>
>> The scheduler currently handles CPU performance asymmetry via either:
>>
>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
>>
>> On arm64, capacity-aware scheduling is used for any detected capacity
>> differences.
>>
>> Some systems expose small per-CPU performance differences via CPPC
>> highest_perf (e.g. due to chip binning), resulting in slightly different
>> capacities (<~5%). These differences are sufficient to trigger
>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
>> symmetric.
>>
>> For such small deltas, capacity-aware scheduling is unnecessarily
>> complex. A simpler priority-based approach, similar to x86 ITMT, is
>> sufficient.
> 
> I'm not convinced that moving to  SD_ASYM_PACKING is the right way to
> move forward.
>
> 1st of all, do you target all kind of system or only SMT? It's not
> clear in your cover letter

AFAIK only Andrea has access to an unreleased asymmetric SMT system;
I haven't done any tests on such a system (as the cover letter mentions
in the RFT section).

> 
> Moving on asym pack for !SMT doesn't make sense to me. If you don't
> want EAS enabled, you can disable it with
> /proc/sys/kernel/sched_energy_aware

Sorry, what does EAS have to do with it? The system I care about here
(primarily NVIDIA Grace) has no EM.

> 
> For SMT system and small capacity difference, I would prefer that we
> look at supporting SMT in SD_ASYM_CPUCAPACITY. Starting with
> select_idle_capacity

This series is actually primarily targeted at the !SMT case, although
it may or may not be useful for some of the SMT woes, too!
(Again, I wouldn't know; I don't have such a system to test with.)

>[snip]


* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  7:53 ` [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Vincent Guittot
  2026-03-26  8:16   ` Christian Loehle
@ 2026-03-26  8:20   ` Christian Loehle
  1 sibling, 0 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-26  8:20 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On 3/26/26 07:53, Vincent Guittot wrote:
> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
>>
>> The scheduler currently handles CPU performance asymmetry via either:
>>
>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
>>
>> On arm64, capacity-aware scheduling is used for any detected capacity
>> differences.
>>
>> Some systems expose small per-CPU performance differences via CPPC
>> highest_perf (e.g. due to chip binning), resulting in slightly different
>> capacities (<~5%). These differences are sufficient to trigger
>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
>> symmetric.
>>
>> For such small deltas, capacity-aware scheduling is unnecessarily
>> complex. A simpler priority-based approach, similar to x86 ITMT, is
>> sufficient.
> 
> I'm not convinced that moving to  SD_ASYM_PACKING is the right way to
> move forward.
> 
> 1st of all, do you target all kind of system or only SMT? It's not
> clear in your cover letter
> 
> Moving on asym pack for !SMT doesn't make sense to me. If you don't
> want EAS enabled, you can disable it with
> /proc/sys/kernel/sched_energy_aware
> 
> For SMT system and small capacity difference, I would prefer that we
> look at supporting SMT in SD_ASYM_CPUCAPACITY. Starting with
> select_idle_capacity

Quoting the cover letter below: I don't think SMT + SD_ASYM_CPUCAPACITY
can ever be theoretically sound, and the results would vary so wildly
on a per-platform / uarch + workload basis that I'm not convinced
something useful would come out of it, but I'd be keen to see some
experiments on this.
IME a busy sibling steals far more capacity than the difference I care
about here (<5%, whereas a busy SMT sibling often costs 20-30%, sometimes
up to 50%, though that is entirely dependent on workload and uarch, as
I've mentioned).
In any case, this series isn't (primarily) for SMT systems...
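
The magnitude argument above can be illustrated with a toy calculation.
For illustration only, assume that a busy sibling removes a fixed
fraction of a core's capacity. The 25% used here is taken from the
20-30% range quoted above; real behavior depends on workload and uarch.
Under that assumption, a CPU that is ~4% faster but has a busy sibling
ends up well below a slightly slower CPU whose sibling is idle. The
helper name is hypothetical.

```c
/*
 * Toy model: capacity left for a task when an SMT sibling "steals"
 * steal_pct percent of the core. Linear and workload-agnostic by design.
 */
static unsigned long example_effective_capacity(unsigned long cap,
						unsigned int steal_pct)
{
	if (steal_pct >= 100)
		return 0;
	return cap * (100 - steal_pct) / 100;
}
```

With capacities of 1024 (busy sibling, 25% stolen) and 983 (idle
sibling), the effective values are 768 and 983. In this model, the
static <5% difference is several times smaller than the sibling effect.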

> [snip]
>> On platforms with SMT though asympacking makes a lot more sense than
>> capacity-aware scheduling, because arguing about capacity without
>> considering utilization of the sibling(s) (and the resulting potential
>> 'stolen' capacity we perceive) isn't theoretically sound.
>>
>> Christian Loehle (3):
>>   sched/topology: Introduce arch hooks for asympacking
>>   arch_topology: Export CPPC-based asympacking prios
>>   arm64/sched: Enable CPPC-based asympacking
>>
>>  arch/arm64/include/asm/topology.h |  6 +++++
>>  arch/arm64/kernel/topology.c      | 34 ++++++++++++++++++++++++++
>>  drivers/base/arch_topology.c      | 40 +++++++++++++++++++++++++++++++
>>  include/linux/arch_topology.h     | 24 +++++++++++++++++++
>>  include/linux/sched/topology.h    |  9 +++++++
>>  kernel/sched/fair.c               | 16 -------------
>>  kernel/sched/topology.c           | 34 ++++++++++++++++++++------
>>  7 files changed, 140 insertions(+), 23 deletions(-)
>>
>> --
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  8:11 ` Andrea Righi
@ 2026-03-26  8:20   ` Vincent Guittot
  2026-03-26  9:15     ` Andrea Righi
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2026-03-26  8:20 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Christian Loehle, peterz, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

On Thu, 26 Mar 2026 at 09:12, Andrea Righi <arighi@nvidia.com> wrote:
>
> Hi Christian,
>
> On Wed, Mar 25, 2026 at 06:13:11PM +0000, Christian Loehle wrote:
> ...
> > RFT:
> > Andrea, please give this a try. This should perform better in particular
> > for single-threaded workloads and workloads that do not utilize all
> > cores (all the time anyway).
> > Capacity-aware scheduling wakeup works very different to the SMP path
> > used now, some workloads will benefit, some regress, it would be nice
> > to get some test results for these.
> > We already discussed DCPerf MediaWiki seems to benefit from
> > capacity-aware scheduling wakeup behavior, but others (most?) should
> > benefit from this series.
> >
> > I don't know if we can also be clever about ordering amongst SMT siblings.
> > That would be dependent on the uarch and I don't have a platform to
> > experiment with this though, so consider this series orthogonal to the
> > idle-core SMT considerations.
> > On platforms with SMT though asympacking makes a lot more sense than
> > capacity-aware scheduling, because arguing about capacity without
> > considering utilization of the sibling(s) (and the resulting potential
> > 'stolen' capacity we perceive) isn't theoretically sound.
>
> I did some early testing with this patch set. On Vera I'm getting much
> better performance than SD_ASYM_CPUCAPACITY of course (~1.5x avg speedup),
> mostly because we avoid using both SMT siblings. It's still not the same
> improvement that I get equalizing the capacity using the 5% threshold
> (~1.8x speedup).

IIRC, in the tests that you shared in your patch, you got an additional
improvement when adding some SMT awareness to SD_ASYM_CPUCAPACITY
compared to equalizing the capacity.

>
> Of course I need to test with more workloads and I haven't tested it on
> Grace yet, to check if we're regressing something, but in general it seems
> functional.
>
> Now it depends if SD_ASYM_PACKING is the route we want to take or if we
> should start addressing SMT in SD_ASYM_CPUCAPACITY, as pointed out by Vincent.
> In general I think I agree with Vincent: independently of this particular
> case, it'd be nice to start improving SD_ASYM_CPUCAPACITY to support SMT.
>
> Thanks,
> -Andrea

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  8:16   ` Christian Loehle
@ 2026-03-26  8:24     ` Vincent Guittot
  2026-03-26  9:24       ` Christian Loehle
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2026-03-26  8:24 UTC (permalink / raw)
  To: Christian Loehle
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On Thu, 26 Mar 2026 at 09:16, Christian Loehle <christian.loehle@arm.com> wrote:
>
> On 3/26/26 07:53, Vincent Guittot wrote:
> > On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
> >>
> >> The scheduler currently handles CPU performance asymmetry via either:
> >>
> >> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
> >> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
> >>
> >> On arm64, capacity-aware scheduling is used for any detected capacity
> >> differences.
> >>
> >> Some systems expose small per-CPU performance differences via CPPC
> >> highest_perf (e.g. due to chip binning), resulting in slightly different
> >> capacities (<~5%). These differences are sufficient to trigger
> >> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
> >> symmetric.
> >>
> >> For such small deltas, capacity-aware scheduling is unnecessarily
> >> complex. A simpler priority-based approach, similar to x86 ITMT, is
> >> sufficient.
> >
> > I'm not convinced that moving to SD_ASYM_PACKING is the right way to
> > move forward.
> >
> > 1st of all, do you target all kind of system or only SMT? It's not
> > clear in your cover letter
>
> AFAIK only Andrea has access to an unreleased asymmetric SMT system,
> I haven't done any tests on such a system (as the cover-letter mentions
> under RFT section).
>
> >
> > Moving on asym pack for !SMT doesn't make sense to me. If you don't
> > want EAS enabled, you can disable it with
> > /proc/sys/kernel/sched_energy_aware
>
> Sorry, what's EAS got to do with it? The system I care about here
> (primarily nvidia grace) has no EM.

I was trying to understand the end goal of this patch.

SD_ASYM_CPUCAPACITY works fine on !SMT systems, so why enable
SD_ASYM_PACKING for a <5% difference?

That doesn't make sense to me.

>
> >
> > For SMT system and small capacity difference, I would prefer that we
> > look at supporting SMT in SD_ASYM_CPUCAPACITY. Starting with
> > select_idle_capacity
>
> This series is actually targeted for primarily the !SMT case, although
> it may or may not be useful for some of the SMT woes, too!
> (Again, I wouldn't know, I don't have such a system to test with)
>
> >[snip]

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  8:20   ` Vincent Guittot
@ 2026-03-26  9:15     ` Andrea Righi
  0 siblings, 0 replies; 23+ messages in thread
From: Andrea Righi @ 2026-03-26  9:15 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Christian Loehle, peterz, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

On Thu, Mar 26, 2026 at 09:20:45AM +0100, Vincent Guittot wrote:
> On Thu, 26 Mar 2026 at 09:12, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > Hi Christian,
> >
> > On Wed, Mar 25, 2026 at 06:13:11PM +0000, Christian Loehle wrote:
> > ...
> > > RFT:
> > > Andrea, please give this a try. This should perform better in particular
> > > for single-threaded workloads and workloads that do not utilize all
> > > cores (all the time anyway).
> > > Capacity-aware scheduling wakeup works very different to the SMP path
> > > used now, some workloads will benefit, some regress, it would be nice
> > > to get some test results for these.
> > > We already discussed DCPerf MediaWiki seems to benefit from
> > > capacity-aware scheduling wakeup behavior, but others (most?) should
> > > benefit from this series.
> > >
> > > I don't know if we can also be clever about ordering amongst SMT siblings.
> > > That would be dependent on the uarch and I don't have a platform to
> > > experiment with this though, so consider this series orthogonal to the
> > > idle-core SMT considerations.
> > > On platforms with SMT though asympacking makes a lot more sense than
> > > capacity-aware scheduling, because arguing about capacity without
> > > considering utilization of the sibling(s) (and the resulting potential
> > > 'stolen' capacity we perceive) isn't theoretically sound.
> >
> > I did some early testing with this patch set. On Vera I'm getting much
> > better performance than SD_ASYM_CPUCAPACITY of course (~1.5x avg speedup),
> > mostly because we avoid using both SMT siblings. It's still not the same
> > improvement that I get equalizing the capacity using the 5% threshold
> > (~1.8x speedup).
> 
> IIRC the tests that you shared in your patch, you get an additional
> improvement when adding some SMT awareness to SD_ASYM_CPUCAPACITY
> compared to equalizing the capacity

Yes, adding SMT awareness to SD_ASYM_CPUCAPACITY is still the approach
that gives me the best performance so far on Vera (~1.9x avg speedup),
among all those that I've tested.

I'll post the updated patch set that I'm using, so we can discuss that
approach in more detail as well.

Thanks,
-Andrea

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  8:24     ` Vincent Guittot
@ 2026-03-26  9:24       ` Christian Loehle
  2026-03-26 13:04         ` Vincent Guittot
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Loehle @ 2026-03-26  9:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On 3/26/26 08:24, Vincent Guittot wrote:
> On Thu, 26 Mar 2026 at 09:16, Christian Loehle <christian.loehle@arm.com> wrote:
>>
>> On 3/26/26 07:53, Vincent Guittot wrote:
>>> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
>>>>
>>>> The scheduler currently handles CPU performance asymmetry via either:
>>>>
>>>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
>>>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
>>>>
>>>> On arm64, capacity-aware scheduling is used for any detected capacity
>>>> differences.
>>>>
>>>> Some systems expose small per-CPU performance differences via CPPC
>>>> highest_perf (e.g. due to chip binning), resulting in slightly different
>>>> capacities (<~5%). These differences are sufficient to trigger
>>>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
>>>> symmetric.
>>>>
>>>> For such small deltas, capacity-aware scheduling is unnecessarily
>>>> complex. A simpler priority-based approach, similar to x86 ITMT, is
>>>> sufficient.
>>>
>>> I'm not convinced that moving to SD_ASYM_PACKING is the right way to
>>> move forward.
>>>
>>> 1st of all, do you target all kind of system or only SMT? It's not
>>> clear in your cover letter
>>
>> AFAIK only Andrea has access to an unreleased asymmetric SMT system,
>> I haven't done any tests on such a system (as the cover-letter mentions
>> under RFT section).
>>
>>>
>>> Moving on asym pack for !SMT doesn't make sense to me. If you don't
>>> want EAS enabled, you can disable it with
>>> /proc/sys/kernel/sched_energy_aware
>>
>> Sorry, what's EAS got to do with it? The system I care about here
>> (primarily nvidia grace) has no EM.
> 
> I tried to understand the end goal of this patch
> 
> SD_ASYM_CPUCAPACITY works fine with !SMT system so why enabling
> SD_ASYM_PACKING for <5% diff ?
> 
> That doesn't make sense to me
I don't know if "works fine" describes the situation accurately.
I should have included the context in the cover letter, but you are
aware of these threads (you've replied to them, anyway):
https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com/
https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com/

Andrea sees an improvement even when force-equalizing CPUs to remove
SD_ASYM_CPUCAPACITY, so I'd argue it doesn't "work fine" on these platforms.
To me it seems more reasonable to capture the small gains these minor
asymmetries offer through asympacking, and leave SD_ASYM_CPUCAPACITY
to the actual 'true' asymmetry (e.g. different uArch or vastly different
performance levels).
SD_ASYM_CPUCAPACITY handling is also arguably broken if no CPU pair in
the system fulfills capacity_greater(); the call sites in fair.c give
a good overview.
Is $subject the right approach to deal with these platforms instead?
I don't know; that's why it's marked RFC and RFT.

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26  9:24       ` Christian Loehle
@ 2026-03-26 13:04         ` Vincent Guittot
  2026-03-26 13:45           ` Andrea Righi
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2026-03-26 13:04 UTC (permalink / raw)
  To: Christian Loehle
  Cc: arighi, peterz, dietmar.eggemann, valentin.schneider, mingo,
	rostedt, segall, mgorman, catalin.marinas, will, sudeep.holla,
	rafael, linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On Thu, 26 Mar 2026 at 10:24, Christian Loehle <christian.loehle@arm.com> wrote:
>
> On 3/26/26 08:24, Vincent Guittot wrote:
> > On Thu, 26 Mar 2026 at 09:16, Christian Loehle <christian.loehle@arm.com> wrote:
> >>
> >> On 3/26/26 07:53, Vincent Guittot wrote:
> >>> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
> >>>>
> >>>> The scheduler currently handles CPU performance asymmetry via either:
> >>>>
> >>>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
> >>>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
> >>>>
> >>>> On arm64, capacity-aware scheduling is used for any detected capacity
> >>>> differences.
> >>>>
> >>>> Some systems expose small per-CPU performance differences via CPPC
> >>>> highest_perf (e.g. due to chip binning), resulting in slightly different
> >>>> capacities (<~5%). These differences are sufficient to trigger
> >>>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
> >>>> symmetric.
> >>>>
> >>>> For such small deltas, capacity-aware scheduling is unnecessarily
> >>>> complex. A simpler priority-based approach, similar to x86 ITMT, is
> >>>> sufficient.
> >>>
> >>> I'm not convinced that moving to SD_ASYM_PACKING is the right way to
> >>> move forward.
> >>>
> >>> 1st of all, do you target all kind of system or only SMT? It's not
> >>> clear in your cover letter
> >>
> >> AFAIK only Andrea has access to an unreleased asymmetric SMT system,
> >> I haven't done any tests on such a system (as the cover-letter mentions
> >> under RFT section).
> >>
> >>>
> >>> Moving on asym pack for !SMT doesn't make sense to me. If you don't
> >>> want EAS enabled, you can disable it with
> >>> /proc/sys/kernel/sched_energy_aware
> >>
> >> Sorry, what's EAS got to do with it? The system I care about here
> >> (primarily nvidia grace) has no EM.
> >
> > I tried to understand the end goal of this patch
> >
> > SD_ASYM_CPUCAPACITY works fine with !SMT system so why enabling
> > SD_ASYM_PACKING for <5% diff ?
> >
> > That doesn't make sense to me
> I don't know if "works fine" describes the situation accurately.
> I guess I should've included the context in the cover letter, but you
> are aware of them (you've replied to them anyway):
> https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com/
> https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com/
>
> Andrea sees an improvement even when force-equalizing CPUs to remove
> SD_ASYM_CPUCAPACITY, so I'd argue it doesn't "work fine" on these platforms.

IIUC this was for SMT systems, not for !SMT ones, but I might have
missed some emails in the thread.

> To me it seems more reasonable to attempt to get these minor improvements
> of minor asymmetries through asympacking and leave SD_ASYM_CPUCAPACITY
> to the actual 'true' asymmetry (e.g. different uArch or vastly different
> performance levels).
> SD_ASYM_CPUCAPACITY handling is also arguably broken if no CPU pair in
> the system fulfills capacity_greater(), the call sites in fair.c give
> a good overview.
> Is $subject the right approach to deal with these platforms instead?
> I don't know, that's why it's marked RFC and RFT.

* Re: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
@ 2026-03-26 13:23   ` kernel test robot
  2026-03-26 15:26   ` kernel test robot
  2026-03-26 16:40   ` kernel test robot
  2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-26 13:23 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: llvm, oe-kbuild-all, peterz, vincent.guittot, dietmar.eggemann,
	valentin.schneider, mingo, rostedt, segall, mgorman,
	catalin.marinas, will, sudeep.holla, rafael, linux-pm,
	linux-kernel, juri.lelli, kobak, fabecassis, Christian Loehle

Hi Christian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on arm64/for-next/core driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus peterz-queue/sched/core linus/master v7.0-rc5 next-20260325]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christian-Loehle/sched-topology-Introduce-arch-hooks-for-asympacking/20260326-145644
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260325181314.3875909-2-christian.loehle%40arm.com
patch subject: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260326/202603261419.mAKkKckS-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260326/202603261419.mAKkKckS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603261419.mAKkKckS-lkp@intel.com/

All errors (new ones prefixed by >>):

>> arch/x86/kernel/itmt.c:168:5: error: redefinition of 'topology_arch_asym_cpu_priority'
     168 | int arch_asym_cpu_priority(int cpu)
         |     ^
   include/linux/arch_topology.h:131:32: note: expanded from macro 'arch_asym_cpu_priority'
     131 | #define arch_asym_cpu_priority topology_arch_asym_cpu_priority
         |                                ^
   include/linux/arch_topology.h:132:19: note: previous definition is here
     132 | static inline int topology_arch_asym_cpu_priority(int cpu)
         |                   ^
   1 error generated.


vim +/topology_arch_asym_cpu_priority +168 arch/x86/kernel/itmt.c

5e76b2ab36b40c Tim Chen 2016-11-22  167  
5e76b2ab36b40c Tim Chen 2016-11-22 @168  int arch_asym_cpu_priority(int cpu)
5e76b2ab36b40c Tim Chen 2016-11-22  169  {
5e76b2ab36b40c Tim Chen 2016-11-22  170  	return per_cpu(sched_core_priority, cpu);
5e76b2ab36b40c Tim Chen 2016-11-22  171  }
5e76b2ab36b40c Tim Chen 2016-11-22  172  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26 13:04         ` Vincent Guittot
@ 2026-03-26 13:45           ` Andrea Righi
  2026-03-26 15:55             ` Christian Loehle
  0 siblings, 1 reply; 23+ messages in thread
From: Andrea Righi @ 2026-03-26 13:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Christian Loehle, peterz, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

On Thu, Mar 26, 2026 at 02:04:42PM +0100, Vincent Guittot wrote:
> On Thu, 26 Mar 2026 at 10:24, Christian Loehle <christian.loehle@arm.com> wrote:
> >
> > On 3/26/26 08:24, Vincent Guittot wrote:
> > > On Thu, 26 Mar 2026 at 09:16, Christian Loehle <christian.loehle@arm.com> wrote:
> > >>
> > >> On 3/26/26 07:53, Vincent Guittot wrote:
> > >>> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
> > >>>>
> > >>>> The scheduler currently handles CPU performance asymmetry via either:
> > >>>>
> > >>>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
> > >>>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
> > >>>>
> > >>>> On arm64, capacity-aware scheduling is used for any detected capacity
> > >>>> differences.
> > >>>>
> > >>>> Some systems expose small per-CPU performance differences via CPPC
> > >>>> highest_perf (e.g. due to chip binning), resulting in slightly different
> > >>>> capacities (<~5%). These differences are sufficient to trigger
> > >>>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
> > >>>> symmetric.
> > >>>>
> > >>>> For such small deltas, capacity-aware scheduling is unnecessarily
> > >>>> complex. A simpler priority-based approach, similar to x86 ITMT, is
> > >>>> sufficient.
> > >>>
> > >>> I'm not convinced that moving to SD_ASYM_PACKING is the right way to
> > >>> move forward.
> > >>>
> > >>> 1st of all, do you target all kind of system or only SMT? It's not
> > >>> clear in your cover letter
> > >>
> > >> AFAIK only Andrea has access to an unreleased asymmetric SMT system,
> > >> I haven't done any tests on such a system (as the cover-letter mentions
> > >> under RFT section).
> > >>
> > >>>
> > >>> Moving on asym pack for !SMT doesn't make sense to me. If you don't
> > >>> want EAS enabled, you can disable it with
> > >>> /proc/sys/kernel/sched_energy_aware
> > >>
> > >> Sorry, what's EAS got to do with it? The system I care about here
> > >> (primarily nvidia grace) has no EM.
> > >
> > > I tried to understand the end goal of this patch
> > >
> > > SD_ASYM_CPUCAPACITY works fine with !SMT system so why enabling
> > > SD_ASYM_PACKING for <5% diff ?
> > >
> > > That doesn't make sense to me
> > I don't know if "works fine" describes the situation accurately.
> > I guess I should've included the context in the cover letter, but you
> > are aware of them (you've replied to them anyway):
> > https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com/
> > https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com/
> >
> > Andrea sees an improvement even when force-equalizing CPUs to remove
> > SD_ASYM_CPUCAPACITY, so I'd argue it doesn't "work fine" on these platforms.
> 
> IIUC this was for SMT systems not for !SMT ones but I might have
> missed some emails in the thread.

Right, the issue I'm trying to solve is SD_ASYM_CPUCAPACITY + SMT. Removing
SD_ASYM_CPUCAPACITY from the equation fixes my issue, because we fall back
into the regular idle CPU selection policy, which avoids allocating both
SMT siblings when possible.

Thanks,
-Andrea

* Re: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
  2026-03-26 13:23   ` kernel test robot
@ 2026-03-26 15:26   ` kernel test robot
  2026-03-26 16:40   ` kernel test robot
  2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-26 15:26 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: oe-kbuild-all, peterz, vincent.guittot, dietmar.eggemann,
	valentin.schneider, mingo, rostedt, segall, mgorman,
	catalin.marinas, will, sudeep.holla, rafael, linux-pm,
	linux-kernel, juri.lelli, kobak, fabecassis, Christian Loehle

Hi Christian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on arm64/for-next/core driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus peterz-queue/sched/core next-20260325]
[cannot apply to linus/master v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christian-Loehle/sched-topology-Introduce-arch-hooks-for-asympacking/20260326-145644
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260325181314.3875909-2-christian.loehle%40arm.com
patch subject: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260326/202603261609.fazAo3mx-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260326/202603261609.fazAo3mx-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603261609.fazAo3mx-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/topology.h:30,
                    from include/linux/sched/topology.h:5,
                    from include/linux/cpuset.h:13,
                    from arch/x86/kernel/itmt.c:21:
>> include/linux/arch_topology.h:131:32: error: redefinition of 'topology_arch_asym_cpu_priority'
     131 | #define arch_asym_cpu_priority topology_arch_asym_cpu_priority
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/kernel/itmt.c:168:5: note: in expansion of macro 'arch_asym_cpu_priority'
     168 | int arch_asym_cpu_priority(int cpu)
         |     ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/arch_topology.h:132:19: note: previous definition of 'topology_arch_asym_cpu_priority' with type 'int(int)'
     132 | static inline int topology_arch_asym_cpu_priority(int cpu)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/topology_arch_asym_cpu_priority +131 include/linux/arch_topology.h

   125	
   126	/*
   127	 * Architectures may override this to provide a custom CPU priority for
   128	 * asymmetric packing.
   129	 */
   130	#ifndef arch_asym_cpu_priority
 > 131	#define arch_asym_cpu_priority topology_arch_asym_cpu_priority
   132	static inline int topology_arch_asym_cpu_priority(int cpu)
   133	{
   134		return -cpu;
   135	}
   136	#endif
   137	

* Re: [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
  2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
@ 2026-03-26 15:47   ` kernel test robot
  2026-03-26 15:47   ` kernel test robot
  2026-03-27 15:44   ` Valentin Schneider
  2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-26 15:47 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: oe-kbuild-all, peterz, vincent.guittot, dietmar.eggemann,
	valentin.schneider, mingo, rostedt, segall, mgorman,
	catalin.marinas, will, sudeep.holla, rafael, linux-pm,
	linux-kernel, juri.lelli, kobak, fabecassis, Christian Loehle

Hi Christian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on arm64/for-next/core driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus peterz-queue/sched/core linus/master v7.0-rc5 next-20260325]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christian-Loehle/sched-topology-Introduce-arch-hooks-for-asympacking/20260326-145644
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260325181314.3875909-4-christian.loehle%40arm.com
patch subject: [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20260326/202603262311.bSaX71dF-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260326/202603262311.bSaX71dF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603262311.bSaX71dF-lkp@intel.com/

All errors (new ones prefixed by >>):

   or1k-linux-ld: kernel/sched/build_utility.o: in function `sd_init.constprop.0':
   build_utility.c:(.text+0x1840): undefined reference to `arch_sched_asym_flags'
>> build_utility.c:(.text+0x1840): relocation truncated to fit: R_OR1K_INSN_REL_26 against undefined symbol `arch_sched_asym_flags'

* Re: [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
  2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
  2026-03-26 15:47   ` kernel test robot
@ 2026-03-26 15:47   ` kernel test robot
  2026-03-27 15:44   ` Valentin Schneider
  2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-26 15:47 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: oe-kbuild-all, peterz, vincent.guittot, dietmar.eggemann,
	valentin.schneider, mingo, rostedt, segall, mgorman,
	catalin.marinas, will, sudeep.holla, rafael, linux-pm,
	linux-kernel, juri.lelli, kobak, fabecassis, Christian Loehle

Hi Christian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on arm64/for-next/core driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus peterz-queue/sched/core linus/master v7.0-rc5 next-20260325]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christian-Loehle/sched-topology-Introduce-arch-hooks-for-asympacking/20260326-145644
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260325181314.3875909-4-christian.loehle%40arm.com
patch subject: [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
config: parisc-allnoconfig (https://download.01.org/0day-ci/archive/20260326/202603262307.63Wed8OI-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260326/202603262307.63Wed8OI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603262307.63Wed8OI-lkp@intel.com/

All errors (new ones prefixed by >>):

   hppa-linux-ld: kernel/sched/build_utility.o: in function `sd_init.constprop.0':
>> (.text+0x17b8): undefined reference to `arch_sched_asym_flags'

* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26 13:45           ` Andrea Righi
@ 2026-03-26 15:55             ` Christian Loehle
  2026-03-26 16:00               ` Andrea Righi
  2026-03-27  9:53               ` Andrea Righi
  0 siblings, 2 replies; 23+ messages in thread
From: Christian Loehle @ 2026-03-26 15:55 UTC (permalink / raw)
  To: Andrea Righi, Vincent Guittot
  Cc: peterz, dietmar.eggemann, valentin.schneider, mingo, rostedt,
	segall, mgorman, catalin.marinas, will, sudeep.holla, rafael,
	linux-pm, linux-kernel, juri.lelli, kobak, fabecassis

On 3/26/26 13:45, Andrea Righi wrote:
> On Thu, Mar 26, 2026 at 02:04:42PM +0100, Vincent Guittot wrote:
>> On Thu, 26 Mar 2026 at 10:24, Christian Loehle <christian.loehle@arm.com> wrote:
>>>
>>> On 3/26/26 08:24, Vincent Guittot wrote:
>>>> On Thu, 26 Mar 2026 at 09:16, Christian Loehle <christian.loehle@arm.com> wrote:
>>>>>
>>>>> On 3/26/26 07:53, Vincent Guittot wrote:
>>>>>> On Wed, 25 Mar 2026 at 19:13, Christian Loehle <christian.loehle@arm.com> wrote:
>>>>>>>
>>>>>>> The scheduler currently handles CPU performance asymmetry via either:
>>>>>>>
>>>>>>> - SD_ASYM_PACKING: simple priority-based task placement (x86 ITMT)
>>>>>>> - SD_ASYM_CPUCAPACITY: capacity-aware scheduling
>>>>>>>
>>>>>>> On arm64, capacity-aware scheduling is used for any detected capacity
>>>>>>> differences.
>>>>>>>
>>>>>>> Some systems expose small per-CPU performance differences via CPPC
>>>>>>> highest_perf (e.g. due to chip binning), resulting in slightly different
>>>>>>> capacities (<~5%). These differences are sufficient to trigger
>>>>>>> SD_ASYM_CPUCAPACITY, even though the system is otherwise effectively
>>>>>>> symmetric.
>>>>>>>
>>>>>>> For such small deltas, capacity-aware scheduling is unnecessarily
>>>>>>> complex. A simpler priority-based approach, similar to x86 ITMT, is
>>>>>>> sufficient.
>>>>>>
>>>>>> I'm not convinced that moving to  SD_ASYM_PACKING is the right way to
>>>>>> move forward.
>>>>>>
>>>>>> 1st of all, do you target all kind of system or only SMT? It's not
>>>>>> clear in your cover letter
>>>>>
>>>>> AFAIK only Andrea has access to an unreleased asymmetric SMT system,
>>>>> I haven't done any tests on such a system (as the cover-letter mentions
>>>>> under RFT section).
>>>>>
>>>>>>
>>>>>> Moving on asym pack for !SMT doesn't make sense to me. If you don't
>>>>>> want EAS enabled, you can disable it with
>>>>>> /proc/sys/kernel/sched_energy_aware
>>>>>
>>>>> Sorry, what's EAS got to do with it? The system I care about here
>>>>> (primarily nvidia grace) has no EM.
>>>>
>>>> I tried to understand the end goal of this patch
>>>>
>>>> SD_ASYM_CPUCAPACITY works fine with !SMT system so why enabling
>>>> SD_ASYM_PACKING for <5% diff ?
>>>>
>>>> That doesn't make sense to me
>>> I don't know if "works fine" describes the situation accurately.
>>> I guess I should've included the context in the cover letter, but you
>>> are aware of them (you've replied to them anyway):
>>> https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com/
>>> https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com/
>>>
>>> Andrea sees an improvement even when force-equalizing CPUs to remove
>>> SD_ASYM_CPUCAPACITY, so I'd argue it doesn't "work fine" on these platforms.
>>
>> IIUC this was for SMT systems not for !SMT ones but I might have
>> missed some emails in the thread.
> 
> Right, the issue I'm trying to solve is SD_ASYM_CPUCAPACITY + SMT. Removing
> SD_ASYM_CPUCAPACITY from the equation fixes my issue, because we fall back
> into the regular idle CPU selection policy, which avoids allocating both
> SMT siblings when possible.
> 
> Thanks,
> -Andrea

Could you also report how Grace baseline vs ASYM_PACKING works for your
benchmark? (or Vera nosmt)
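To make Andrea's point concrete: the idle selection he falls back to prefers a CPU whose SMT sibling is also idle, and only settles for a CPU on a half-busy core when no such CPU exists. The sketch below shows that preference in simplified form, loosely modelled on the idea behind select_idle_core() in kernel/sched/fair.c. It assumes a hypothetical 2-way SMT layout in which CPUs 2k and 2k+1 are siblings. pick_idle_cpu() and sibling_of() are illustrative names, not kernel functions.

```c
#include <stdbool.h>

#define NR_CPUS_DEMO 8	/* hypothetical: 4 cores x 2 SMT threads */

/* Hypothetical layout: CPUs 2k and 2k+1 share a core. */
static int sibling_of(int cpu)
{
	return cpu ^ 1;
}

/*
 * Return an idle CPU, preferring one whose SMT sibling is idle too, so a
 * new task does not share a core with a running one. Returns -1 if no
 * CPU is idle.
 */
static int pick_idle_cpu(const bool idle[NR_CPUS_DEMO])
{
	int fallback = -1;

	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		if (!idle[cpu])
			continue;
		if (idle[sibling_of(cpu)])
			return cpu;		/* whole core idle: best choice */
		if (fallback < 0)
			fallback = cpu;		/* idle, but sibling is busy */
	}
	return fallback;
}
```

For example, with CPU 0 busy and CPUs 1-3 idle, this picks CPU 2 (a fully idle core) rather than CPU 1, whose sibling is running something.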



* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26 15:55             ` Christian Loehle
@ 2026-03-26 16:00               ` Andrea Righi
  2026-03-27  9:53               ` Andrea Righi
  1 sibling, 0 replies; 23+ messages in thread
From: Andrea Righi @ 2026-03-26 16:00 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Vincent Guittot, peterz, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

On Thu, Mar 26, 2026 at 03:55:54PM +0000, Christian Loehle wrote:
...
> Could you also report how Grace baseline vs ASYM_PACKING works for your
> benchmark? (or Vera nosmt)
> 

Sure, I'll try testing both and report back.

-Andrea


* Re: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
  2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
  2026-03-26 13:23   ` kernel test robot
  2026-03-26 15:26   ` kernel test robot
@ 2026-03-26 16:40   ` kernel test robot
  2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2026-03-26 16:40 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: oe-kbuild-all, peterz, vincent.guittot, dietmar.eggemann,
	valentin.schneider, mingo, rostedt, segall, mgorman,
	catalin.marinas, will, sudeep.holla, rafael, linux-pm,
	linux-kernel, juri.lelli, kobak, fabecassis, Christian Loehle

Hi Christian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on arm64/for-next/core driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus peterz-queue/sched/core linus/master v7.0-rc5 next-20260325]

url:    https://github.com/intel-lab-lkp/linux/commits/Christian-Loehle/sched-topology-Introduce-arch-hooks-for-asympacking/20260326-145644
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20260325181314.3875909-2-christian.loehle%40arm.com
patch subject: [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking
config: powerpc-randconfig-001-20260326 (https://download.01.org/0day-ci/archive/20260327/202603270017.gHZBIwsQ-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260327/202603270017.gHZBIwsQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603270017.gHZBIwsQ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/topology.h:30,
                    from include/linux/gfp.h:8,
                    from include/linux/sched/mm.h:9,
                    from arch/powerpc/kernel/smp.c:18:
>> include/linux/arch_topology.h:131:32: error: redefinition of 'topology_arch_asym_cpu_priority'
     131 | #define arch_asym_cpu_priority topology_arch_asym_cpu_priority
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/kernel/smp.c:1767:5: note: in expansion of macro 'arch_asym_cpu_priority'
    1767 | int arch_asym_cpu_priority(int cpu)
         |     ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/arch_topology.h:132:19: note: previous definition of 'topology_arch_asym_cpu_priority' was here
     132 | static inline int topology_arch_asym_cpu_priority(int cpu)
         |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/topology_arch_asym_cpu_priority +131 include/linux/arch_topology.h

   125	
   126	/*
   127	 * Architectures may override this to provide a custom CPU priority for
   128	 * asymmetric packing.
   129	 */
   130	#ifndef arch_asym_cpu_priority
 > 131	#define arch_asym_cpu_priority topology_arch_asym_cpu_priority
   132	static inline int topology_arch_asym_cpu_priority(int cpu)
   133	{
   134		return -cpu;
   135	}
   136	#endif
   137	
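Why powerpc trips over this: powerpc implements arch_asym_cpu_priority() as a plain function rather than a macro, so the `#ifndef arch_asym_cpu_priority` guard does not see it. The header's `#define` then renames the powerpc definition in smp.c to topology_arch_asym_cpu_priority, which collides with the static inline above. One common kernel idiom is for an overriding architecture to define the macro to its own name, in a header that is seen before the generic fallback. The sketch below shows that idiom in userspace form and assumes such include ordering. The priority it returns is hypothetical.

```c
/* ---- "arch" header: an architecture that overrides the hook ---- */
/*
 * Defining the macro to its own name lets the generic #ifndef below see
 * the override, while calls still resolve to the real function.
 */
#define arch_asym_cpu_priority arch_asym_cpu_priority
int arch_asym_cpu_priority(int cpu);

/* ---- generic header: fallback only when no override exists ---- */
#ifndef arch_asym_cpu_priority
#define arch_asym_cpu_priority topology_arch_asym_cpu_priority
static inline int topology_arch_asym_cpu_priority(int cpu)
{
	return -cpu;
}
#endif

/* ---- "arch" .c file: the real definition, no longer macro-renamed ---- */
int arch_asym_cpu_priority(int cpu)
{
	/* Hypothetical priority: prefer higher-numbered CPUs. */
	return cpu;
}
```

Because the macro expands to its own name, the fallback block is skipped and only the architecture's definition exists.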



* Re: [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry
  2026-03-26 15:55             ` Christian Loehle
  2026-03-26 16:00               ` Andrea Righi
@ 2026-03-27  9:53               ` Andrea Righi
  1 sibling, 0 replies; 23+ messages in thread
From: Andrea Righi @ 2026-03-27  9:53 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Vincent Guittot, peterz, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis

Hi Christian,

On Thu, Mar 26, 2026 at 03:55:54PM +0000, Christian Loehle wrote:
...
> > Right, the issue I'm trying to solve is SD_ASYM_CPUCAPACITY + SMT. Removing
> > SD_ASYM_CPUCAPACITY from the equation fixes my issue, because we fall back
> > into the regular idle CPU selection policy, which avoids allocating both
> > SMT siblings when possible.
> > 
> > Thanks,
> > -Andrea
> 
> Could you also report how Grace baseline vs ASYM_PACKING works for your
> benchmark? (or Vera nosmt)
> 

I've done some tests on Vera with nosmt. I don't see much difference
between ASYM_PACKING and ASYM_CPUCAPACITY (baseline); it's within the
error range (around 1-2% variation across runs, with no clear bias
towards either solution).

I'll try to find a Grace system and repeat the tests there as well.

Thanks,
-Andrea
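For context, here is what is being compared. The cover letter proposes that when CPUs' CPPC highest_perf values differ by less than roughly 5%, they get SD_ASYM_PACKING with highest_perf-derived priorities instead of SD_ASYM_CPUCAPACITY. The sketch below implements that classification. It assumes a strict 5% cut-off measured relative to the fastest CPU, and capacities scaled to 1024 (the value of SCHED_CAPACITY_SCALE). The names and the exact threshold are hypothetical, not taken from the patches.

```c
#include <assert.h>

#define DEMO_CAPACITY_SCALE 1024	/* same value as SCHED_CAPACITY_SCALE */
#define DEMO_MINOR_ASYM_PCT 5		/* hypothetical cut-off for "minor" asymmetry */

enum demo_asym_mode {
	DEMO_ASYM_NONE,		/* all CPUs equal: no asym flag needed */
	DEMO_ASYM_PACKING,	/* minor spread: priority-based packing */
	DEMO_ASYM_CPUCAPACITY,	/* real capacity classes */
};

/* Classify a set of CPUs by the spread of their CPPC highest_perf values. */
static enum demo_asym_mode demo_classify_asym(const unsigned int *highest_perf,
					      int nr_cpus)
{
	unsigned int min, max;

	assert(nr_cpus > 0);
	min = max = highest_perf[0];
	for (int i = 1; i < nr_cpus; i++) {
		if (highest_perf[i] < min)
			min = highest_perf[i];
		if (highest_perf[i] > max)
			max = highest_perf[i];
	}

	if (min == max)
		return DEMO_ASYM_NONE;
	/* (max - min) / max < 5%, kept in integer arithmetic. */
	if ((unsigned long long)(max - min) * 100 <
	    (unsigned long long)DEMO_MINOR_ASYM_PCT * max)
		return DEMO_ASYM_PACKING;
	return DEMO_ASYM_CPUCAPACITY;
}

/* Capacity of one CPU relative to the fastest, on the 0..1024 scale. */
static unsigned int demo_capacity(unsigned int perf, unsigned int max_perf)
{
	return (unsigned int)((unsigned long long)perf * DEMO_CAPACITY_SCALE /
			      max_perf);
}
```

With highest_perf values of 3000 and 3100, the slower CPU's capacity comes out at 990 rather than 1024. That is a gap of about 3%, yet it is a distinct capacity value all the same. In the packing case, the priority could simply be the highest_perf value itself (higher preferred). How the series actually maps it is not shown here.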


* Re: [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking
  2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
  2026-03-26 15:47   ` kernel test robot
  2026-03-26 15:47   ` kernel test robot
@ 2026-03-27 15:44   ` Valentin Schneider
  2 siblings, 0 replies; 23+ messages in thread
From: Valentin Schneider @ 2026-03-27 15:44 UTC (permalink / raw)
  To: Christian Loehle, arighi
  Cc: peterz, vincent.guittot, dietmar.eggemann, valentin.schneider,
	mingo, rostedt, segall, mgorman, catalin.marinas, will,
	sudeep.holla, rafael, linux-pm, linux-kernel, juri.lelli, kobak,
	fabecassis, Christian Loehle

On 25/03/26 18:13, Christian Loehle wrote:
> @@ -1753,7 +1760,7 @@ static inline int topology_arch_sched_asym_flags(void)
>  #ifdef CONFIG_SCHED_SMT
>  int cpu_smt_flags(void)
>  {
> -	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC;
> +	return SD_SHARE_CPUCAPACITY | SD_SHARE_LLC | arch_sched_asym_flags();
>  }
>  
>  const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
> @@ -1765,7 +1772,7 @@ const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cp
>  #ifdef CONFIG_SCHED_CLUSTER
>  int cpu_cluster_flags(void)
>  {
> -	return SD_CLUSTER | SD_SHARE_LLC;
> +	return SD_CLUSTER | SD_SHARE_LLC | arch_sched_asym_flags();
>  }
>  
>  const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
> @@ -1777,7 +1784,7 @@ const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cp
>  #ifdef CONFIG_SCHED_MC
>  int cpu_core_flags(void)
>  {
> -	return SD_SHARE_LLC;
> +	return SD_SHARE_LLC | arch_sched_asym_flags();
>  }
>

So while the binning problem applies to more than one architecture, I'm not
sure we want this to be generally applied to all topology levels. This is
/technically/ not a problem since even if a topology level has
SD_ASYM_PACKING, all CPUs at that level can have the same priority, but
in that case it's a bit wasteful.

I don't have any better ideas ATM to keep this arch-specific via
set_sched_topology(), like how x86 and powerpc handle asym packing.
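Valentin's concern is that every level gets SD_ASYM_PACKING, even a level whose CPUs all share one priority. The sketch below is a per-domain check that sets the flag only when priorities within the span actually differ. The flag callbacks in the quoted diff take no CPU or span argument, so where such a check would live (for example, at domain build time) is left open here. DEMO_SD_ASYM_PACKING and demo_level_asym_flags() are hypothetical.

```c
#include <assert.h>

#define DEMO_SD_ASYM_PACKING 0x1	/* hypothetical stand-in for SD_ASYM_PACKING */

/*
 * Return the asym flag for a domain spanning cpus[0..n-1] only if those
 * CPUs disagree on priority; prio[] is indexed by CPU number.
 */
static int demo_level_asym_flags(const int *prio, const int *cpus, int n)
{
	assert(n > 0);
	for (int i = 1; i < n; i++) {
		if (prio[cpus[i]] != prio[cpus[0]])
			return DEMO_SD_ASYM_PACKING;
	}
	return 0;	/* uniform priority: packing would do nothing useful */
}
```

With priorities {10, 10, 12, 12}, an SMT level spanning CPUs 0-1 gets no flag, while a level spanning CPUs 0-3 does.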



end of thread [~2026-03-27 15:44 UTC]

Thread overview: 23+ messages
2026-03-25 18:13 [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Christian Loehle
2026-03-25 18:13 ` [PATCH 1/3] sched/topology: Introduce arch hooks for asympacking Christian Loehle
2026-03-26 13:23   ` kernel test robot
2026-03-26 15:26   ` kernel test robot
2026-03-26 16:40   ` kernel test robot
2026-03-25 18:13 ` [PATCH 2/3] arch_topology: Export CPPC-based asympacking prios Christian Loehle
2026-03-25 18:13 ` [PATCH 3/3] arm64/sched: Enable CPPC-based asympacking Christian Loehle
2026-03-26 15:47   ` kernel test robot
2026-03-26 15:47   ` kernel test robot
2026-03-27 15:44   ` Valentin Schneider
2026-03-26  7:53 ` [RFC][RFT][PATCH 0/3] arm64: Enable asympacking for minor CPPC asymmetry Vincent Guittot
2026-03-26  8:16   ` Christian Loehle
2026-03-26  8:24     ` Vincent Guittot
2026-03-26  9:24       ` Christian Loehle
2026-03-26 13:04         ` Vincent Guittot
2026-03-26 13:45           ` Andrea Righi
2026-03-26 15:55             ` Christian Loehle
2026-03-26 16:00               ` Andrea Righi
2026-03-27  9:53               ` Andrea Righi
2026-03-26  8:20   ` Christian Loehle
2026-03-26  8:11 ` Andrea Righi
2026-03-26  8:20   ` Vincent Guittot
2026-03-26  9:15     ` Andrea Righi
