* [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise
@ 2026-03-24  0:55 Andrea Righi
  2026-03-24  7:39 ` Vincent Guittot
  0 siblings, 1 reply; 17+ messages in thread

From: Andrea Righi @ 2026-03-24  0:55 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Christian Loehle, linux-kernel, Felix Abecassis

On some platforms, the firmware may expose per-CPU performance
differences (e.g., via ACPI CPPC highest_perf) even when the system is
effectively symmetric. These small variations, typically due to silicon
binning, are reflected in arch_scale_cpu_capacity() and end up being
interpreted as real capacity asymmetry.

As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY,
triggering asymmetry-specific behaviors, even though all CPUs have
comparable performance.

Prevent this by treating CPU capacities within 20% of the maximum value
as equivalent when building the asymmetry topology. This filters out
firmware noise, while preserving correct behavior on real heterogeneous
systems, where capacity differences are significantly larger.

Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/topology.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 061f8c85f5552..fe71ea9f3bda7 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1432,9 +1432,8 @@ static void free_asym_cap_entry(struct rcu_head *head)
 	kfree(entry);
 }
 
-static inline void asym_cpu_capacity_update_data(int cpu)
+static inline void asym_cpu_capacity_update_data(int cpu, unsigned long capacity)
 {
-	unsigned long capacity = arch_scale_cpu_capacity(cpu);
 	struct asym_cap_data *insert_entry = NULL;
 	struct asym_cap_data *entry;
 
@@ -1471,13 +1470,27 @@ static inline void asym_cpu_capacity_update_data(int cpu)
 static void asym_cpu_capacity_scan(void)
 {
 	struct asym_cap_data *entry, *next;
+	unsigned long max_cap = 0;
+	unsigned long capacity;
 	int cpu;
 
 	list_for_each_entry(entry, &asym_cap_list, link)
 		cpumask_clear(cpu_capacity_span(entry));
 
 	for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN))
-		asym_cpu_capacity_update_data(cpu);
+		max_cap = max(max_cap, arch_scale_cpu_capacity(cpu));
+
+	/*
+	 * Treat small capacity differences (< 20% max capacity) as noise,
+	 * to prevent enabling SD_ASYM_CPUCAPACITY when it's not really
+	 * needed.
+	 */
+	for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) {
+		capacity = arch_scale_cpu_capacity(cpu);
+		if (capacity * 5 >= max_cap * 4)
+			capacity = max_cap;
+		asym_cpu_capacity_update_data(cpu, capacity);
+	}
 
 	list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
 		if (cpumask_empty(cpu_capacity_span(entry))) {
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 0:55 [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise Andrea Righi @ 2026-03-24 7:39 ` Vincent Guittot 2026-03-24 7:55 ` Christian Loehle 2026-03-24 9:39 ` Andrea Righi 0 siblings, 2 replies; 17+ messages in thread From: Vincent Guittot @ 2026-03-24 7:39 UTC (permalink / raw) To: Andrea Righi Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, linux-kernel, Felix Abecassis On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > On some platforms, the firmware may expose per-CPU performance > differences (e.g., via ACPI CPPC highest_perf) even when the system is > effectively symmetric. These small variations, typically due to silicon > binning, are reflected in arch_scale_cpu_capacity() and end up being > interpreted as real capacity asymmetry. > > As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, > triggering asymmetry-specific behaviors, even though all CPUs have > comparable performance. > > Prevent this by treating CPU capacities within 20% of the maximum value 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of 871 but we still want to keep them different Why would 5% not be enough? > as equivalent when building the asymmetry topology. This filters out > firmware noise, while preserving correct behavior on real heterogeneous > systems, where capacity differences are significantly larger. 
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com> > Signed-off-by: Andrea Righi <arighi@nvidia.com> > --- > kernel/sched/topology.c | 19 ++++++++++++++++--- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 061f8c85f5552..fe71ea9f3bda7 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -1432,9 +1432,8 @@ static void free_asym_cap_entry(struct rcu_head *head) > kfree(entry); > } > > -static inline void asym_cpu_capacity_update_data(int cpu) > +static inline void asym_cpu_capacity_update_data(int cpu, unsigned long capacity) > { > - unsigned long capacity = arch_scale_cpu_capacity(cpu); > struct asym_cap_data *insert_entry = NULL; > struct asym_cap_data *entry; > > @@ -1471,13 +1470,27 @@ static inline void asym_cpu_capacity_update_data(int cpu) > static void asym_cpu_capacity_scan(void) > { > struct asym_cap_data *entry, *next; > + unsigned long max_cap = 0; > + unsigned long capacity; > int cpu; > > list_for_each_entry(entry, &asym_cap_list, link) > cpumask_clear(cpu_capacity_span(entry)); > > for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) > - asym_cpu_capacity_update_data(cpu); > + max_cap = max(max_cap, arch_scale_cpu_capacity(cpu)); > + > + /* > + * Treat small capacity differences (< 20% max capacity) as noise, > + * to prevent enabling SD_ASYM_CPUCAPACITY when it's not really > + * needed. > + */ > + for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) { > + capacity = arch_scale_cpu_capacity(cpu); > + if (capacity * 5 >= max_cap * 4) > + capacity = max_cap; > + asym_cpu_capacity_update_data(cpu, capacity); > + } > > list_for_each_entry_safe(entry, next, &asym_cap_list, link) { > if (cpumask_empty(cpu_capacity_span(entry))) { > -- > 2.53.0 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 7:39 ` Vincent Guittot @ 2026-03-24 7:55 ` Christian Loehle 2026-03-24 8:08 ` Christian Loehle 2026-03-24 9:39 ` Andrea Righi 1 sibling, 1 reply; 17+ messages in thread From: Christian Loehle @ 2026-03-24 7:55 UTC (permalink / raw) To: Vincent Guittot, Andrea Righi Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 3/24/26 07:39, Vincent Guittot wrote: > On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: >> >> On some platforms, the firmware may expose per-CPU performance >> differences (e.g., via ACPI CPPC highest_perf) even when the system is >> effectively symmetric. These small variations, typically due to silicon >> binning, are reflected in arch_scale_cpu_capacity() and end up being >> interpreted as real capacity asymmetry. >> >> As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, >> triggering asymmetry-specific behaviors, even though all CPUs have >> comparable performance. >> >> Prevent this by treating CPU capacities within 20% of the maximum value > > 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > 871 but we still want to keep them different > > Why would 5% not be enough? I've also used 5%, or rather the existing capacity_greater() macro. >[snip] ^ permalink raw reply [flat|nested] 17+ messages in thread
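[Editor's note: the margin choice is exactly what Vincent's rb5 example probes. The sketch below models both rules; the `1024/1078` ratio is taken from the `capacity_greater()` macro in kernel/sched/fair.c (roughly a 5% margin), and the helper names and the 871/1024 values (quoted from this thread) are otherwise illustrative assumptions.]

```python
# Compare the patch's 20% margin against a ~5% margin in the style of the
# capacity_greater() macro in kernel/sched/fair.c: cap1 * 1024 > cap2 * 1078.

def merged_20pct(cap, max_cap):
    """Patch's rule: fold cap into max_cap when cap * 5 >= max_cap * 4."""
    return cap * 5 >= max_cap * 4

def merged_5pct(cap, max_cap):
    """capacity_greater()-style rule: merge unless max_cap is >~5% greater."""
    return not (max_cap * 1024 > cap * 1078)

MAX_CAP = 1024
RB5_MID = 871   # mid-cluster capacity on the Snapdragon RB5, per Vincent

# With a 20% margin the rb5 mid CPUs would wrongly be folded into the big
# cluster's capacity class (871 * 5 = 4355 >= 1024 * 4 = 4096)...
assert merged_20pct(RB5_MID, MAX_CAP) is True
# ...while a ~5% margin keeps them distinct...
assert merged_5pct(RB5_MID, MAX_CAP) is False
# ...yet still absorbs binning-level noise such as 1008 vs 1024.
assert merged_5pct(1008, MAX_CAP) is True
```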
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 7:55 ` Christian Loehle @ 2026-03-24 8:08 ` Christian Loehle 2026-03-24 9:46 ` Andrea Righi 0 siblings, 1 reply; 17+ messages in thread From: Christian Loehle @ 2026-03-24 8:08 UTC (permalink / raw) To: Vincent Guittot, Andrea Righi Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 3/24/26 07:55, Christian Loehle wrote: > On 3/24/26 07:39, Vincent Guittot wrote: >> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: >>> >>> On some platforms, the firmware may expose per-CPU performance >>> differences (e.g., via ACPI CPPC highest_perf) even when the system is >>> effectively symmetric. These small variations, typically due to silicon >>> binning, are reflected in arch_scale_cpu_capacity() and end up being >>> interpreted as real capacity asymmetry. >>> >>> As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, >>> triggering asymmetry-specific behaviors, even though all CPUs have >>> comparable performance. >>> >>> Prevent this by treating CPU capacities within 20% of the maximum value >> >> 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of >> 871 but we still want to keep them different >> >> Why would 5% not be enough? > > I've also used 5%, or rather the existing capacity_greater() macro. Also, given that this patch even mentions this as "noise" one might ask why the firmware wouldn't force-equalise this. Anyway let me finally send out those asympacking patches which would make that issue obsolete because we actually make use of the highest_perf information from the firmware. > >> [snip] > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 8:08 ` Christian Loehle @ 2026-03-24 9:46 ` Andrea Righi 2026-03-24 10:29 ` Dietmar Eggemann 0 siblings, 1 reply; 17+ messages in thread From: Andrea Righi @ 2026-03-24 9:46 UTC (permalink / raw) To: Christian Loehle Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis Hi Christian, On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > On 3/24/26 07:55, Christian Loehle wrote: > > On 3/24/26 07:39, Vincent Guittot wrote: > >> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > >>> > >>> On some platforms, the firmware may expose per-CPU performance > >>> differences (e.g., via ACPI CPPC highest_perf) even when the system is > >>> effectively symmetric. These small variations, typically due to silicon > >>> binning, are reflected in arch_scale_cpu_capacity() and end up being > >>> interpreted as real capacity asymmetry. > >>> > >>> As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, > >>> triggering asymmetry-specific behaviors, even though all CPUs have > >>> comparable performance. > >>> > >>> Prevent this by treating CPU capacities within 20% of the maximum value > >> > >> 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > >> 871 but we still want to keep them different > >> > >> Why would 5% not be enough? > > > > I've also used 5%, or rather the existing capacity_greater() macro. > > Also, given that this patch even mentions this as "noise" one might ask > why the firmware wouldn't force-equalise this. I think it's reasonable to consider that as "noise" from a scheduler perspective, but from a hardware/firmware point of view I don't have strong arguments to propose equalizing the highest_perf values. 
At the end, at least in my case, it seems all compliant with the ACPI/CPPC specs and suggesting to equalize them because "the kernel doesn't handle it well" doesn't seem like a solid motivation... > Anyway let me finally send out those asympacking patches which would make > that issue obsolete because we actually make use of the highest_perf > information from the firmware. Looking forward to that. :) Thanks, -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 9:46 ` Andrea Righi @ 2026-03-24 10:29 ` Dietmar Eggemann 2026-03-24 11:01 ` Andrea Righi 0 siblings, 1 reply; 17+ messages in thread From: Dietmar Eggemann @ 2026-03-24 10:29 UTC (permalink / raw) To: Andrea Righi, Christian Loehle Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 24.03.26 10:46, Andrea Righi wrote: > Hi Christian, > > On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: >> On 3/24/26 07:55, Christian Loehle wrote: >>> On 3/24/26 07:39, Vincent Guittot wrote: >>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: [...] >>>> 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of >>>> 871 but we still want to keep them different >>>> >>>> Why would 5% not be enough? >>> >>> I've also used 5%, or rather the existing capacity_greater() macro. >> >> Also, given that this patch even mentions this as "noise" one might ask >> why the firmware wouldn't force-equalise this. > > I think it's reasonable to consider that as "noise" from a scheduler > perspective, but from a hardware/firmware point of view I don't have strong > arguments to propose equalizing the highest_perf values. At the end, at > least in my case, it seems all compliant with the ACPI/CPPC specs and > suggesting to equalize them because "the kernel doesn't handle it well" > doesn't seem like a solid motivation... The first time we observed this on NVIDIA Grace, we wondered whether there might be functionality outside the task scheduler that makes use of these slightly heterogeneous CPU capacity values from CPPC—and whether the dependency on task scheduling was simply an overlooked phenomenon. And then there was DCPerf Mediawiki on 72 CPUs system always scoring better with sched_asym_cpucap_active() = TRUE (mentioned already by Chris L. 
in: https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com >> Anyway let me finally send out those asympacking patches which would make >> that issue obsolete because we actually make use of the highest_perf >> information from the firmware. > > Looking forward to that. :) > > Thanks, > -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 10:29 ` Dietmar Eggemann @ 2026-03-24 11:01 ` Andrea Righi 2026-03-25 9:23 ` Dietmar Eggemann 0 siblings, 1 reply; 17+ messages in thread From: Andrea Righi @ 2026-03-24 11:01 UTC (permalink / raw) To: Dietmar Eggemann Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis Hi Dietmar, On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: > On 24.03.26 10:46, Andrea Righi wrote: > > Hi Christian, > > > > On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > >> On 3/24/26 07:55, Christian Loehle wrote: > >>> On 3/24/26 07:39, Vincent Guittot wrote: > >>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > [...] > > >>>> 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > >>>> 871 but we still want to keep them different > >>>> > >>>> Why would 5% not be enough? > >>> > >>> I've also used 5%, or rather the existing capacity_greater() macro. > >> > >> Also, given that this patch even mentions this as "noise" one might ask > >> why the firmware wouldn't force-equalise this. > > > > I think it's reasonable to consider that as "noise" from a scheduler > > perspective, but from a hardware/firmware point of view I don't have strong > > arguments to propose equalizing the highest_perf values. At the end, at > > least in my case, it seems all compliant with the ACPI/CPPC specs and > > suggesting to equalize them because "the kernel doesn't handle it well" > > doesn't seem like a solid motivation... > > The first time we observed this on NVIDIA Grace, we wondered whether > there might be functionality outside the task scheduler that makes use > of these slightly heterogeneous CPU capacity values from CPPC—and > whether the dependency on task scheduling was simply an overlooked > phenomenon. 
> > And then there was DCPerf Mediawiki on 72 CPUs system always scoring > better with sched_asym_cpucap_active() = TRUE (mentioned already by > Chris L. in: > https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com Yeah, I think Chris' asym-packing approach might be the safest thing to do. At the same time it would be nice to improve asym-capacity to introduce some concept of SMT awareness, that was my original attempt with https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, since we may see similar asym-capacity benefits on Vera (that has SMT, unlike Grace). What do you think? Thanks, -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 11:01 ` Andrea Righi @ 2026-03-25 9:23 ` Dietmar Eggemann 2026-03-25 9:32 ` Andrea Righi 0 siblings, 1 reply; 17+ messages in thread From: Dietmar Eggemann @ 2026-03-25 9:23 UTC (permalink / raw) To: Andrea Righi Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 24.03.26 12:01, Andrea Righi wrote: > Hi Dietmar, > > On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: >> On 24.03.26 10:46, Andrea Righi wrote: >>> Hi Christian, >>> >>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: >>>> On 3/24/26 07:55, Christian Loehle wrote: >>>>> On 3/24/26 07:39, Vincent Guittot wrote: >>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: [...] >> The first time we observed this on NVIDIA Grace, we wondered whether >> there might be functionality outside the task scheduler that makes use >> of these slightly heterogeneous CPU capacity values from CPPC—and >> whether the dependency on task scheduling was simply an overlooked >> phenomenon. >> >> And then there was DCPerf Mediawiki on 72 CPUs system always scoring >> better with sched_asym_cpucap_active() = TRUE (mentioned already by >> Chris L. in: >> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com > > Yeah, I think Chris' asym-packing approach might be the safest thing to do. > > At the same time it would be nice to improve asym-capacity to introduce > some concept of SMT awareness, that was my original attempt with > https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, > since we may see similar asym-capacity benefits on Vera (that has SMT, > unlike Grace). What do you think? We never found a good way to specify a CPU capacity in the SMT case (EAS and energy model included). 
So comparing CPU capacity w/ utilization, CPU overutilization detection
etc. definitions get more blurry.

But in case you now want to hide these small CPU capacity differences
from asym-cpucap setup you won't run into this 'SD_SHARE_CPUCAPACITY +
SD_ASYM_CPUCAPACITY'.

You still will have small differences in sched group capacities but this
is covered by load-balance.

BTW, you should have seen on Vera ?:

sd_init() [kernel/sched/topology.c]

1720         WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
1721                   (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),
1722                   "CPU capacity asymmetry not supported on SMT\n");

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 9:23 ` Dietmar Eggemann @ 2026-03-25 9:32 ` Andrea Righi 2026-03-25 11:16 ` Dietmar Eggemann 2026-03-25 12:48 ` Phil Auld 0 siblings, 2 replies; 17+ messages in thread From: Andrea Righi @ 2026-03-25 9:32 UTC (permalink / raw) To: Dietmar Eggemann Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: > On 24.03.26 12:01, Andrea Righi wrote: > > Hi Dietmar, > > > > On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: > >> On 24.03.26 10:46, Andrea Righi wrote: > >>> Hi Christian, > >>> > >>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > >>>> On 3/24/26 07:55, Christian Loehle wrote: > >>>>> On 3/24/26 07:39, Vincent Guittot wrote: > >>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > [...] > > >> The first time we observed this on NVIDIA Grace, we wondered whether > >> there might be functionality outside the task scheduler that makes use > >> of these slightly heterogeneous CPU capacity values from CPPC—and > >> whether the dependency on task scheduling was simply an overlooked > >> phenomenon. > >> > >> And then there was DCPerf Mediawiki on 72 CPUs system always scoring > >> better with sched_asym_cpucap_active() = TRUE (mentioned already by > >> Chris L. in: > >> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com > > > > Yeah, I think Chris' asym-packing approach might be the safest thing to do. > > > > At the same time it would be nice to improve asym-capacity to introduce > > some concept of SMT awareness, that was my original attempt with > > https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, > > since we may see similar asym-capacity benefits on Vera (that has SMT, > > unlike Grace). 
What do you think? > > We never found a good way to specify a CPU capacity in the SMT case (EAS > and energy model included). So comparing CPU capacity w/ utilization, CPU > overutilization detection etc. definitions get more blurry. Hm... so should we just avoid calling select_idle_capacity() when SMT is enabled to prevent waking up tasks on both SMT siblings when there are fully-idle SMT cores? > > But in case you now want to hide these small CPU capacity differences from > asym-cpucap setup you won't run into this 'SD_SHARE_CPUCAPACITY + > SD_ASYM_CPUCAPACITY'. > > You still will have small differences in sched group capacities but this > is covered by load-balance. > > BTW, you should have seen on Vera ?: > > sd_int() [kernel/sched/.topology.c] > > 1720 WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) == > 1721 (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY), > 1722 "CPU capacity asymmetry not supported on SMT\n"); Yep, I've seen that. :) Thanks, -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 9:32 ` Andrea Righi @ 2026-03-25 11:16 ` Dietmar Eggemann 2026-03-25 12:25 ` Andrea Righi 2026-03-25 12:48 ` Phil Auld 1 sibling, 1 reply; 17+ messages in thread From: Dietmar Eggemann @ 2026-03-25 11:16 UTC (permalink / raw) To: Andrea Righi Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 25.03.26 10:32, Andrea Righi wrote: > On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: >> On 24.03.26 12:01, Andrea Righi wrote: >>> Hi Dietmar, >>> >>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: >>>> On 24.03.26 10:46, Andrea Righi wrote: >>>>> Hi Christian, >>>>> >>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: >>>>>> On 3/24/26 07:55, Christian Loehle wrote: >>>>>>> On 3/24/26 07:39, Vincent Guittot wrote: >>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: >> >> [...] >> >>>> The first time we observed this on NVIDIA Grace, we wondered whether >>>> there might be functionality outside the task scheduler that makes use >>>> of these slightly heterogeneous CPU capacity values from CPPC—and >>>> whether the dependency on task scheduling was simply an overlooked >>>> phenomenon. >>>> >>>> And then there was DCPerf Mediawiki on 72 CPUs system always scoring >>>> better with sched_asym_cpucap_active() = TRUE (mentioned already by >>>> Chris L. in: >>>> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com >>> >>> Yeah, I think Chris' asym-packing approach might be the safest thing to do. 
>>> >>> At the same time it would be nice to improve asym-capacity to introduce >>> some concept of SMT awareness, that was my original attempt with >>> https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, >>> since we may see similar asym-capacity benefits on Vera (that has SMT, >>> unlike Grace). What do you think? >> >> We never found a good way to specify a CPU capacity in the SMT case (EAS >> and energy model included). So comparing CPU capacity w/ utilization, CPU >> overutilization detection etc. definitions get more blurry. > > Hm... so should we just avoid calling select_idle_capacity() when SMT is > enabled to prevent waking up tasks on both SMT siblings when there are > fully-idle SMT cores? Yeah, pretty much. So prefer (2) over (1). IMHO, we do have a similar issue here. Can we say that a logical CPU is idle if its SMT sibling isn't? But at least we don't have to use any CPU cap/util comparison there. select_idle_sibling() 8132 if (sched_smt_active()) { 8133 has_idle_core = test_idle_cores(target); 8134 8135 if (!has_idle_core && cpus_share_cache(prev, target)) { <-- (1) 8136 i = select_idle_smt(p, sd, prev); 8137 if ((unsigned int)i < nr_cpumask_bits) 8138 return i; 8139 } 8140 } 8141 8142 i = select_idle_cpu(p, sd, has_idle_core, target); <-- (2a) 8143 if ((unsigned)i < nr_cpumask_bits) 8144 return i select_idle_cpu() 7926 for_each_cpu_wrap(cpu, cpus, target + 1) { 7927 if (has_idle_core) { 7928 i = select_idle_core(p, cpu, cpus, &idle_cpu); <-- (2b) 7929 if ((unsigned int)i < nr_cpumask_bits) 7930 return i; 7931 7932 } else { 7933 if (--nr <= 0) 7934 return -1; 7935 idle_cpu = __select_idle_cpu(cpu, p); 7936 if ((unsigned int)idle_cpu < nr_cpumask_bits) 7937 break; 7938 } 7939 } [...] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 11:16 ` Dietmar Eggemann @ 2026-03-25 12:25 ` Andrea Righi 2026-03-25 15:26 ` Dietmar Eggemann 0 siblings, 1 reply; 17+ messages in thread From: Andrea Righi @ 2026-03-25 12:25 UTC (permalink / raw) To: Dietmar Eggemann Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On Wed, Mar 25, 2026 at 12:16:59PM +0100, Dietmar Eggemann wrote: > On 25.03.26 10:32, Andrea Righi wrote: > > On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: > >> On 24.03.26 12:01, Andrea Righi wrote: > >>> Hi Dietmar, > >>> > >>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: > >>>> On 24.03.26 10:46, Andrea Righi wrote: > >>>>> Hi Christian, > >>>>> > >>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > >>>>>> On 3/24/26 07:55, Christian Loehle wrote: > >>>>>>> On 3/24/26 07:39, Vincent Guittot wrote: > >>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > >> > >> [...] > >> > >>>> The first time we observed this on NVIDIA Grace, we wondered whether > >>>> there might be functionality outside the task scheduler that makes use > >>>> of these slightly heterogeneous CPU capacity values from CPPC—and > >>>> whether the dependency on task scheduling was simply an overlooked > >>>> phenomenon. > >>>> > >>>> And then there was DCPerf Mediawiki on 72 CPUs system always scoring > >>>> better with sched_asym_cpucap_active() = TRUE (mentioned already by > >>>> Chris L. in: > >>>> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com > >>> > >>> Yeah, I think Chris' asym-packing approach might be the safest thing to do. 
> >>> > >>> At the same time it would be nice to improve asym-capacity to introduce > >>> some concept of SMT awareness, that was my original attempt with > >>> https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, > >>> since we may see similar asym-capacity benefits on Vera (that has SMT, > >>> unlike Grace). What do you think? > >> > >> We never found a good way to specify a CPU capacity in the SMT case (EAS > >> and energy model included). So comparing CPU capacity w/ utilization, CPU > >> overutilization detection etc. definitions get more blurry. > > > > Hm... so should we just avoid calling select_idle_capacity() when SMT is > > enabled to prevent waking up tasks on both SMT siblings when there are > > fully-idle SMT cores? > > Yeah, pretty much. So prefer (2) over (1). > > IMHO, we do have a similar issue here. Can we say that a logical CPU is idle > if its SMT sibling isn't? But at least we don't have to use any CPU cap/util > comparison there. > > select_idle_sibling() > > 8132 if (sched_smt_active()) { > 8133 has_idle_core = test_idle_cores(target); > 8134 > 8135 if (!has_idle_core && cpus_share_cache(prev, target)) { <-- (1) > 8136 i = select_idle_smt(p, sd, prev); > 8137 if ((unsigned int)i < nr_cpumask_bits) > 8138 return i; > 8139 } > 8140 } > 8141 > 8142 i = select_idle_cpu(p, sd, has_idle_core, target); <-- (2a) > 8143 if ((unsigned)i < nr_cpumask_bits) > 8144 return i > > select_idle_cpu() > > 7926 for_each_cpu_wrap(cpu, cpus, target + 1) { > 7927 if (has_idle_core) { > 7928 i = select_idle_core(p, cpu, cpus, &idle_cpu); <-- (2b) > 7929 if ((unsigned int)i < nr_cpumask_bits) > 7930 return i; > 7931 > 7932 } else { > 7933 if (--nr <= 0) > 7934 return -1; > 7935 idle_cpu = __select_idle_cpu(cpu, p); > 7936 if ((unsigned int)idle_cpu < nr_cpumask_bits) > 7937 break; > 7938 } > 7939 } Exactly, we already prefer fully-idle cores over partially-idle cores with asym-capacity disabled, but in that case the idle selection logic stays in a 
world of idle bits, without cap/util math, so it's a bit easier. And it's
probably fine also when we have both asym-capacity + SMT (at least it seems
better than what we have now, ignoring the SMT part).

Essentially having something like the following (which already gives better
performance on Vera):

 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d57c02e82f3a1..534634f813fca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8086,7 +8086,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * For asymmetric CPU capacity systems, our domain of interest is
 	 * sd_asym_cpucapacity rather than sd_llc.
 	 */
-	if (sched_asym_cpucap_active()) {
+	if (sched_asym_cpucap_active() && !sched_smt_active()) {
 		sd = rcu_dereference_all(per_cpu(sd_asym_cpucapacity, target));
 		/*
 		 * On an asymmetric CPU capacity system where an exclusive

Thanks,
-Andrea

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 12:25 ` Andrea Righi @ 2026-03-25 15:26 ` Dietmar Eggemann 2026-03-25 16:50 ` Andrea Righi 0 siblings, 1 reply; 17+ messages in thread From: Dietmar Eggemann @ 2026-03-25 15:26 UTC (permalink / raw) To: Andrea Righi Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On 25.03.26 13:25, Andrea Righi wrote: > On Wed, Mar 25, 2026 at 12:16:59PM +0100, Dietmar Eggemann wrote: >> On 25.03.26 10:32, Andrea Righi wrote: >>> On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: >>>> On 24.03.26 12:01, Andrea Righi wrote: >>>>> Hi Dietmar, >>>>> >>>>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: >>>>>> On 24.03.26 10:46, Andrea Righi wrote: >>>>>>> Hi Christian, >>>>>>> >>>>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: >>>>>>>> On 3/24/26 07:55, Christian Loehle wrote: >>>>>>>>> On 3/24/26 07:39, Vincent Guittot wrote: >>>>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: [...] > Exactly, we already prefer fully-idle cores over partially-idle cores with > asym-capacity disabled, but in that case the idle selection logic stays in > a world of idle bits, without cap/util math, so it's a bit easier. And it's > probably fine also when we have both asym-capacity + SMT (at least it seems > better than what we have now, ignoring the SMT part). 
> > Essentially having somethig like the following (which already gives better > performance on Vera): > > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index d57c02e82f3a1..534634f813fca 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -8086,7 +8086,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) > * For asymmetric CPU capacity systems, our domain of interest is > * sd_asym_cpucapacity rather than sd_llc. > */ > - if (sched_asym_cpucap_active()) { > + if (sched_asym_cpucap_active() && !sched_smt_active()) { > sd = rcu_dereference_all(per_cpu(sd_asym_cpucapacity, target)); > /* > * On an asymmetric CPU capacity system where an exclusive Ah, I thought we were talking !sched_asym_cpucap_active() case, either by letting CPPC return the same value for all CPUs or by introducing this 20%/5% threshold into asym_cpu_capacity_scan(). ASYM_CPUCAP + SHARE_CPUCAP vs SHARE_CPUCAP would still behave slightly differently because of asym_fits_cpu() in all those early bailout conditions (1) in sis(). select_idle_sibling() if (choose_idle_cpu(target, p) && asym_fits_cpu(task_util, util_min, util_max, target)) <-- (1) return target; ... And you would still have misfit_task load balance enabled. Those subtle differences may influence behavior compared to a simpler homogeneous CPU capacity model, but it’s unclear whether they justify introducing yet another variant alongside the existing homogeneous and fully heterogeneous (non-SMT) approaches. IMHO, we should only consider allowing this if there is clear evidence of significant benefits across a representative range of benchmarks and workloads. [...] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 15:26 ` Dietmar Eggemann @ 2026-03-25 16:50 ` Andrea Righi 0 siblings, 0 replies; 17+ messages in thread From: Andrea Righi @ 2026-03-25 16:50 UTC (permalink / raw) To: Dietmar Eggemann Cc: Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On Wed, Mar 25, 2026 at 04:26:44PM +0100, Dietmar Eggemann wrote: > On 25.03.26 13:25, Andrea Righi wrote: > > On Wed, Mar 25, 2026 at 12:16:59PM +0100, Dietmar Eggemann wrote: > >> On 25.03.26 10:32, Andrea Righi wrote: > >>> On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: > >>>> On 24.03.26 12:01, Andrea Righi wrote: > >>>>> Hi Dietmar, > >>>>> > >>>>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: > >>>>>> On 24.03.26 10:46, Andrea Righi wrote: > >>>>>>> Hi Christian, > >>>>>>> > >>>>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > >>>>>>>> On 3/24/26 07:55, Christian Loehle wrote: > >>>>>>>>> On 3/24/26 07:39, Vincent Guittot wrote: > >>>>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > [...] > > > Exactly, we already prefer fully-idle cores over partially-idle cores with > > asym-capacity disabled, but in that case the idle selection logic stays in > > a world of idle bits, without cap/util math, so it's a bit easier. And it's > > probably fine also when we have both asym-capacity + SMT (at least it seems > > better than what we have now, ignoring the SMT part). 
> > > > Essentially having somethig like the following (which already gives better > > performance on Vera): > > > > kernel/sched/fair.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index d57c02e82f3a1..534634f813fca 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -8086,7 +8086,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) > > * For asymmetric CPU capacity systems, our domain of interest is > > * sd_asym_cpucapacity rather than sd_llc. > > */ > > - if (sched_asym_cpucap_active()) { > > + if (sched_asym_cpucap_active() && !sched_smt_active()) { > > sd = rcu_dereference_all(per_cpu(sd_asym_cpucapacity, target)); > > /* > > * On an asymmetric CPU capacity system where an exclusive > > Ah, I thought we were talking !sched_asym_cpucap_active() case, either > by letting CPPC return the same value for all CPUs or by introducing > this 20%/5% threshold into asym_cpu_capacity_scan(). Sure, we can also equalize capacity via CPPC, but I thought we were worried about potential regressions with other systems that don't have SMT and may actually benefit from the asym-capacity logic. Moreover, if any other platform with SMT enables asym CPU by slightly exceeding the 5% margin, we may face the same issue again. > > ASYM_CPUCAP + SHARE_CPUCAP vs SHARE_CPUCAP would still behave slightly > differently because of asym_fits_cpu() in all those early bailout > conditions (1) in sis(). > > select_idle_sibling() > > if (choose_idle_cpu(target, p) && > asym_fits_cpu(task_util, util_min, util_max, target)) <-- (1) > return target; > > ... Ah yes, this also needs to be changed... > > And you would still have misfit_task load balance enabled. Correct, in fact to get the optimal performance on Vera with asym-capacity enabled, I also need to fix the misfit logic to prioritize fully-idle SMT cores. Same with find_new_ilb() and potentially other places. 
With these I get almost 2x improvement in some cases, which is pretty big. But I get similar results also disabling asym-capacity via the 5% threshold. > > Those subtle differences may influence behavior compared to a simpler > homogeneous CPU capacity model, but it’s unclear whether they justify > introducing yet another variant alongside the existing homogeneous and > fully heterogeneous (non-SMT) approaches. > > IMHO, we should only consider allowing this if there is clear evidence > of significant benefits across a representative range of benchmarks and > workloads. Totally agree. But there's still the fact that select_idle_capacity() is not compatible with SMT, so it should be avoided when SMT is enabled, in one way or another. Thanks, -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 9:32 ` Andrea Righi 2026-03-25 11:16 ` Dietmar Eggemann @ 2026-03-25 12:48 ` Phil Auld 1 sibling, 0 replies; 17+ messages in thread From: Phil Auld @ 2026-03-25 12:48 UTC (permalink / raw) To: Andrea Righi Cc: Dietmar Eggemann, Christian Loehle, Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Felix Abecassis On Wed, Mar 25, 2026 at 10:32:28AM +0100 Andrea Righi wrote: > On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote: > > On 24.03.26 12:01, Andrea Righi wrote: > > > Hi Dietmar, > > > > > > On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote: > > >> On 24.03.26 10:46, Andrea Righi wrote: > > >>> Hi Christian, > > >>> > > >>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote: > > >>>> On 3/24/26 07:55, Christian Loehle wrote: > > >>>>> On 3/24/26 07:39, Vincent Guittot wrote: > > >>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > > > [...] > > > > >> The first time we observed this on NVIDIA Grace, we wondered whether > > >> there might be functionality outside the task scheduler that makes use > > >> of these slightly heterogeneous CPU capacity values from CPPC—and > > >> whether the dependency on task scheduling was simply an overlooked > > >> phenomenon. > > >> > > >> And then there was DCPerf Mediawiki on 72 CPUs system always scoring > > >> better with sched_asym_cpucap_active() = TRUE (mentioned already by > > >> Chris L. in: > > >> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com > > > > > > Yeah, I think Chris' asym-packing approach might be the safest thing to do. 
> > > > > > At the same time it would be nice to improve asym-capacity to introduce > > > some concept of SMT awareness, that was my original attempt with > > > https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com, > > > since we may see similar asym-capacity benefits on Vera (that has SMT, > > > unlike Grace). What do you think? > > > > We never found a good way to specify a CPU capacity in the SMT case (EAS > > and energy model included). So comparing CPU capacity w/ utilization, CPU > > overutilization detection etc. definitions get more blurry. > > Hm... so should we just avoid calling select_idle_capacity() when SMT is > enabled to prevent waking up tasks on both SMT siblings when there are > fully-idle SMT cores? > That might be a good idea. Especially if it's general and not tied to EAS/ASYM. I'm getting some requests for something like that. Cheers, Phil > > > > But in case you now want to hide these small CPU capacity differences from > > asym-cpucap setup you won't run into this 'SD_SHARE_CPUCAPACITY + > > SD_ASYM_CPUCAPACITY'. > > > > You still will have small differences in sched group capacities but this > > is covered by load-balance. > > > > BTW, you should have seen on Vera ?: > > > > sd_init() [kernel/sched/topology.c] > > > > 1720 WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) == > > 1721 (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY), > > 1722 "CPU capacity asymmetry not supported on SMT\n"); > > Yep, I've seen that. :) > > Thanks, > -Andrea > -- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 7:39 ` Vincent Guittot 2026-03-24 7:55 ` Christian Loehle @ 2026-03-24 9:39 ` Andrea Righi 2026-03-25 3:30 ` Koba Ko 1 sibling, 1 reply; 17+ messages in thread From: Andrea Righi @ 2026-03-24 9:39 UTC (permalink / raw) To: Vincent Guittot Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, linux-kernel, Felix Abecassis Hi Vincent, On Tue, Mar 24, 2026 at 08:39:34AM +0100, Vincent Guittot wrote: > On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > > > On some platforms, the firmware may expose per-CPU performance > > differences (e.g., via ACPI CPPC highest_perf) even when the system is > > effectively symmetric. These small variations, typically due to silicon > > binning, are reflected in arch_scale_cpu_capacity() and end up being > > interpreted as real capacity asymmetry. > > > > As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, > > triggering asymmetry-specific behaviors, even though all CPUs have > > comparable performance. > > > > Prevent this by treating CPU capacities within 20% of the maximum value > > 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > 871 but we still want to keep them different > > Why would 5% not be enough? Sure, 5% seems a more reasonable margin. I'll just reuse capacity_greater() as suggested by Christian. Thanks, -Andrea > > > > > as equivalent when building the asymmetry topology. This filters out > > firmware noise, while preserving correct behavior on real heterogeneous > > systems, where capacity differences are significantly larger. 
> > > > Reported-by: Felix Abecassis <fabecassis@nvidia.com> > > Signed-off-by: Andrea Righi <arighi@nvidia.com> > > --- > > kernel/sched/topology.c | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 061f8c85f5552..fe71ea9f3bda7 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -1432,9 +1432,8 @@ static void free_asym_cap_entry(struct rcu_head *head) > > kfree(entry); > > } > > > > -static inline void asym_cpu_capacity_update_data(int cpu) > > +static inline void asym_cpu_capacity_update_data(int cpu, unsigned long capacity) > > { > > - unsigned long capacity = arch_scale_cpu_capacity(cpu); > > struct asym_cap_data *insert_entry = NULL; > > struct asym_cap_data *entry; > > > > @@ -1471,13 +1470,27 @@ static inline void asym_cpu_capacity_update_data(int cpu) > > static void asym_cpu_capacity_scan(void) > > { > > struct asym_cap_data *entry, *next; > > + unsigned long max_cap = 0; > > + unsigned long capacity; > > int cpu; > > > > list_for_each_entry(entry, &asym_cap_list, link) > > cpumask_clear(cpu_capacity_span(entry)); > > > > for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) > > - asym_cpu_capacity_update_data(cpu); > > + max_cap = max(max_cap, arch_scale_cpu_capacity(cpu)); > > + > > + /* > > + * Treat small capacity differences (< 20% max capacity) as noise, > > + * to prevent enabling SD_ASYM_CPUCAPACITY when it's not really > > + * needed. 
> > + */ > > + for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) { > > + capacity = arch_scale_cpu_capacity(cpu); > > + if (capacity * 5 >= max_cap * 4) > > + capacity = max_cap; > > + asym_cpu_capacity_update_data(cpu, capacity); > > + } > > > > list_for_each_entry_safe(entry, next, &asym_cap_list, link) { > > if (cpumask_empty(cpu_capacity_span(entry))) { > > -- > > 2.53.0 > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-24 9:39 ` Andrea Righi @ 2026-03-25 3:30 ` Koba Ko 2026-03-25 12:29 ` Andrea Righi 0 siblings, 1 reply; 17+ messages in thread From: Koba Ko @ 2026-03-25 3:30 UTC (permalink / raw) To: Andrea Righi Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, linux-kernel, Felix Abecassis On Tue, Mar 24, 2026 at 10:39:41AM +0100, Andrea Righi wrote: > Hi Vincent, > > On Tue, Mar 24, 2026 at 08:39:34AM +0100, Vincent Guittot wrote: > > On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > > > > > On some platforms, the firmware may expose per-CPU performance > > > differences (e.g., via ACPI CPPC highest_perf) even when the system is > > > effectively symmetric. These small variations, typically due to silicon > > > binning, are reflected in arch_scale_cpu_capacity() and end up being > > > interpreted as real capacity asymmetry. > > > > > > As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, > > > triggering asymmetry-specific behaviors, even though all CPUs have > > > comparable performance. > > > > > > Prevent this by treating CPU capacities within 20% of the maximum value > > > > 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > > 871 but we still want to keep them different > > > > Why would 5% not be enough? > > Sure, 5% seems a more reasonable margin. I'll just reuse capacity_greater() > as suggested by Christian. > > Thanks, > -Andrea > How about modifying asym_cpu_capacity_update_data to group all CPUs within 5% capacity difference into the same group? 
``` +#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078) list_for_each_entry(entry, &asym_cap_list, link) { - if (capacity == entry->capacity) + if (!capacity_greater(capacity, entry->capacity) && + !capacity_greater(entry->capacity, capacity)) ``` > > > > > > > > > as equivalent when building the asymmetry topology. This filters out > > > firmware noise, while preserving correct behavior on real heterogeneous > > > systems, where capacity differences are significantly larger. > > > > > > Reported-by: Felix Abecassis <fabecassis@nvidia.com> > > > Signed-off-by: Andrea Righi <arighi@nvidia.com> > > > --- > > > kernel/sched/topology.c | 19 ++++++++++++++++--- > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > > index 061f8c85f5552..fe71ea9f3bda7 100644 > > > --- a/kernel/sched/topology.c > > > +++ b/kernel/sched/topology.c > > > @@ -1432,9 +1432,8 @@ static void free_asym_cap_entry(struct rcu_head *head) > > > kfree(entry); > > > } > > > > > > -static inline void asym_cpu_capacity_update_data(int cpu) > > > +static inline void asym_cpu_capacity_update_data(int cpu, unsigned long capacity) > > > { > > > - unsigned long capacity = arch_scale_cpu_capacity(cpu); > > > struct asym_cap_data *insert_entry = NULL; > > > struct asym_cap_data *entry; > > > > > > @@ -1471,13 +1470,27 @@ static inline void asym_cpu_capacity_update_data(int cpu) > > > static void asym_cpu_capacity_scan(void) > > > { > > > struct asym_cap_data *entry, *next; > > > + unsigned long max_cap = 0; > > > + unsigned long capacity; > > > int cpu; > > > > > > list_for_each_entry(entry, &asym_cap_list, link) > > > cpumask_clear(cpu_capacity_span(entry)); > > > > > > for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) > > > - asym_cpu_capacity_update_data(cpu); > > > + max_cap = max(max_cap, arch_scale_cpu_capacity(cpu)); > > > + > > > + /* > > > + * Treat small capacity differences 
(< 20% max capacity) as noise, > > > + * to prevent enabling SD_ASYM_CPUCAPACITY when it's not really > > > + * needed. > > > + */ > > > + for_each_cpu_and(cpu, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) { > > > + capacity = arch_scale_cpu_capacity(cpu); > > > + if (capacity * 5 >= max_cap * 4) > > > + capacity = max_cap; > > > + asym_cpu_capacity_update_data(cpu, capacity); > > > + } > > > > > > list_for_each_entry_safe(entry, next, &asym_cap_list, link) { > > > if (cpumask_empty(cpu_capacity_span(entry))) { > > > -- > > > 2.53.0 > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise 2026-03-25 3:30 ` Koba Ko @ 2026-03-25 12:29 ` Andrea Righi 0 siblings, 0 replies; 17+ messages in thread From: Andrea Righi @ 2026-03-25 12:29 UTC (permalink / raw) To: Koba Ko Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, linux-kernel, Felix Abecassis Hi Koba, On Wed, Mar 25, 2026 at 11:30:48AM +0800, Koba Ko wrote: > On Tue, Mar 24, 2026 at 10:39:41AM +0100, Andrea Righi wrote: > > Hi Vincent, > > > > On Tue, Mar 24, 2026 at 08:39:34AM +0100, Vincent Guittot wrote: > > > On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote: > > > > > > > > On some platforms, the firmware may expose per-CPU performance > > > > differences (e.g., via ACPI CPPC highest_perf) even when the system is > > > > effectively symmetric. These small variations, typically due to silicon > > > > binning, are reflected in arch_scale_cpu_capacity() and end up being > > > > interpreted as real capacity asymmetry. > > > > > > > > As a result, the scheduler incorrectly enables SD_ASYM_CPUCAPACITY, > > > > triggering asymmetry-specific behaviors, even though all CPUs have > > > > comparable performance. > > > > > > > > Prevent this by treating CPU capacities within 20% of the maximum value > > > > > > 20% is a bit high, my snapdragon rb5 has a mid CPU with a capacity of > > > 871 but we still want to keep them different > > > > > > Why would 5% not be enough? > > > > Sure, 5% seems a more reasonable margin. I'll just reuse capacity_greater() > > as suggested by Christian. > > > > Thanks, > > -Andrea > > > > How about modifying asym_cpu_capacity_update_data to group all CPUs within 5% capacity difference into the same group? 
> ``` > +#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078) > > list_for_each_entry(entry, &asym_cap_list, link) { > - if (capacity == entry->capacity) > + if (!capacity_greater(capacity, entry->capacity) && > + !capacity_greater(entry->capacity, capacity)) Yeah, makes sense, I like this better than mine. But there's still the concern of potentially regressing other systems, nullifying the small asym-capacity benefits (as Chris mentioned here: https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com). Thanks, -Andrea ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-03-25 16:50 UTC | newest] Thread overview: 17+ messages -- 2026-03-24 0:55 [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise Andrea Righi 2026-03-24 7:39 ` Vincent Guittot 2026-03-24 7:55 ` Christian Loehle 2026-03-24 8:08 ` Christian Loehle 2026-03-24 9:46 ` Andrea Righi 2026-03-24 10:29 ` Dietmar Eggemann 2026-03-24 11:01 ` Andrea Righi 2026-03-25 9:23 ` Dietmar Eggemann 2026-03-25 9:32 ` Andrea Righi 2026-03-25 11:16 ` Dietmar Eggemann 2026-03-25 12:25 ` Andrea Righi 2026-03-25 15:26 ` Dietmar Eggemann 2026-03-25 16:50 ` Andrea Righi 2026-03-25 12:48 ` Phil Auld 2026-03-24 9:39 ` Andrea Righi 2026-03-25 3:30 ` Koba Ko 2026-03-25 12:29 ` Andrea Righi