linux-kernel.vger.kernel.org archive mirror
* [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
@ 2025-08-08  2:55 Huang Shijie
  2025-08-11 15:13 ` Jeremy Linton
  0 siblings, 1 reply; 18+ messages in thread
From: Huang Shijie @ 2025-08-08  2:55 UTC
  To: catalin.marinas, will
  Cc: patches, cl, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel,
	Huang Shijie, Christoph Lameter

On servers, workloads that create many tasks and trigger many
task migrations can get better performance when
CONFIG_SCHED_CLUSTER is enabled.

For example, SPECjbb shows these improvements:
    Critical-jops : 26%
    Max-jops      : 7%

So enable it by default.

Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 58f87d09366c..054c96ea2235 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -82,6 +82,7 @@ CONFIG_ARCH_VISCONTI=y
 CONFIG_ARCH_XGENE=y
 CONFIG_ARCH_ZYNQMP=y
 CONFIG_SCHED_MC=y
+CONFIG_SCHED_CLUSTER=y
 CONFIG_SCHED_SMT=y
 CONFIG_NUMA=y
 CONFIG_XEN=y
-- 
2.40.1
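To check whether a particular kernel build actually has this option, one can parse its configuration. The following is a minimal sketch, not part of the patch; it assumes the running kernel exposes /proc/config.gz, which is only present with CONFIG_IKCONFIG_PROC=y. A defconfig generally records only symbols that differ from their Kconfig defaults, so a symbol missing from a defconfig is not necessarily disabled, whereas /proc/config.gz holds a full config.

```python
import gzip
import re

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" are the two forms of
# symbol lines that appear in .config and defconfig files.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> dict:
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'.

    Symbols that do not appear at all are left out of the result.
    """
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def running_kernel_config() -> str:
    """Read the running kernel's full config (needs CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open("/proc/config.gz", "rt") as f:
        return f.read()


if __name__ == "__main__":
    try:
        cfg = parse_kconfig(running_kernel_config())
    except OSError as exc:
        print("cannot read /proc/config.gz:", exc)
    else:
        print("CONFIG_SCHED_CLUSTER =", cfg.get("CONFIG_SCHED_CLUSTER", "not set"))
```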


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-08  2:55 [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER Huang Shijie
@ 2025-08-11 15:13 ` Jeremy Linton
  2025-08-12  3:05   ` Shijie Huang
  2025-08-12 16:33   ` Christoph Lameter (Ampere)
  0 siblings, 2 replies; 18+ messages in thread
From: Jeremy Linton @ 2025-08-11 15:13 UTC
  To: Huang Shijie, catalin.marinas, will
  Cc: patches, cl, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel,
	Christoph Lameter

Hi,

On 8/7/25 9:55 PM, Huang Shijie wrote:
> On servers, workloads that create many tasks and trigger many
> task migrations can get better performance when
> CONFIG_SCHED_CLUSTER is enabled.
> 
> For example, SPECjbb shows these improvements:
>      Critical-jops : 26%
>      Max-jops      : 7%
> 
> So enable it by default.

From what I've seen, SCHED_CLUSTER seems to be a bit of a give and take
depending on the benchmark and machine. I'm not sure whether it should be
enabled by default, but it would really be nice to have at least a larger
sweep of benchmarks/machines in order to be sure of the decision.


Thanks,





* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-11 15:13 ` Jeremy Linton
@ 2025-08-12  3:05   ` Shijie Huang
  2025-08-12 16:33   ` Christoph Lameter (Ampere)
  1 sibling, 0 replies; 18+ messages in thread
From: Shijie Huang @ 2025-08-12  3:05 UTC
  To: Jeremy Linton, Huang Shijie, catalin.marinas, will
  Cc: patches, cl, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel,
	Christoph Lameter


On 11/08/2025 23:13, Jeremy Linton wrote:
> Hi,
>
> On 8/7/25 9:55 PM, Huang Shijie wrote:
>> On servers, workloads that create many tasks and trigger many
>> task migrations can get better performance when
>> CONFIG_SCHED_CLUSTER is enabled.
>>
>> For example, SPECjbb shows these improvements:
>>      Critical-jops : 26%
>>      Max-jops      : 7%
>>
>> So enable it by default.
>
> From what I've seen, SCHED_CLUSTER seems to be a bit of a give and take
> depending on the benchmark and machine. I'm not sure whether it should
> be enabled by default, but it would really be nice to have at least a
> larger sweep of benchmarks/machines in order to be sure of the decision.

Yes, I agree.

Maybe I should create a patch that enables SCHED_CLUSTER only for
Ampere's machines.


Thanks




* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-11 15:13 ` Jeremy Linton
  2025-08-12  3:05   ` Shijie Huang
@ 2025-08-12 16:33   ` Christoph Lameter (Ampere)
  2025-08-12 17:32     ` Jeremy Linton
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-12 16:33 UTC
  To: Jeremy Linton
  Cc: Huang Shijie, catalin.marinas, will, patches, Shubhang,
	krzysztof.kozlowski, bjorn.andersson, geert+renesas, arnd, nm,
	ebiggers, nfraprado, prabhakar.mahadev-lad.rj, linux-arm-kernel,
	linux-kernel

On Mon, 11 Aug 2025, Jeremy Linton wrote:

> From what I've seen, SCHED_CLUSTER seems to be a bit of a give and take
> depending on the benchmark and machine. I'm not sure whether it should be
> enabled by default, but it would really be nice to have at least a larger sweep of
> benchmarks/machines in order to be sure of the decision.

If the hardware provides a cluster ID, then I think that cluster ID should be
used for the sched domains. CONFIG_SCHED_CLUSTER does that, so it should
be the default.

If there is no cluster information then these domains should not be
created. I think that is already the case?
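Whether a CLS level with no real cluster behind it actually disappears can be reasoned about with a simplified model of how the scheduler discards redundant domain levels. This is a sketch only: the rule below (drop a level that spans a single CPU, or exactly the same CPUs as the level beneath it) simplifies the kernel's degeneration checks, which also compare domain flags. That the cluster mask falls back to the CPU's SMT siblings when firmware describes no cluster is an assumption about arm64 behaviour, not something stated in this thread.

```python
from typing import FrozenSet, List, Optional, Tuple

Level = Tuple[str, FrozenSet[int]]


def surviving_levels(levels: List[Level]) -> List[str]:
    """Simplified model of sched-domain degeneration for one CPU.

    `levels` is ordered from the smallest span to the largest. A level is
    dropped if it spans a single CPU or exactly the same CPUs as the level
    below it. Real kernel checks (e.g. sd_degenerate/sd_parent_degenerate)
    also look at domain flags and groups; this model ignores them.
    """
    kept: List[str] = []
    below: Optional[FrozenSet[int]] = None
    for name, span in levels:
        if len(span) > 1 and span != below:
            kept.append(name)
        below = span
    return kept


if __name__ == "__main__":
    mc = frozenset(range(8))
    # Assumption: with no cluster described, the CLS span is just the CPU's
    # SMT siblings (only CPU 0 here), so it adds no new level.
    no_cluster = [("SMT", frozenset({0})), ("CLS", frozenset({0})),
                  ("MC", mc), ("PKG", mc)]
    with_cluster = [("SMT", frozenset({0})), ("CLS", frozenset(range(4))),
                    ("MC", mc), ("PKG", mc)]
    print(surviving_levels(no_cluster))    # ['MC']
    print(surviving_levels(with_cluster))  # ['CLS', 'MC']
```

Under these assumptions, enabling the option on a machine without cluster information costs no extra level, which is the behaviour Christoph expects; it says nothing about machines whose firmware reports misleading clusters.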




* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-12 16:33   ` Christoph Lameter (Ampere)
@ 2025-08-12 17:32     ` Jeremy Linton
  2025-08-13  9:28       ` Sudeep Holla
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Linton @ 2025-08-12 17:32 UTC
  To: Christoph Lameter (Ampere)
  Cc: Huang Shijie, catalin.marinas, will, patches, Shubhang,
	krzysztof.kozlowski, bjorn.andersson, geert+renesas, arnd, nm,
	ebiggers, nfraprado, prabhakar.mahadev-lad.rj, linux-arm-kernel,
	linux-kernel

On 8/12/25 11:33 AM, Christoph Lameter (Ampere) wrote:
> On Mon, 11 Aug 2025, Jeremy Linton wrote:
> 
>> From what I've seen, SCHED_CLUSTER seems to be a bit of a give and take
>> depending on the benchmark and machine. I'm not sure whether it should be
>> enabled by default, but it would really be nice to have at least a larger sweep of
>> benchmarks/machines in order to be sure of the decision.
> 
> If the hardware provides a cluster ID, then I think that cluster ID should be
> used for the sched domains. CONFIG_SCHED_CLUSTER does that, so it should
> be the default.

Hi,

The problem is that this information is being sourced from the ACPI
PPTT. The ACPI specification (AFAIK) doesn't define a cluster, so the
Linux cluster information is being 'invented' based on however the
firmware vendor chose to group CPU nodes in the PPTT. This means it's
possible for vendors to unknowingly create clusters, or to fail to
create them where they make sense.
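Jeremy's point can be illustrated with a toy model. The sketch assumes a simplified rule, loosely based on how Linux derives the cluster from the PPTT as we understand it: the cluster is whatever node sits directly above the core (one extra level up for an SMT thread). The Node class and the two firmware layouts are hypothetical and do not reflect the real PPTT binary format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a PPTT processor hierarchy node (not the real format)."""
    name: str
    parent: Optional["Node"] = None
    is_thread: bool = False


def cluster_of(leaf: Node) -> Optional[str]:
    """Simplified rule: the 'cluster' is the node directly above the core
    (one extra level up when the leaf is an SMT thread)."""
    node = leaf.parent
    if node is not None and leaf.is_thread:
        node = node.parent
    return node.name if node is not None else None


if __name__ == "__main__":
    # The same 4-core hardware, described by two hypothetical firmware tables.
    pkg = Node("package")
    groups = [Node("group0", pkg), Node("group1", pkg)]
    grouped = [Node(f"core{i}", groups[i // 2]) for i in range(4)]
    flat = [Node(f"core{i}", pkg) for i in range(4)]
    print([cluster_of(c) for c in grouped])  # ['group0', 'group0', 'group1', 'group1']
    print([cluster_of(c) for c in flat])     # ['package', 'package', 'package', 'package']
```

In the first table, the vendor's intermediate grouping nodes become clusters whether or not they were meant as such; in the second, the "cluster" collapses onto the package and carries no extra information.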

> 
> If there is no cluster information then these domains should not be
> created. I think that is already the case?
> 
> 



* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-12 17:32     ` Jeremy Linton
@ 2025-08-13  9:28       ` Sudeep Holla
  2025-08-13 15:55         ` Christoph Lameter (Ampere)
  0 siblings, 1 reply; 18+ messages in thread
From: Sudeep Holla @ 2025-08-13  9:28 UTC
  To: Jeremy Linton
  Cc: Christoph Lameter (Ampere), Sudeep Holla, Huang Shijie,
	catalin.marinas, will, patches, Shubhang, krzysztof.kozlowski,
	bjorn.andersson, geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Tue, Aug 12, 2025 at 12:32:36PM -0500, Jeremy Linton wrote:
> On 8/12/25 11:33 AM, Christoph Lameter (Ampere) wrote:
> > On Mon, 11 Aug 2025, Jeremy Linton wrote:
> > 
> > > From what I've seen, SCHED_CLUSTER seems to be a bit of a give and take
> > > depending on the benchmark and machine. I'm not sure whether it should be
> > > enabled by default, but it would really be nice to have at least a larger sweep of
> > > benchmarks/machines in order to be sure of the decision.
> > 
> > If the hardware provides a cluster ID, then I think that cluster ID should be
> > used for the sched domains. CONFIG_SCHED_CLUSTER does that, so it should
> > be the default.
> 
> Hi,
> 
> The problem is that this information is being sourced from the ACPI PPTT.
> The ACPI specification (AFAIK) doesn't define a cluster, so the Linux
> cluster information is being 'invented' based on however the firmware vendor
> chose to group CPU nodes in the PPTT. This means it's possible for vendors to
> unknowingly create clusters, or to fail to create them where they make
> sense.

+1, completely agree. As Jeremy mentioned, it is hit or miss, and cluster
is loosely defined. IIRC Huawei pushed this based on their platform at
the time, and it did break some benchmarks on a few other platforms. So
IMO it is not a good idea to enable it in the default config.

-- 
Regards,
Sudeep


* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-13  9:28       ` Sudeep Holla
@ 2025-08-13 15:55         ` Christoph Lameter (Ampere)
  2025-08-13 22:56           ` Christoph Lameter (Ampere)
  2025-08-14 10:07           ` Sudeep Holla
  0 siblings, 2 replies; 18+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-13 15:55 UTC
  To: Sudeep Holla
  Cc: Jeremy Linton, Huang Shijie, catalin.marinas, will, patches,
	Shubhang, krzysztof.kozlowski, bjorn.andersson, geert+renesas,
	arnd, nm, ebiggers, nfraprado, prabhakar.mahadev-lad.rj,
	linux-arm-kernel, linux-kernel

On Wed, 13 Aug 2025, Sudeep Holla wrote:

> > The problem is that this information is being sourced from the ACPI PPTT.
> > The ACPI specification (AFAIK) doesn't define a cluster, so the Linux
> > cluster information is being 'invented' based on however the firmware vendor
> > chose to group CPU nodes in the PPTT. This means it's possible for vendors to
> > unknowingly create clusters, or to fail to create them where they make
> > sense.
>
> +1, completely agree. As Jeremy mentioned, it is hit or miss, and cluster
> is loosely defined. IIRC Huawei pushed this based on their platform at
> the time, and it did break some benchmarks on a few other platforms. So
> IMO it is not a good idea to enable it in the default config.

Can we figure out which platforms and benchmarks were affected, and why?

It seems the notion of a "cluster" on ARM64 is derived (I guess a better
word than "invented" hehe) from sibling information instead of the PPTT.
But using that information should work fine, right?



* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-13 15:55         ` Christoph Lameter (Ampere)
@ 2025-08-13 22:56           ` Christoph Lameter (Ampere)
  2025-08-14 10:03             ` Sudeep Holla
  2025-08-14 10:07           ` Sudeep Holla
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-13 22:56 UTC
  To: Sudeep Holla
  Cc: Jeremy Linton, Huang Shijie, catalin.marinas, will, patches,
	Shubhang, krzysztof.kozlowski, bjorn.andersson, geert+renesas,
	arnd, nm, ebiggers, nfraprado, prabhakar.mahadev-lad.rj,
	linux-arm-kernel, linux-kernel

On Wed, 13 Aug 2025, Christoph Lameter (Ampere) wrote:

> Can we figure out which platforms and benchmarks were affected, and why?
>
> It seems the notion of a "cluster" on ARM64 is derived (I guess a better
> word than "invented" hehe) from sibling information instead of the PPTT.
> But using that information should work fine, right?

Sorry, no, that is not correct. The cluster information is correctly read
from the ACPI tables, and the cluster IDs are available in

	/sys/devices/system/cpu/cpuXX/topology/cluster_id

if CONFIG_SCHED_CLUSTER is enabled.
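A minimal sketch for grouping CPUs by the cluster_id files in the sysfs path above. It assumes a Linux system where /sys/devices/system/cpu/cpuN/topology/cluster_id is readable; the helper names are illustrative, and the root directory is a parameter so the logic can be exercised against a fake tree.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_cluster_ids(root: Path = SYSFS_CPU) -> Dict[int, int]:
    """Return {cpu: cluster_id} for every cpuN directory exposing the file."""
    ids: Dict[int, int] = {}
    for path in root.glob("cpu[0-9]*/topology/cluster_id"):
        cpu = int(path.parent.parent.name[len("cpu"):])  # "cpu12" -> 12
        ids[cpu] = int(path.read_text().strip())
    return ids


def group_by_cluster(ids: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert {cpu: cluster_id} into {cluster_id: [cpus...]} with CPUs sorted."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for cpu, cid in sorted(ids.items()):
        groups[cid].append(cpu)
    return dict(groups)


if __name__ == "__main__":
    for cid, cpus in sorted(group_by_cluster(read_cluster_ids()).items()):
        print(f"cluster_id {cid}: CPUs {cpus}")
```

Comparing this output across firmware versions of the same machine is one way to see whether the grouping reflects the hardware or just the way the tables were written.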

If there is an issue, then it is a problem with vendor firmware
providing cluster ID configurations via ACPI that cause regressions.

We could add a blacklist for those platforms to avoid regressions, but we
should not let that stop us from enabling full support for clustering
on ARM64.





* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-13 22:56           ` Christoph Lameter (Ampere)
@ 2025-08-14 10:03             ` Sudeep Holla
  2025-08-14 16:30               ` Christoph Lameter (Ampere)
  0 siblings, 1 reply; 18+ messages in thread
From: Sudeep Holla @ 2025-08-14 10:03 UTC
  To: Christoph Lameter (Ampere)
  Cc: Jeremy Linton, Sudeep Holla, Huang Shijie, catalin.marinas, will,
	patches, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Wed, Aug 13, 2025 at 03:56:47PM -0700, Christoph Lameter (Ampere) wrote:
> On Wed, 13 Aug 2025, Christoph Lameter (Ampere) wrote:
> 
> > Can we figure out which platforms and benchmarks were affected, and why?
> >
> > It seems the notion of a "cluster" on ARM64 is derived (I guess a better
> > word than "invented" hehe) from sibling information instead of the PPTT.
> > But using that information should work fine, right?
> 
> Sorry, no, that is not correct. The cluster information is correctly read
> from the ACPI tables, and the cluster IDs are available in
> 
> 	/sys/devices/system/cpu/cpuXX/topology/cluster_id
> 

Agreed. Parts of the ACPI specification have added a notion of a cluster,
sprinkled across various chapters (mostly added by Arm in the earlier days,
though the Arm architecture specification itself doesn't have any standard
definition of a cluster). Also note that it nicely adds a disclaimer:

  |  Different architectures use different terminology to denominate logically
  |  associated processors, but terms such as package, cluster, module, and
  |  socket are typical examples.

So how can one use these across architectures? Package/socket is quite
standard. A cluster can be a group of processors, or it can also be a group
of processor clusters; one of the Arm vendors calls it a super cluster or
something. All of this makes it very hard for a generic OS to interpret that
information. CONFIG_SCHED_CLUSTER was added with one notion of a cluster,
which was soon realised not to match some other notion of it.

We can enable it, and I am sure someone will report a regression on their
platform and we will need to disable it again. Benchmark results don't
depend purely on the "notion" of a cluster; they are often tied to
private resources and how those are shared in the system. So even if you
strictly follow the notion of a cluster as supported by CONFIG_SCHED_CLUSTER,
it will fail on systems where private resources are shared across
"cluster" boundaries, or in some variant configuration.
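For caches, at least, whether a shared resource crosses a cluster boundary can be checked from sysfs. The sketch below assumes the per-CPU topology/cluster_cpus_list and cache/indexN/shared_cpu_list files are present (their availability depends on kernel version and firmware); other kinds of shared resources are not visible through these files. A cache "crosses" a cluster here when neither CPU set contains the other.

```python
from pathlib import Path
from typing import List, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a kernel cpulist string such as '0-3,8,10-11'."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def crosses_cluster(shared: Set[int], cluster: Set[int]) -> bool:
    """True if a resource is shared partly inside and partly outside a
    cluster, i.e. neither CPU set contains the other."""
    return not (shared <= cluster or cluster <= shared)


def crossing_caches(cpu_dir: Path) -> List[str]:
    """List cache indexN entries of one CPU whose sharing crosses its cluster."""
    cluster = parse_cpulist((cpu_dir / "topology" / "cluster_cpus_list").read_text())
    return [
        idx.name
        for idx in sorted(cpu_dir.glob("cache/index[0-9]*"))
        if crosses_cluster(parse_cpulist((idx / "shared_cpu_list").read_text()), cluster)
    ]


if __name__ == "__main__":
    try:
        print(crossing_caches(Path("/sys/devices/system/cpu/cpu0")))
    except (OSError, ValueError) as exc:
        print("topology not readable:", exc)
```

A cache private to part of a cluster, or one spanning several whole clusters, nests cleanly and is not flagged; only partial overlap is reported.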

> if CONFIG_SCHED_CLUSTER is enabled.
> 
> If there is an issue, then it is a problem with vendor firmware
> providing cluster ID configurations via ACPI that cause regressions.
> 

As mentioned, it is not strictly the cluster ID alone; other shared
resources also contribute to the issues/regressions.

> We could add a blacklist for those platforms to avoid regressions, but we
> should not let that stop us from enabling full support for clustering
> on ARM64.
> 

Sure, but IMO we first need to improve the "cluster" definition in the ACPI
and Arm specifications and get agreement on what it means for other
architectures. We don't want to revisit the same topic again without that;
IIRC this is the second time we are discussing this topic.

-- 
Regards,
Sudeep


* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-13 15:55         ` Christoph Lameter (Ampere)
  2025-08-13 22:56           ` Christoph Lameter (Ampere)
@ 2025-08-14 10:07           ` Sudeep Holla
  1 sibling, 0 replies; 18+ messages in thread
From: Sudeep Holla @ 2025-08-14 10:07 UTC
  To: Christoph Lameter (Ampere)
  Cc: Jeremy Linton, Huang Shijie, Sudeep Holla, catalin.marinas, will,
	patches, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Wed, Aug 13, 2025 at 08:55:36AM -0700, Christoph Lameter (Ampere) wrote:
> On Wed, 13 Aug 2025, Sudeep Holla wrote:
> 
> > > The problem is that this information is being sourced from the ACPI PPTT.
> > > The ACPI specification (AFAIK) doesn't define a cluster, so the Linux
> > > cluster information is being 'invented' based on however the firmware vendor
> > > chose to group CPU nodes in the PPTT. This means it's possible for vendors to
> > > unknowingly create clusters, or to fail to create them where they make
> > > sense.
> >
> > +1, completely agree. As Jeremy mentioned, it is hit or miss, and cluster
> > is loosely defined. IIRC Huawei pushed this based on their platform at
> > the time, and it did break some benchmarks on a few other platforms. So
> > IMO it is not a good idea to enable it in the default config.
> 
> Can we figure out which platforms and benchmarks were affected, and why?
> 

I am not sure about either. One way to find the affected platforms is to
merge this change and wait for the platform users/maintainers to report.

> It seems the notion of a "cluster" on ARM64 is derived (I guess a better
> word than "invented" hehe) from sibling information instead of the PPTT.
> But using that information should work fine, right?
> 

I have my doubts, but I may be wrong. As mentioned in my other email in this
thread, IMO "cluster" is ill-defined in both ACPI and the Arm architecture,
which is the root cause of all the issues around it.

-- 
Regards,
Sudeep


* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-14 10:03             ` Sudeep Holla
@ 2025-08-14 16:30               ` Christoph Lameter (Ampere)
  2025-08-15 10:48                 ` Sudeep Holla
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-14 16:30 UTC
  To: Sudeep Holla
  Cc: Jeremy Linton, Huang Shijie, catalin.marinas, will, patches,
	Shubhang, krzysztof.kozlowski, bjorn.andersson, geert+renesas,
	arnd, nm, ebiggers, nfraprado, prabhakar.mahadev-lad.rj,
	linux-arm-kernel, linux-kernel

On Thu, 14 Aug 2025, Sudeep Holla wrote:

>   |  Different architectures use different terminology to denominate logically
>   |  associated processors, but terms such as package, cluster, module, and
>   |  socket are typical examples.
>
> So how can one use these across architectures? Package/socket is quite
> standard. A cluster can be a group of processors, or it can also be a group
> of processor clusters; one of the Arm vendors calls it a super cluster or
> something. All of this makes it very hard for a generic OS to interpret that
> information. CONFIG_SCHED_CLUSTER was added with one notion of a cluster,
> which was soon realised not to match some other notion of it.

What a cluster is actually used for is up to the hardware. The Linux
scheduler provides this functionality; how and when firmware uses this
feature is a vendor issue. There was never a clear definition.

> We can enable it, and I am sure someone will report a regression on their
> platform and we will need to disable it again. Benchmark results don't
> depend purely on the "notion" of a cluster; they are often tied to
> private resources and how those are shared in the system. So even if you
> strictly follow the notion of a cluster as supported by CONFIG_SCHED_CLUSTER,
> it will fail on systems where private resources are shared across
> "cluster" boundaries, or in some variant configuration.

That is not our problem. If the vendor provides clustering information and
the scheduler uses it, then the vendor can modify the firmware to not
enable clustering.

As mentioned before: We could create a blacklist to override the ACPI info
from the vendor to ensure that clustering is off.

What we should not do is disable clustering for everyone.
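Christoph's blacklist idea could look roughly like the following sketch. It is hypothetical throughout: the entry, the function name, and the choice to match on the OEM ID, OEM table ID and OEM revision from the ACPI table header are assumptions for illustration (similar in spirit to existing ACPI platform quirk tables), not an existing kernel interface.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BlocklistEntry:
    """Matches firmware by ACPI table-header style identifiers."""
    oem_id: str          # 6-character OEM ID
    oem_table_id: str    # 8-character OEM table ID
    max_revision: int    # OEM revisions up to and including this are affected


# Hypothetical entry for illustration only; it does not name a real platform.
CLUSTER_BLOCKLIST = (BlocklistEntry("VENDOR", "PLATFORM", 0x10),)


def use_cluster_domains(oem_id: str, oem_table_id: str, oem_revision: int,
                        built_with_cluster: bool = True,
                        blocklist: Iterable[BlocklistEntry] = CLUSTER_BLOCKLIST) -> bool:
    """Decide whether firmware-provided cluster info should build a CLS level."""
    if not built_with_cluster:
        return False
    for entry in blocklist:
        if (entry.oem_id == oem_id and entry.oem_table_id == oem_table_id
                and oem_revision <= entry.max_revision):
            return False
    return True
```

Bounding each match by OEM revision would let a vendor that later fixes its tables regain clustering without another kernel change.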


> > We could add a blacklist for those platforms to avoid regressions, but we
> > should not let that stop us from enabling full support for clustering
> > on ARM64.
> >
>
> Sure, but IMO we first need to improve the "cluster" definition in the ACPI
> and Arm specifications and get agreement on what it means for other
> architectures. We don't want to revisit the same topic again without that;
> IIRC this is the second time we are discussing this topic.

The vendors need flexibility to use this feature when it makes sense.

Having a clear definition would limit the use of the clustering feature and
limit innovation. Vendors can control the clustering via ACPI and the
firmware they provide with their systems.

We could change the definition, but that would be a decade-long
process that would encounter resistance from vendors whose use of the
clustering feature does not fit the stricter definition.




* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-14 16:30               ` Christoph Lameter (Ampere)
@ 2025-08-15 10:48                 ` Sudeep Holla
  2025-08-15 16:46                   ` Jeremy Linton
  0 siblings, 1 reply; 18+ messages in thread
From: Sudeep Holla @ 2025-08-15 10:48 UTC
  To: Christoph Lameter (Ampere)
  Cc: Jeremy Linton, Huang Shijie, Sudeep Holla, catalin.marinas, will,
	patches, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Thu, Aug 14, 2025 at 09:30:06AM -0700, Christoph Lameter (Ampere) wrote:
> On Thu, 14 Aug 2025, Sudeep Holla wrote:
> 
> >   |  Different architectures use different terminology to denominate logically
> >   |  associated processors, but terms such as package, cluster, module, and
> >   |  socket are typical examples.
> >
> > So how can one use these across architectures? Package/socket is quite
> > standard. A cluster can be a group of processors, or it can also be a group
> > of processor clusters; one of the Arm vendors calls it a super cluster or
> > something. All of this makes it very hard for a generic OS to interpret that
> > information. CONFIG_SCHED_CLUSTER was added with one notion of a cluster,
> > which was soon realised not to match some other notion of it.
> 
> What a cluster is actually used for is up to the hardware. The Linux
> scheduler provides this functionality; how and when firmware uses this
> feature is a vendor issue. There was never a clear definition.
> 

Sure, since it is left to the architecture to define what it means, it could
work. But what happens if we have multiple chiplets inside a socket and
each chiplet has multiple clusters? Do you envision using SCHED_CLUSTER
at the chiplet level if that works best on the platform?

That could work, but we need to document all of this, to the best of our
knowledge, now so that it is easy to revisit in the future.

> > We can enable it, and I am sure someone will report a regression on their
> > platform and we will need to disable it again. Benchmark results don't
> > depend purely on the "notion" of a cluster; they are often tied to
> > private resources and how those are shared in the system. So even if you
> > strictly follow the notion of a cluster as supported by CONFIG_SCHED_CLUSTER,
> > it will fail on systems where private resources are shared across
> > "cluster" boundaries, or in some variant configuration.
> 
> That is not our problem. If the vendor provides clustering information and
> the scheduler uses it, then the vendor can modify the firmware to not
> enable clustering.
> 

That is plain wrong. ACPI describes the hardware. Deciding to put
clustering information in these tables only if it improves performance, or
at least does not hinder it, seems complete nonsense to me. That is putting
policy into the ACPI hardware description. Does the ACPI spec mention
anything about it? I mean, removing some hardware description, even if it
is 100% accurate, because it hinders performance on one OSPM? That doesn't
sound correct at all.

> As mentioned before: We could create a blacklist to override the ACPI info
> from the vendor to ensure that clustering is off.
> 

Not a bad idea. We can start with one and see whether an allowlist or a blocklist works.

> What we should not do is disable clustering for everyone.
> 

I'm not completely against it, but I have concerns about how all of this
scales with multiple chiplets within a socket, or any such variants.

> > > We could add a blacklist for those platforms to avoid regressions, but we
> > > should not let that stop us from enabling full support for clustering
> > > on ARM64.
> > >
> >
> > Sure, but IMO we first need to improve the "cluster" definition in the ACPI
> > and Arm specifications and get agreement on what it means for other
> > architectures. We don't want to revisit the same topic again without that;
> > IIRC this is the second time we are discussing this topic.
> 
> The vendors need flexibility to use this feature when it makes sense.
> 

Sure, but too much flexibility might also hinder future changes when adding
some other feature (chiplets again being one I can think of now).

> Having a clear definition would limit the use of the clustering feature and
> limit innovation. Vendors can control the clustering via ACPI and the
> firmware they provide with their systems.
> 

Not sure that is the right direction TBH, but again, I'm not against the
idea of enabling the feature on some platforms if we are going to enable it
by default.

> We could change the definition, but that would be a decade-long
> process that would encounter resistance from vendors whose use of the
> clustering feature does not fit the stricter definition.
> 

I understand the point, but decade-long is a bit of an exaggeration 😉.
Not discussing these in ACPI or a similar forum is not a good idea, as we
know new h/w features are being added and the current specification
may not provide ways to express all of them.

-- 
Regards,
Sudeep


* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-15 10:48                 ` Sudeep Holla
@ 2025-08-15 16:46                   ` Jeremy Linton
  2025-08-18  9:33                     ` Sudeep Holla
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Linton @ 2025-08-15 16:46 UTC
  To: Sudeep Holla, Christoph Lameter (Ampere)
  Cc: Huang Shijie, catalin.marinas, will, patches, Shubhang,
	krzysztof.kozlowski, bjorn.andersson, geert+renesas, arnd, nm,
	ebiggers, nfraprado, prabhakar.mahadev-lad.rj, linux-arm-kernel,
	linux-kernel

Hi,


On 8/15/25 5:48 AM, Sudeep Holla wrote:
> On Thu, Aug 14, 2025 at 09:30:06AM -0700, Christoph Lameter (Ampere) wrote:
>> On Thu, 14 Aug 2025, Sudeep Holla wrote:
>>
>>>    |  Different architectures use different terminology to denominate logically
>>>    |  associated processors, but terms such as package, cluster, module, and
>>>    |  socket are typical examples.
>>>
>>> So how can one use these across architectures? Package/socket is quite
>>> standard. A cluster can be a group of processors, or it can also be a group
>>> of processor clusters; one of the Arm vendors calls it a super cluster or
>>> something. All of this makes it very hard for a generic OS to interpret that
>>> information. CONFIG_SCHED_CLUSTER was added with one notion of a cluster,
>>> which was soon realised not to match some other notion of it.
>>
>> What a cluster is actually used for is up to the hardware. The Linux
>> scheduler provides this functionality; how and when firmware uses this
>> feature is a vendor issue. There was never a clear definition.
>>
> 
> Sure, since it is left to the architecture to define what it means, it could
> work. But what happens if we have multiple chiplets inside a socket and
> each chiplet has multiple clusters? Do you envision using SCHED_CLUSTER
> at the chiplet level if that works best on the platform?
> 
> That could work, but we need to document all of this, to the best of our
> knowledge, now so that it is easy to revisit in the future.
> 
>>> We can enable it, and I am sure someone will report a regression on their
>>> platform and we will need to disable it again. Benchmark results don't
>>> depend purely on the "notion" of a cluster; they are often tied to
>>> private resources and how those are shared in the system. So even if you
>>> strictly follow the notion of a cluster as supported by CONFIG_SCHED_CLUSTER,
>>> it will fail on systems where private resources are shared across
>>> "cluster" boundaries, or in some variant configuration.
>>
>> That is not our problem. If the vendor provides clustering information and
>> the scheduler uses it, then the vendor can modify the firmware to not
>> enable clustering.
>>
> 
> That is plain wrong. ACPI describes the hardware. Deciding to put
> clustering information in these tables only if it improves performance, or
> at least does not hinder it, seems complete nonsense to me. That is putting
> policy into the ACPI hardware description. Does the ACPI spec mention
> anything about it? I mean, removing some hardware description, even if it
> is 100% accurate, because it hinders performance on one OSPM? That doesn't
> sound correct at all.
> 
>> As mentioned before: We could create a blacklist to override the ACPI info
>> from the vendor to ensure that clustering is off.
>>
> 
> Not a bad idea. We can start with one and see whether an allowlist or a blocklist works.

From a distro perspective, it makes more sense to me to change it from a
compile-time option to a runtime kernel command-line option, with the
default on/off set by this SCHED_CLUSTER flag, rather than trying to
maintain a blocklist.
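The resolution logic Jeremy describes is small. In this sketch the boot parameter name sched_cluster= and its accepted values are hypothetical (no such parameter is proposed in this thread), and config_default stands in for the compile-time CONFIG_SCHED_CLUSTER choice.

```python
def resolve_sched_cluster(cmdline: str, config_default: bool) -> bool:
    """Return whether cluster scheduling should be on.

    `config_default` stands in for the compile-time CONFIG_SCHED_CLUSTER
    choice; 'sched_cluster=' is a hypothetical boot parameter that overrides
    it. Unrecognised values are ignored and the last valid occurrence wins.
    """
    enabled = config_default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "sched_cluster" or not sep:
            continue
        if value in ("1", "on", "y"):
            enabled = True
        elif value in ("0", "off", "n"):
            enabled = False
    return enabled
```

With this shape, a distro could ship the option enabled while users of a platform that regresses add sched_cluster=off, with no per-platform table to maintain.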


I agree the firmware needs a much clearer way to signal that these nodes 
represent something other than just side effects of the way the table is 
built. If the working group is hesitant to declare additional 
topological flags, maybe this idea of deriving additional topological 
information from nodes without caches is a reasonable spec 
clarification. That way some future 
NODE_IS_A_CLUSTER/DSU/CHIPLET/SUPERCLUSTER/RING/SLICE/WHATEVER doesn't 
turn the existing code into technical debt.

But returning to the original point, it's not clear to me whether the HW
'cluster' information is really causing the performance boost, or whether
just having a medium-size scheduling domain under MC (i.e. picking an
arbitrary size of 4-16 cores), or simply 'slicing' an L3 in the PPTT so
that the MC domains are smaller, yields the same effect. I've seen a
number of cases where 'lying' about the topology yields a better benchmark
result. This is largely what is happening with the firmware toggles that
move/remove NUMA domains, too. Being able to manually reconfigure some of
these scheduling levels at runtime might be useful...
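One way to test Jeremy's alternative explanation is to build an artificial medium-size level of fixed size and compare its results against the firmware clusters. The helpers below only compute such groupings; how they would be fed to the scheduler (for example through a modified PPTT) is outside their scope, and the slice size is an arbitrary parameter, in the spirit of the 4-16 core range above. Both function names are illustrative.

```python
from typing import Dict, List


def slice_span(cpus: List[int], size: int) -> List[List[int]]:
    """Split one MC-level CPU span into consecutive groups of `size` CPUs,
    mimicking an arbitrary 'medium-size' level rather than a real cluster."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [cpus[i:i + size] for i in range(0, len(cpus), size)]


def pseudo_cluster_ids(cpus: List[int], size: int) -> Dict[int, int]:
    """Give each CPU the index of its slice, shaped like sysfs cluster_id values."""
    return {cpu: n for n, group in enumerate(slice_span(cpus, size)) for cpu in group}
```

If a benchmark gains as much from pseudo_cluster_ids(list(range(16)), 4) as from the firmware-reported clusters, that would point to domain size rather than cluster information as the cause.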




> 
>> What we should not do is disable clustering for everyone.
>>
> 
> I'm not completely against it, but I have concerns about how all of this
> scales with multiple chiplets within a socket, or any such variants.
> 
>>>> We could add a blacklist for those platforms to avoid regressions, but we
>>>> should not allow that to hinder us from enabling full support for clustering
>>>> on ARM64.
>>>>
>>>
>>> Sure, but we need to improve the "cluster" definition in the ACPI and Arm
>>> specifications and get an agreement on what it means for other architectures
>>> first IMO. We don't want to revisit the same topic again without these, as
>>> IIRC this is the second time we are discussing this topic.
>>
>> The vendors need flexibility to use this feature when it makes sense.
>>
> 
> Sure, but too much flexibility might also hinder future changes when adding
> some other feature (chiplets again is one thing I can think of now).
> 
>> Having a clear definition would limit the use of the clustering feature and
>> limit innovation. Vendors can control the clustering via ACPI and the
>> firmware they provide with their system.
>>
> 
> Not sure if that is the right direction TBH, but again not against the
> idea of enabling the feature on some platforms if we are going to enable it
> by default.
> 
>> We could change the definition, but that would be a decade-long
>> process which will encounter resistance from vendors that make use of the
>> clustering feature in ways that do not fall into the stricter definition.
>>
> 
> I understand and get the point, but decade-long is a bit of an exaggeration 😉.
> Not discussing these in ACPI or similar forums is not a good idea, as we know
> there are new h/w features being added and the current specification
> may not provide ways to express all of them.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-15 16:46                   ` Jeremy Linton
@ 2025-08-18  9:33                     ` Sudeep Holla
  2025-08-20 23:44                       ` Christoph Lameter (Ampere)
  2025-08-27  2:33                       ` Shijie Huang
  0 siblings, 2 replies; 18+ messages in thread
From: Sudeep Holla @ 2025-08-18  9:33 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Christoph Lameter (Ampere), Huang Shijie, Sudeep Holla,
	catalin.marinas, will, patches, Shubhang, krzysztof.kozlowski,
	bjorn.andersson, geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Fri, Aug 15, 2025 at 11:46:35AM -0500, Jeremy Linton wrote:
> Hi,
> 
> 
> On 8/15/25 5:48 AM, Sudeep Holla wrote:
> > On Thu, Aug 14, 2025 at 09:30:06AM -0700, Christoph Lameter (Ampere) wrote:
> > > On Thu, 14 Aug 2025, Sudeep Holla wrote:
> > > 
> > > >    |  Different architectures use different terminology to denominate logically
> > > >    |  associated processors, but terms such as package, cluster, module, and
> > > >    |  socket are typical examples.
> > > > 
> > > > So how can one use these across architectures? Package/Socket is quite
> > > > standard. A cluster can be a group of processors or it can also be a group of
> > > > processor clusters. One of the Arm vendors calls it a super cluster or something.
> > > > All this makes it super hard for a generic OS to interpret that information.
> > > > CONFIG_SCHED_CLUSTER was added with one notion of cluster, which was soon
> > > > realised not to match some other notion of it.
> > > 
> > > What the cluster is actually used for is up to the hardware. The Linux
> > > scheduler provides this functionality. How and when this feature is used
> > > by firmware is a vendor issue. There was never a clear definition.
> > > 
> > 
> > Sure, since it is left to the architecture to define what it means, it could
> > work. But what happens if we have multiple chiplets inside a socket and
> > each chiplet has multiple clusters? Do you envision using SCHED_CLUSTER
> > at the chiplet level if that works best on the platform?
> > 
> > That could work, but we need to document all these with the best of our
> > knowledge now so that it is easy to revisit in the future.
> > 
> > > > We can enable it and I am sure someone will report a regression on their
> > > > platform and we will need to disable it again. The benchmark doesn't purely
> > > > depend on just the "notion" of cluster; it is often related to the
> > > > private resources and how they are shared in the system. So even if you
> > > > strictly follow the notion of cluster as supported by CONFIG_SCHED_CLUSTER,
> > > > it will fail on systems where the private resources are shared across the
> > > > "cluster" boundaries or some variant configuration.
> > > 
> > > That is not our problem. If the vendor provides clustering information and
> > > the scheduler uses that then the vendor can modify the firmware to not
> > > enable clustering.
> > > 
> > 
> > That is purely wrong. ACPI is describing the hardware. Deciding to put
> > clustering information in these tables only if it improves performance, or
> > does not hinder it, seems complete nonsense to me. That is putting policy
> > in the ACPI hardware description. Does the ACPI spec mention anything about it?
> > I mean, remove some hardware description, even if it is 100% accurate, if it
> > hinders performance on one of the OSPMs? That doesn't sound correct at all.
> > 
> > > As mentioned before: We could create a blacklist to override the ACPI info
> > > from the vendor to ensure that clustering is off.
> > > 
> > 
> > Not a bad idea. We can see whether an allowlist or a blocklist works as we start with one.
> 
> From a distro perspective it makes more sense to me to change it from a
> compile time option to a runtime kernel command line option with the default
> on/off set by this SCHED_CLUSTER flag rather than try to maintain a
> blocklist.
>

Right, that makes complete sense to me.

> 
> I agree the firmware needs a much clearer way to signal that these nodes
> represent something other than just side effects of the way the table is
> built. If the working group is hesitant to declare additional topological
> flags, maybe this idea of deriving additional topological information from
> nodes without caches is a reasonable spec clarification. That way some
> future NODE_IS_A_CLUSTER/DSU/CHIPLET/SUPERCLUSTER/RING/SLICE/WHATEVER
> doesn't turn the existing code into technical debt.
> 

100% agreed.

> But returning to the original point, it's not clear to me that the HW
> 'cluster' information is really causing the performance boost vs. just
> having a medium size scheduling domain (aka just picking an arbitrary size
> 4-16 cores) under MC, or simply 'slicing' an L3 in the PPTT such that the MC
> domains are smaller, yields the same effect. I've seen a number of cases
> where 'lying' about the topology yields a better result in a benchmark. This
> is largely what is happening with these Firmware toggles that move/remove
> the NUMA domains too. Being able to manually reconfigure some of these
> scheduling levels at runtime might be useful...
> 

I share your concern and hence am completely against representing any fake
data in the ACPI topology just to get improved performance. Yes, we have seen
that in the past.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-18  9:33                     ` Sudeep Holla
@ 2025-08-20 23:44                       ` Christoph Lameter (Ampere)
  2025-08-21 12:11                         ` Sudeep Holla
  2025-08-27  2:33                       ` Shijie Huang
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-08-20 23:44 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Jeremy Linton, Huang Shijie, catalin.marinas, will, patches,
	Shubhang, krzysztof.kozlowski, bjorn.andersson, geert+renesas,
	arnd, nm, ebiggers, nfraprado, prabhakar.mahadev-lad.rj,
	linux-arm-kernel, linux-kernel

On Mon, 18 Aug 2025, Sudeep Holla wrote:

> > But returning to the original point, it's not clear to me that the HW
> > 'cluster' information is really causing the performance boost vs. just
> > having a medium size scheduling domain (aka just picking an arbitrary size
> > 4-16 cores) under MC, or simply 'slicing' an L3 in the PPTT such that the MC
> > domains are smaller, yields the same effect. I've seen a number of cases
> > where 'lying' about the topology yields a better result in a benchmark. This
> > is largely what is happening with these Firmware toggles that move/remove
> > the NUMA domains too. Being able to manually reconfigure some of these
> > scheduling levels at runtime might be useful...
> >
>
> I share your concern and hence am completely against representing any fake
> data in the ACPI topology just to get improved performance. Yes, we have seen
> that in the past.

Depends on what you call fake. There are microarchitectural issues
regarding the proximity of the processors that the customer may not know
about and therefore the data provided by the vendor may be considered
"fake". Certainly that is not the case for our processors.

This is a common feature and widely available on other platforms.
There is no need to do anything but enable the functionality. Having a
special version of ACPI for arm64 or a special handling for arm64 does not
make sense.

The ACPI subsystem provides the ability to add blacklists. If a vendor has
problems with their firmware providing data that reduces the performance
and is unable to fix it otherwise, then the vendor can use that feature to
disable these ACPI features for their platform by submitting a patch.
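A simplified stand-in for such a per-platform quirk list, matching on the OEM ID (6 bytes) and OEM Table ID (8 bytes) fields of the ACPI table header plus a maximum affected OEM revision. The struct, the function, and both table entries are hypothetical and do not describe any real platform or the ACPI core's own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified quirk entry keyed on ACPI table header fields. */
struct cluster_quirk {
    const char *oem_id;            /* 6-byte OEM ID */
    const char *oem_table_id;      /* 8-byte OEM Table ID */
    unsigned int max_oem_revision; /* highest firmware revision affected */
};

/* Hypothetical entries for illustration only; no real platform is implied. */
static const struct cluster_quirk cluster_blocklist[] = {
    { "VENDOR", "BOARD001", 0x2 },
    { "OTHERV", "SRV-X   ", 0xffffffffu },
};

/* Return true if cluster scheduling should stay off for this firmware. */
static bool cluster_blocklisted(const char *oem_id, const char *oem_table_id,
                                unsigned int oem_revision)
{
    size_t n = sizeof(cluster_blocklist) / sizeof(cluster_blocklist[0]);

    for (size_t i = 0; i < n; i++) {
        const struct cluster_quirk *q = &cluster_blocklist[i];

        /* Header fields are fixed-width and not NUL-terminated. */
        if (strncmp(oem_id, q->oem_id, 6) == 0 &&
            strncmp(oem_table_id, q->oem_table_id, 8) == 0 &&
            oem_revision <= q->max_oem_revision)
            return true;
    }
    return false;
}
```

The revision bound lets a vendor that later fixes its tables drop out of the list without another kernel patch. Each new affected platform still needs a kernel change, though, which is the distro concern raised earlier in the thread.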

Please make arm64 work like the other Linux platforms and do not introduce
special handling for ARM64.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-20 23:44                       ` Christoph Lameter (Ampere)
@ 2025-08-21 12:11                         ` Sudeep Holla
  0 siblings, 0 replies; 18+ messages in thread
From: Sudeep Holla @ 2025-08-21 12:11 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Jeremy Linton, Huang Shijie, Sudeep Holla, catalin.marinas, will,
	patches, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Wed, Aug 20, 2025 at 04:44:29PM -0700, Christoph Lameter (Ampere) wrote:
> On Mon, 18 Aug 2025, Sudeep Holla wrote:
> 
> > > But returning to the original point, it's not clear to me that the HW
> > > 'cluster' information is really causing the performance boost vs. just
> > > having a medium size scheduling domain (aka just picking an arbitrary size
> > > 4-16 cores) under MC, or simply 'slicing' an L3 in the PPTT such that the MC
> > > domains are smaller, yields the same effect. I've seen a number of cases
> > > where 'lying' about the topology yields a better result in a benchmark. This
> > > is largely what is happening with these Firmware toggles that move/remove
> > > the NUMA domains too. Being able to manually reconfigure some of these
> > > scheduling levels at runtime might be useful...
> > >
> >
> > I share your concern and hence am completely against representing any fake
> > data in the ACPI topology just to get improved performance. Yes, we have seen
> > that in the past.
> 
> Depends on what you call fake. There are microarchitectural issues
> regarding the proximity of the processors that the customer may not know
> about and therefore the data provided by the vendor may be considered
> "fake". Certainly that is not the case for our processors.
> 

Fair enough. We have seen systems with fake data because that data seemed
to accidentally improve performance.

> This is a common feature and widely available on other platforms.
> There is no need to do anything but enable the functionality. Having a
> special version of ACPI for arm64 or a special handling for arm64 does not
> make sense.
> 

I am not suggesting making it special on Arm64; we would never want
that unless absolutely needed, and this is not one such case.

> The ACPI subsystem provides the ability to add blacklists. If a vendor has
> problems with their firmware providing data that reduces the performance
> and is unable to fix it otherwise, then the vendor can use that feature to
> disable these ACPI features for their platform by submitting a patch.
> 

I don't have a strong opinion on that approach. IIUC, maintaining a list
needs a kernel change for each platform, and Jeremy mentioned that distros
prefer a command line parameter over that. We can always have a parameter to
disable the feature and keep the build config enabled as in this patch.
Then only platforms that have a performance issue with this enabled will have
to add the command line parameter (hopefully that works for all).

> Please make arm64 work like the other Linux platforms and do not introduce
> special handling for ARM64.

Again I am not suggesting that and we never want that.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-18  9:33                     ` Sudeep Holla
  2025-08-20 23:44                       ` Christoph Lameter (Ampere)
@ 2025-08-27  2:33                       ` Shijie Huang
  2025-08-27 10:19                         ` Sudeep Holla
  1 sibling, 1 reply; 18+ messages in thread
From: Shijie Huang @ 2025-08-27  2:33 UTC (permalink / raw)
  To: Sudeep Holla, Jeremy Linton
  Cc: Christoph Lameter (Ampere), Huang Shijie, catalin.marinas, will,
	patches, Shubhang, krzysztof.kozlowski, bjorn.andersson,
	geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel


On 18/08/2025 17:33, Sudeep Holla wrote:
>> From a distro perspective it makes more sense to me to change it from a
>> compile time option to a runtime kernel command line option with the default
>> on/off set by this SCHED_CLUSTER flag rather than try to maintain a
>> blocklist.
>>
> Right, that makes complete sense to me.

Anyway, Peter is also trying to make SCHED_CLUSTER the default for the arm64
platform; please see the email link:

https://patchew.org/linux/20250826041319.1284-1-kprateek.nayak@amd.com/20250826041319.1284-5-kprateek.nayak@amd.com/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER
  2025-08-27  2:33                       ` Shijie Huang
@ 2025-08-27 10:19                         ` Sudeep Holla
  0 siblings, 0 replies; 18+ messages in thread
From: Sudeep Holla @ 2025-08-27 10:19 UTC (permalink / raw)
  To: Shijie Huang
  Cc: Jeremy Linton, Christoph Lameter (Ampere), Huang Shijie,
	catalin.marinas, will, patches, Shubhang, krzysztof.kozlowski,
	bjorn.andersson, geert+renesas, arnd, nm, ebiggers, nfraprado,
	prabhakar.mahadev-lad.rj, linux-arm-kernel, linux-kernel

On Wed, Aug 27, 2025 at 10:33:17AM +0800, Shijie Huang wrote:
> 
> On 18/08/2025 17:33, Sudeep Holla wrote:
> > > From a distro perspective it makes more sense to me to change it from a
> > > compile time option to a runtime kernel command line option with the default
> > > on/off set by this SCHED_CLUSTER flag rather than try to maintain a
> > > blocklist.
> > > 
> > Right, that makes complete sense to me.
> 
> Anyway, Peter is also trying to make SCHED_CLUSTER the default for the arm64
> platform; please see the email link:
> 
> https://patchew.org/linux/20250826041319.1284-1-kprateek.nayak@amd.com/20250826041319.1284-5-kprateek.nayak@amd.com/
> 

Yes, I was cc-ed and I am following the discussions.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-08-27 10:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-08  2:55 [PATCH] arm64: defconfig: enable CONFIG_SCHED_CLUSTER Huang Shijie
2025-08-11 15:13 ` Jeremy Linton
2025-08-12  3:05   ` Shijie Huang
2025-08-12 16:33   ` Christoph Lameter (Ampere)
2025-08-12 17:32     ` Jeremy Linton
2025-08-13  9:28       ` Sudeep Holla
2025-08-13 15:55         ` Christoph Lameter (Ampere)
2025-08-13 22:56           ` Christoph Lameter (Ampere)
2025-08-14 10:03             ` Sudeep Holla
2025-08-14 16:30               ` Christoph Lameter (Ampere)
2025-08-15 10:48                 ` Sudeep Holla
2025-08-15 16:46                   ` Jeremy Linton
2025-08-18  9:33                     ` Sudeep Holla
2025-08-20 23:44                       ` Christoph Lameter (Ampere)
2025-08-21 12:11                         ` Sudeep Holla
2025-08-27  2:33                       ` Shijie Huang
2025-08-27 10:19                         ` Sudeep Holla
2025-08-14 10:07           ` Sudeep Holla
