linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data
@ 2025-06-13 13:03 James Morse
  2025-06-13 13:03 ` [PATCH 1/5] " James Morse
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

This series adds support for cache-ids to device-tree systems.
These values are exposed to user-space via
/sys/devices/system/cpu/cpuX/cache/indexY/id
and are used to identify caches and their associated CPUs by kernel interfaces
such as resctrl.

Resctrl anticipates cache-ids are unique for a given cache level, but may
be sparse. See Documentation/filesystems/resctrl.rst's "Cache IDs" section.

Another user is PCIe's cache-steering hints, where an id provided by the
hardware would be needed. Today this expects a platform specific ACPI hook
the program that value into the PCIe root port registers. If DT platforms
are ever supported, it will likely need a kernel driver to convert the
user-space cache-id to whatever hardware value is needed.

Rob H previously preferred to generate a cache-id from the information DT
already has. (Rob: does the PCIe cache-steering use-case change this?)

This series generates a 32bit cache-id from the lowest CPU hardware id
of the CPU's associated with that cache. On arm64, CPU hardware ids may
be greater than 32bits, but can be swizzled into 32bits. An architecture
hook is provided to allow the architecture to swizzle the values into 32bits.

Finally, the MPAM driver needs to know the size of a cache, which is stored
in cacheinfo. Add a helper to retrieve that information.

This series is based on v6.16-rc1, and can be retreived from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/cacheinfo/v1

The MPAM driver that makes use of these can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.16-rc1

What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/


Bugs welcome,
Thanks,

James Morse (4):
  cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for
    cache-id
  arm64: cacheinfo: Provide helper to compress MPIDR value into u32
  cacheinfo: Expose the code to generate a cache-id from a device_node
  cacheinfo: Add helper to find the cache size from cpu+level

Rob Herring (1):
  cacheinfo: Set cache 'id' based on DT data

 arch/arm64/include/asm/cache.h | 14 +++++++++++
 arch/arm64/kernel/sleep.S      |  1 +
 drivers/base/cacheinfo.c       | 45 ++++++++++++++++++++++++++++++++++
 include/linux/cacheinfo.h      | 21 ++++++++++++++++
 4 files changed, 81 insertions(+)

-- 
2.39.5


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
@ 2025-06-13 13:03 ` James Morse
  2025-06-17 16:03   ` Jonathan Cameron
  2025-06-13 13:03 ` [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id James Morse
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

From: Rob Herring <robh@kernel.org>

Use the minimum CPU h/w id of the CPUs associated with the cache for the
cache 'id'. This will provide a stable id value for a given system. As
we need to check all possible CPUs, we can't use the shared_cpu_map
which is just online CPUs. As there's not a cache to CPUs mapping in DT,
we have to walk all CPU nodes and then walk cache levels.

The cache_id exposed to user-space has historically been 32 bits, and
is too late to change. Give up on assigning cache-id's if a CPU h/w
id greater than 32 bits is found.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
[ ben: converted to use the __free cleanup idiom ]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[ morse: Add checks to give up if a value larger than 32 bits is seen. ]
Signed-off-by: James Morse <james.morse@arm.com>
---
Use as a 32bit value has been seen in DPDK patches here:
http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
---
 drivers/base/cacheinfo.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index cf0d455209d7..9888d87840a2 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -8,6 +8,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/acpi.h>
+#include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/cacheinfo.h>
 #include <linux/compiler.h>
@@ -183,6 +184,37 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
 	return of_property_read_bool(np, "cache-unified");
 }
 
+static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
+{
+	struct device_node *cpu;
+	u32 min_id = ~0;
+
+	for_each_of_cpu_node(cpu) {
+		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
+		u64 id = of_get_cpu_hwid(cpu, 0);
+
+		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
+			of_node_put(cpu);
+			return;
+		}
+		while (1) {
+			if (!cache_node)
+				break;
+			if (cache_node == np && id < min_id) {
+				min_id = id;
+				break;
+			}
+			struct device_node *prev __free(device_node) = cache_node;
+			cache_node = of_find_next_cache_node(cache_node);
+		}
+	}
+
+	if (min_id != ~0) {
+		this_leaf->id = min_id;
+		this_leaf->attributes |= CACHE_ID;
+	}
+}
+
 static void cache_of_set_props(struct cacheinfo *this_leaf,
 			       struct device_node *np)
 {
@@ -198,6 +230,7 @@ static void cache_of_set_props(struct cacheinfo *this_leaf,
 	cache_get_line_size(this_leaf, np);
 	cache_nr_sets(this_leaf, np);
 	cache_associativity(this_leaf);
+	cache_of_set_id(this_leaf, np);
 }
 
 static int cache_setup_of_node(unsigned int cpu)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
  2025-06-13 13:03 ` [PATCH 1/5] " James Morse
@ 2025-06-13 13:03 ` James Morse
  2025-06-17 16:05   ` Jonathan Cameron
  2025-06-23 14:48   ` Rob Herring
  2025-06-13 13:03 ` [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32 James Morse
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest CPU h/w id of the
CPUs associated with that cache.

CPU h/w ids may be larger than 32 bits.

Add a hook to allow architectures to compress the value from the devicetree
into 32 bits. Returning the same value is always safe as cache_of_set_id()
will stop if a value larger than 32 bits is seen.

For example, on arm64 the value is the MPIDR affinity register, which only
has 32 bits of affinity data, but spread across the 64 bit field. An
arch-specific bit swizzle gives a 32 bit value.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/base/cacheinfo.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 9888d87840a2..d8e5b4c7156c 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
 	return of_property_read_bool(np, "cache-unified");
 }
 
+#ifndef arch_compact_of_hwid
+#define arch_compact_of_hwid(_x)	(_x)
+#endif
+
 static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	struct device_node *cpu;
@@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
 		u64 id = of_get_cpu_hwid(cpu, 0);
 
+		id = arch_compact_of_hwid(id);
 		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
 			of_node_put(cpu);
 			return;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
  2025-06-13 13:03 ` [PATCH 1/5] " James Morse
  2025-06-13 13:03 ` [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id James Morse
@ 2025-06-13 13:03 ` James Morse
  2025-06-17 16:14   ` Jonathan Cameron
  2025-06-13 13:03 ` [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest MPIDR of the CPUs
associated with that cache. The cache-id exposed to user-space has
historically been 32 bits.

MPIDR values may be larger than 32 bits.

MPIDR only has 32 bits worth of affinity data, but the aff3 field lives
above 32bits. The corresponding lower bits are masked out by
MPIDR_HWID_BITMASK and contain an SMT flag and Uni-Processor flag.

Swizzzle the aff3 field into the bottom 32 bits and using that.

In case more affinity fields are added in the future, the upper RES0
area should be checked. Returning a value greater than 32 bits from
this helper will cause the caller to give up on allocating cache-ids.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/cache.h | 14 ++++++++++++++
 arch/arm64/kernel/sleep.S      |  1 +
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 99cd6546e72e..f8798dc96364 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -42,6 +42,7 @@
 
 #include <asm/cputype.h>
 #include <asm/mte-def.h>
+#include <asm/suspend.h>
 #include <asm/sysreg.h>
 
 #ifdef CONFIG_KASAN_SW_TAGS
@@ -87,6 +88,19 @@ int cache_line_size(void);
 
 #define dma_get_cache_alignment	cache_line_size
 
+/* Compress a u64 MPIDR value into 32 bits. */
+static inline u64 arch_compact_of_hwid(u64 id)
+{
+	u64 aff3 = MPIDR_AFFINITY_LEVEL(id, 3);
+
+	/* These bits are expected to be RES0 */
+	if (FIELD_GET(GENMASK_ULL(63, 40), id))
+		return id;
+
+	return (aff3 << 24) | FIELD_GET(GENMASK_ULL(23, 0), id);
+}
+#define arch_compact_of_hwid	arch_compact_of_hwid
+
 /*
  * Read the effective value of CTR_EL0.
  *
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index f093cdf71be1..ebc23304d430 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -50,6 +50,7 @@
 	lsr	\mask ,\mask, \rs3
 	orr	\dst, \dst, \mask		// dst|=(aff3>>rs3)
 	.endm
+
 /*
  * Save CPU state in the provided sleep_stack_data area, and publish its
  * location for cpu_resume()'s use in sleep_save_stash.
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
                   ` (2 preceding siblings ...)
  2025-06-13 13:03 ` [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32 James Morse
@ 2025-06-13 13:03 ` James Morse
  2025-06-17 16:21   ` Jonathan Cameron
  2025-06-13 13:03 ` [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level James Morse
  2025-06-23 15:05 ` [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data Rob Herring
  5 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

The MPAM driver identifies caches by id for use with resctrl. It
needs to know the cache-id when probe-ing, but the value isn't set
in cacheinfo until the corresponding CPU comes online.

Expose the code that generates the cache-id. This allows the MPAM
driver to determine the properties of the caches without waiting
for all CPUs to come online.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/base/cacheinfo.c  | 15 +++++++++++----
 include/linux/cacheinfo.h |  1 +
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index d8e5b4c7156c..6316d80abab8 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -188,7 +188,7 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
 #define arch_compact_of_hwid(_x)	(_x)
 #endif
 
-static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
+unsigned long cache_of_get_id(struct device_node *np)
 {
 	struct device_node *cpu;
 	u32 min_id = ~0;
@@ -200,7 +200,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 		id = arch_compact_of_hwid(id);
 		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
 			of_node_put(cpu);
-			return;
+			return ~0UL;
 		}
 		while (1) {
 			if (!cache_node)
@@ -214,8 +214,15 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 		}
 	}
 
-	if (min_id != ~0) {
-		this_leaf->id = min_id;
+	return min_id;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
+{
+	unsigned long id = cache_of_get_id(np);
+
+	if (id != ~0UL) {
+		this_leaf->id = id;
 		this_leaf->attributes |= CACHE_ID;
 	}
 }
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index c8f4f0a0b874..9c959caf8af8 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
 #endif
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
+unsigned long cache_of_get_id(struct device_node *np);
 
 /*
  * Get the cacheinfo structure for the cache associated with @cpu at
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
                   ` (3 preceding siblings ...)
  2025-06-13 13:03 ` [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-06-13 13:03 ` James Morse
  2025-06-17 16:28   ` Jonathan Cameron
  2025-06-23 15:05 ` [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data Rob Herring
  5 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-06-13 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, sudeep.holla, Rob Herring,
	Ben Horgan, James Morse

The MPAM driver needs to know the size of a cache associated with a
particular CPU. The DT/ACPI agnostic way of doing this is to ask cacheinfo.

Add a helper to do this.

Signed-off-by: James Morse <james.morse@arm.com>
---
 include/linux/cacheinfo.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 9c959caf8af8..3f1b6b2e25b5 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -148,6 +148,26 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
 	return ci ? ci->id : -1;
 }
 
+/*
+ * Get the size of the cache associated with @cpu at level @level.
+ * cpuhp lock must be held.
+ */
+static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+	int i;
+
+	if (!ci->info_list)
+		return 0;
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == level)
+			return ci->info_list[i].size;
+	}
+
+	return 0;
+}
+
 #if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
 #define use_arch_cache_info()	(true)
 #else
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-13 13:03 ` [PATCH 1/5] " James Morse
@ 2025-06-17 16:03   ` Jonathan Cameron
  2025-06-23 14:18     ` Rob Herring
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2025-06-17 16:03 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

On Fri, 13 Jun 2025 13:03:52 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rob Herring <robh@kernel.org>
> 
> Use the minimum CPU h/w id of the CPUs associated with the cache for the
> cache 'id'. This will provide a stable id value for a given system. As
> we need to check all possible CPUs, we can't use the shared_cpu_map
> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> we have to walk all CPU nodes and then walk cache levels.

Is it ok for these to match for different levels?  I've no idea for
these use cases.

> 
> The cache_id exposed to user-space has historically been 32 bits, and
> is too late to change. Give up on assigning cache-id's if a CPU h/w
> id greater than 32 bits is found.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Rob Herring <robh@kernel.org>
> [ ben: converted to use the __free cleanup idiom ]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> Signed-off-by: James Morse <james.morse@arm.com>

Hi James, Rob,

Mainly a couple of questions for Rob on the fun of scoped cleanup being
used for some of the iterators in a similar fashion to already
done for looping over child nodes etc.

> ---
> Use as a 32bit value has been seen in DPDK patches here:
> http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> ---
>  drivers/base/cacheinfo.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index cf0d455209d7..9888d87840a2 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -8,6 +8,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/acpi.h>
> +#include <linux/bitfield.h>
>  #include <linux/bitops.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/compiler.h>
> @@ -183,6 +184,37 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>  	return of_property_read_bool(np, "cache-unified");
>  }
>  
> +static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
> +{
> +	struct device_node *cpu;
> +	u32 min_id = ~0;
> +
> +	for_each_of_cpu_node(cpu) {

Rob is it worth a scoped variant of this one?  I've come across
this a few times recently and it irritates me but I didn't feel
there were necessarily enough cases to bother.  With one more
maybe it is time to do it (maybe 10+ from a quick look)_.



> +		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
> +		u64 id = of_get_cpu_hwid(cpu, 0);
> +
> +		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
> +			of_node_put(cpu);
> +			return;
> +		}
> +		while (1) {

for_each_of_cache_node_scoped() perhaps?  With the find already defined this would end
up something like the following.  Modeled on for_each_child_of_node_scoped.

	#define for_each_of_cache_node_scoped(cpu, cache) \
		for (struct device_node *cache __free(device_node) = \
		     of_find_next_cache_node(cpu); cache != NULL; \
		     cache = of_find_next_cache_node(cache))

	for_each_of_cpu_node_scoped(cpu) {
		u64 id = of_get_cpu_hwid(cpu, 0);

		if (FIELD_GET(GENMASK_ULL(63, 32), id))
			return;
		for_each_of_cache_node_scoped(cpu, cache_node) {
			if (cache_node == np) {
				min_id = min(min_id, id);
				break;
			}
		} 
	}

> +			if (!cache_node)
> +				break;
> +			if (cache_node == np && id < min_id) {

Why do you carry on if id >= min_id?  Id isn't changing.  For that
matter why not do this check before iterating the caches at all?

Maybe a comment if that does make sense but I'd not expect cache_node
to match again.

> +				min_id = id;
> +				break;
> +			}
> +			struct device_node *prev __free(device_node) = cache_node;
> +			cache_node = of_find_next_cache_node(cache_node);
> +		}
> +	}
> +
> +	if (min_id != ~0) {
> +		this_leaf->id = min_id;
> +		this_leaf->attributes |= CACHE_ID;
> +	}
> +}
> +
>  static void cache_of_set_props(struct cacheinfo *this_leaf,
>  			       struct device_node *np)
>  {
> @@ -198,6 +230,7 @@ static void cache_of_set_props(struct cacheinfo *this_leaf,
>  	cache_get_line_size(this_leaf, np);
>  	cache_nr_sets(this_leaf, np);
>  	cache_associativity(this_leaf);
> +	cache_of_set_id(this_leaf, np);
>  }
>  
>  static int cache_setup_of_node(unsigned int cpu)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-13 13:03 ` [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id James Morse
@ 2025-06-17 16:05   ` Jonathan Cameron
  2025-06-23 14:48   ` Rob Herring
  1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2025-06-17 16:05 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

On Fri, 13 Jun 2025 13:03:53 +0000
James Morse <james.morse@arm.com> wrote:

> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> groups of CPUs. The value is also used for PCIe cache steering tags. On
> DT platforms cache-id is not something that is described in the
> device-tree, but instead generated from the smallest CPU h/w id of the
> CPUs associated with that cache.
> 
> CPU h/w ids may be larger than 32 bits.
> 
> Add a hook to allow architectures to compress the value from the devicetree
> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> will stop if a value larger than 32 bits is seen.
> 
> For example, on arm64 the value is the MPIDR affinity register, which only
> has 32 bits of affinity data, but spread across the 64 bit field. An
> arch-specific bit swizzle gives a 32 bit value.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Looks fine to me
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/base/cacheinfo.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 9888d87840a2..d8e5b4c7156c 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>  	return of_property_read_bool(np, "cache-unified");
>  }
>  
> +#ifndef arch_compact_of_hwid
> +#define arch_compact_of_hwid(_x)	(_x)
> +#endif
> +
>  static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  {
>  	struct device_node *cpu;
> @@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
>  		u64 id = of_get_cpu_hwid(cpu, 0);
>  
> +		id = arch_compact_of_hwid(id);
>  		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
>  			of_node_put(cpu);
>  			return;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32
  2025-06-13 13:03 ` [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32 James Morse
@ 2025-06-17 16:14   ` Jonathan Cameron
  2025-06-27 16:39     ` James Morse
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2025-06-17 16:14 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

On Fri, 13 Jun 2025 13:03:54 +0000
James Morse <james.morse@arm.com> wrote:

> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> groups of CPUs. The value is also used for PCIe cache steering tags. On
> DT platforms cache-id is not something that is described in the
> device-tree, but instead generated from the smallest MPIDR of the CPUs
> associated with that cache. The cache-id exposed to user-space has
> historically been 32 bits.
> 
> MPIDR values may be larger than 32 bits.
> 
> MPIDR only has 32 bits worth of affinity data, but the aff3 field lives
> above 32bits. The corresponding lower bits are masked out by
> MPIDR_HWID_BITMASK and contain an SMT flag and Uni-Processor flag.
> 
> Swizzzle the aff3 field into the bottom 32 bits and using that.
> 
> In case more affinity fields are added in the future, the upper RES0
> area should be checked. Returning a value greater than 32 bits from
> this helper will cause the caller to give up on allocating cache-ids.
Hi James,

I'd mention that in the code via a comment, not just the commit message.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Seems a few unrelated tiny things snuck in here.

Otherwise seems fine to me.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  arch/arm64/include/asm/cache.h | 14 ++++++++++++++
>  arch/arm64/kernel/sleep.S      |  1 +
>  2 files changed, 15 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 99cd6546e72e..f8798dc96364 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -42,6 +42,7 @@
>  
>  #include <asm/cputype.h>
>  #include <asm/mte-def.h>
> +#include <asm/suspend.h>
That seems a little random?  Why?
>  #include <asm/sysreg.h>
>  
>  #ifdef CONFIG_KASAN_SW_TAGS
> @@ -87,6 +88,19 @@ int cache_line_size(void);
>  
>  #define dma_get_cache_alignment	cache_line_size
>  
> +/* Compress a u64 MPIDR value into 32 bits. */
> +static inline u64 arch_compact_of_hwid(u64 id)
> +{
> +	u64 aff3 = MPIDR_AFFINITY_LEVEL(id, 3);
> +
> +	/* These bits are expected to be RES0 */
> +	if (FIELD_GET(GENMASK_ULL(63, 40), id))
> +		return id;

I would add a comment that the way this fails is to ensure
there are bits in the upper bits.  It is a little unusual
as APIs go but matches the not defined variant so sort of
makes sense.

> +
> +	return (aff3 << 24) | FIELD_GET(GENMASK_ULL(23, 0), id);
> +}
> +#define arch_compact_of_hwid	arch_compact_of_hwid
> +
>  /*
>   * Read the effective value of CTR_EL0.
>   *
> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
> index f093cdf71be1..ebc23304d430 100644
> --- a/arch/arm64/kernel/sleep.S
> +++ b/arch/arm64/kernel/sleep.S
> @@ -50,6 +50,7 @@
>  	lsr	\mask ,\mask, \rs3
>  	orr	\dst, \dst, \mask		// dst|=(aff3>>rs3)
>  	.endm
> +

Stray change.

>  /*
>   * Save CPU state in the provided sleep_stack_data area, and publish its
>   * location for cpu_resume()'s use in sleep_save_stash.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-06-13 13:03 ` [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-06-17 16:21   ` Jonathan Cameron
  2025-06-27  5:54     ` Shaopeng Tan (Fujitsu)
  2025-06-27 16:38     ` James Morse
  0 siblings, 2 replies; 25+ messages in thread
From: Jonathan Cameron @ 2025-06-17 16:21 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

On Fri, 13 Jun 2025 13:03:55 +0000
James Morse <james.morse@arm.com> wrote:

> The MPAM driver identifies caches by id for use with resctrl. It
> needs to know the cache-id when probe-ing, but the value isn't set
> in cacheinfo until the corresponding CPU comes online.
> 
> Expose the code that generates the cache-id. This allows the MPAM
> driver to determine the properties of the caches without waiting
> for all CPUs to come online.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Feels to me like this needs to come with the user.
The earlier patches at least expose it via existing infrastructure
this isn't used at all yet...

> ---
>  drivers/base/cacheinfo.c  | 15 +++++++++++----
>  include/linux/cacheinfo.h |  1 +
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index d8e5b4c7156c..6316d80abab8 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -188,7 +188,7 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>  #define arch_compact_of_hwid(_x)	(_x)
>  #endif
>  
> -static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
> +unsigned long cache_of_get_id(struct device_node *np)
Bit confusing to have cache_of_set_id() call cache_of_get_id() like this because
they are in no way mirrors of each other.   Rename?
(and naturally I'm providing no suggestions :)

>  {
>  	struct device_node *cpu;
>  	u32 min_id = ~0;
> @@ -200,7 +200,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  		id = arch_compact_of_hwid(id);
>  		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
>  			of_node_put(cpu);
> -			return;
> +			return ~0UL;
>  		}
>  		while (1) {
>  			if (!cache_node)
> @@ -214,8 +214,15 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  		}
>  	}
>  
> -	if (min_id != ~0) {
> -		this_leaf->id = min_id;
> +	return min_id;
> +}
> +
> +static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
> +{
> +	unsigned long id = cache_of_get_id(np);
> +
> +	if (id != ~0UL) {
> +		this_leaf->id = id;
>  		this_leaf->attributes |= CACHE_ID;
>  	}
>  }
> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index c8f4f0a0b874..9c959caf8af8 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
>  #endif
>  
>  const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
> +unsigned long cache_of_get_id(struct device_node *np);
>  
>  /*
>   * Get the cacheinfo structure for the cache associated with @cpu at


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level
  2025-06-13 13:03 ` [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-06-17 16:28   ` Jonathan Cameron
  2025-06-27 16:38     ` James Morse
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2025-06-17 16:28 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

On Fri, 13 Jun 2025 13:03:56 +0000
James Morse <james.morse@arm.com> wrote:

> The MPAM driver needs to know the size of a cache associated with a
> particular CPU. The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> 
> Add a helper to do this.
> 
> Signed-off-by: James Morse <james.morse@arm.com>



> ---
>  include/linux/cacheinfo.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index 9c959caf8af8..3f1b6b2e25b5 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -148,6 +148,26 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
>  	return ci ? ci->id : -1;
>  }
>  
> +/*
> + * Get the size of the cache associated with @cpu at level @level.
> + * cpuhp lock must be held.
To me kernel-doc would be appropriate.  Particularly the return 0 thing.
However there isn't any for existing cacheinfo interfaces so maybe
fair enough to 'follow local style' on that.

> + */
> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> +{
> +	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
> +	int i;
> +
> +	if (!ci->info_list)
> +		return 0;
> +
> +	for (i = 0; i < ci->num_leaves; i++) {
> +		if (ci->info_list[i].level == level)
> +			return ci->info_list[i].size;
> +	}
> +
> +	return 0;
> +}
> +
Why not

static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
{
	struct cpu_cachinfo *ci = get_cpu_cacheinfo_level(cpu, lev);

	return ci ? ci->size; 0;
}

Like existing get_cpu_cacheinfo_id()?

>  #if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
>  #define use_arch_cache_info()	(true)
>  #else


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-17 16:03   ` Jonathan Cameron
@ 2025-06-23 14:18     ` Rob Herring
  2025-06-27 16:38       ` James Morse
  0 siblings, 1 reply; 25+ messages in thread
From: Rob Herring @ 2025-06-23 14:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: James Morse, linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

On Tue, Jun 17, 2025 at 11:03 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 13 Jun 2025 13:03:52 +0000
> James Morse <james.morse@arm.com> wrote:
>
> > From: Rob Herring <robh@kernel.org>
> >
> > Use the minimum CPU h/w id of the CPUs associated with the cache for the
> > cache 'id'. This will provide a stable id value for a given system. As
> > we need to check all possible CPUs, we can't use the shared_cpu_map
> > which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> > we have to walk all CPU nodes and then walk cache levels.
>
> Is it ok for these to match for different levels?  I've no idea for
> these use cases.

Yes. The 'id' is per level, not globally unique.

> >
> > The cache_id exposed to user-space has historically been 32 bits, and
> > is too late to change. Give up on assigning cache-id's if a CPU h/w
> > id greater than 32 bits is found.
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > [ ben: converted to use the __free cleanup idiom ]
> > Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> > [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> > Signed-off-by: James Morse <james.morse@arm.com>
>
> Hi James, Rob,
>
> Mainly a couple of questions for Rob on the fun of scoped cleanup being
> used for some of the iterators in a similar fashion to already
> done for looping over child nodes etc.
>
> > ---
> > Use as a 32bit value has been seen in DPDK patches here:
> > http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> > ---
> >  drivers/base/cacheinfo.c | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >
> > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> > index cf0d455209d7..9888d87840a2 100644
> > --- a/drivers/base/cacheinfo.c
> > +++ b/drivers/base/cacheinfo.c
> > @@ -8,6 +8,7 @@
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> >  #include <linux/acpi.h>
> > +#include <linux/bitfield.h>
> >  #include <linux/bitops.h>
> >  #include <linux/cacheinfo.h>
> >  #include <linux/compiler.h>
> > @@ -183,6 +184,37 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
> >       return of_property_read_bool(np, "cache-unified");
> >  }
> >
> > +static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
> > +{
> > +     struct device_node *cpu;
> > +     u32 min_id = ~0;
> > +
> > +     for_each_of_cpu_node(cpu) {
>
> Rob is it worth a scoped variant of this one?  I've come across
> this a few times recently and it irritates me but I didn't feel
> there were necessarily enough cases to bother.  With one more
> maybe it is time to do it (maybe 10+ from a quick look)_.

My question on all of these (though more so for drivers), is why are
we parsing CPU nodes again? Ideally, we'd parse the CPU and cache
nodes only once and the kernel would provide the necessary info.

Take drivers/clk/mvebu/ap-cpu-clk.c for example. The code there really
just needs to know if there are 2 or 4 possible CPUs or what is the
max physical CPU id (If CPU #1 could be not present).

> > +             struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
> > +             u64 id = of_get_cpu_hwid(cpu, 0);
> > +
> > +             if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
> > +                     of_node_put(cpu);
> > +                     return;
> > +             }
> > +             while (1) {
>
> for_each_of_cache_node_scoped() perhaps?  With the find already defined this would end
> up something like the following.  Modeled on for_each_child_of_node_scoped.

That seems like an invitation for someone to parse the cache nodes
themselves rather than use cacheinfo. Plus, there are multiple ways we
could iterate over cache nodes. Is it just ones associated with a CPU
or all cache nodes or all cache nodes at a level?

That being said, I do find the current loop a bit odd with setting
'prev' pointer which is then never explicitly used. We're still having
to worry about refcounting, but handling it in a less obvious way.

>         #define for_each_of_cache_node_scoped(cpu, cache) \
>                 for (struct device_node *cache __free(device_node) = \
>                      of_find_next_cache_node(cpu); cache != NULL; \
>                      cache = of_find_next_cache_node(cache))
>
>         for_each_of_cpu_node_scoped(cpu) {
>                 u64 id = of_get_cpu_hwid(cpu, 0);
>
>                 if (FIELD_GET(GENMASK_ULL(63, 32), id))
>                         return;
>                 for_each_of_cache_node_scoped(cpu, cache_node) {
>                         if (cache_node == np) {
>                                 min_id = min(min_id, id);
>                                 break;
>                         }
>                 }
>         }
>
> > +                     if (!cache_node)
> > +                             break;
> > +                     if (cache_node == np && id < min_id) {
>
> Why do you carry on if id >= min_id?  Id isn't changing.  For that
> matter why not do this check before iterating the caches at all?

You're right, no need. There's no need to handle the id in the loop at
all, we just need to match the cache node. So perhaps just a helper:

static bool match_cache_node(struct device_node *cpu, const struct
device_node *cache_node)
{
  for (struct device_node *cache __free(device_node) =
        of_find_next_cache_node(cpu); cache != NULL;
        cache = of_find_next_cache_node(cache)) {
    if (cache == cache_node)
      return true;
  }
  return false;
}

And then the cpu loop would have:

if (match_cache_node(cpu, cache_node))
  min_id = min(min_id, id);

Rob

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-13 13:03 ` [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id James Morse
  2025-06-17 16:05   ` Jonathan Cameron
@ 2025-06-23 14:48   ` Rob Herring
  2025-06-27 16:38     ` James Morse
  1 sibling, 1 reply; 25+ messages in thread
From: Rob Herring @ 2025-06-23 14:48 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>
> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> groups of CPUs. The value is also used for PCIe cache steering tags. On
> DT platforms cache-id is not something that is described in the
> device-tree, but instead generated from the smallest CPU h/w id of the
> CPUs associated with that cache.
>
> CPU h/w ids may be larger than 32 bits.
>
> Add a hook to allow architectures to compress the value from the devicetree
> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> will stop if a value larger than 32 bits is seen.
>
> For example, on arm64 the value is the MPIDR affinity register, which only
> has 32 bits of affinity data, but spread across the 64 bit field. An
> arch-specific bit swizzle gives a 32 bit value.

What's missing here is why do we need the cache id to be only 32-bits?
I suppose it is because the sysfs 'id' file has been implicitly that?
Why can't we just allow 64-bit values there? Obviously, you can't have
a 64-bit value on x86 because that might break existing userspace. But
for Arm, there is no existing userspace to break. Even with 32-bits,
it is entirely possible that an existing userspace assumed values less
than 32-bits and would be broken for Arm as-is. It is obviously nice
if we can avoid modifying userspace, but I don't think that's a
requirement and I'd be surprised if there's not other things that need
to be adapted for MPAM support.

Also, what if an architecture can't swizzle their value into 32-bits?
They would be stuck with requiring userspace to deal with 64-bit
values.

Rob

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
                   ` (4 preceding siblings ...)
  2025-06-13 13:03 ` [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-06-23 15:05 ` Rob Herring
  2025-06-27 16:38   ` James Morse
  5 siblings, 1 reply; 25+ messages in thread
From: Rob Herring @ 2025-06-23 15:05 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>
> This series adds support for cache-ids to device-tree systems.
> These values are exposed to user-space via
> /sys/devices/system/cpu/cpuX/cache/indexY/id
> and are used to identify caches and their associated CPUs by kernel interfaces
> such as resctrl.
>
> Resctrl anticipates cache-ids are unique for a given cache level, but may
> be sparse. See Documentation/filesystems/resctrl.rst's "Cache IDs" section.
>
> Another user is PCIe's cache-steering hints, where an id provided by the
> hardware would be needed. Today this expects a platform specific ACPI hook
> the program that value into the PCIe root port registers. If DT platforms
> are ever supported, it will likely need a kernel driver to convert the
> user-space cache-id to whatever hardware value is needed.
>
> Rob H previously preferred to generate a cache-id from the information DT
> already has. (Rob: does the PCIe cache-steering use-case change this?)

I don't think so because who knows what values the PCI root port
needs. It's never going to be the cache id directly since that is per
level. So we'd need some sort of mapping. That's going to be something
like this:

Userspace level+id -> DT cache node -> PCI RP value

So the first translation is the same as you have here. The 2nd
translation might be something we put in DT or could be in PCI host
bridge driver.

Rob

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Re: [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-06-17 16:21   ` Jonathan Cameron
@ 2025-06-27  5:54     ` Shaopeng Tan (Fujitsu)
  2025-06-27 16:39       ` James Morse
  2025-06-27 16:38     ` James Morse
  1 sibling, 1 reply; 25+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-06-27  5:54 UTC (permalink / raw)
  To: 'Jonathan Cameron', 'James Morse'
  Cc: 'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org',
	'Greg Kroah-Hartman', 'Rafael J . Wysocki',
	'sudeep.holla@arm.com', 'Rob Herring',
	'Ben Horgan'

Hello James

> On Fri, 13 Jun 2025 13:03:55 +0000
> James Morse <james.morse@arm.com> wrote:
> 
> > The MPAM driver identifies caches by id for use with resctrl. It needs
> > to know the cache-id when probe-ing, but the value isn't set in
> > cacheinfo until the corresponding CPU comes online.
> >
> > Expose the code that generates the cache-id. This allows the MPAM
> > driver to determine the properties of the caches without waiting for
> > all CPUs to come online.
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> Feels to me like this needs to come with the user.
> The earlier patches at least expose it via existing infrastructure this isn't used
> at all yet...
> 
> > ---
> >  drivers/base/cacheinfo.c  | 15 +++++++++++----
> > include/linux/cacheinfo.h |  1 +
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index
> > d8e5b4c7156c..6316d80abab8 100644
> > --- a/drivers/base/cacheinfo.c
> > +++ b/drivers/base/cacheinfo.c
> > @@ -188,7 +188,7 @@ static bool cache_node_is_unified(struct cacheinfo
> *this_leaf,
> >  #define arch_compact_of_hwid(_x)	(_x)
> >  #endif
> >
> > -static void cache_of_set_id(struct cacheinfo *this_leaf, struct
> > device_node *np)
> > +unsigned long cache_of_get_id(struct device_node *np)
> Bit confusing to have cache_of_set_id() call cache_of_get_id() like this because
> they are in no way mirrors of each other.   Rename?
> (and naturally I'm providing no suggestions :)
> 
> >  {
> >  	struct device_node *cpu;
> >  	u32 min_id = ~0;
> > @@ -200,7 +200,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
> struct device_node *np)
> >  		id = arch_compact_of_hwid(id);
> >  		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
Since "id" was compressed into 32bits by the function arch_compact_of_hwid(),
is this required?

Best regards,
Shaopeng TAN

> >  			of_node_put(cpu);
> > -			return;
> > +			return ~0UL;
> >  		}
> >  		while (1) {
> >  			if (!cache_node)
> > @@ -214,8 +214,15 @@ static void cache_of_set_id(struct cacheinfo
> *this_leaf, struct device_node *np)
> >  		}
> >  	}
> >
> > -	if (min_id != ~0) {
> > -		this_leaf->id = min_id;
> > +	return min_id;
> > +}
> > +
> > +static void cache_of_set_id(struct cacheinfo *this_leaf, struct
> > +device_node *np) {
> > +	unsigned long id = cache_of_get_id(np);
> > +
> > +	if (id != ~0UL) {
> > +		this_leaf->id = id;
> >  		this_leaf->attributes |= CACHE_ID;
> >  	}
> >  }
> > diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> > index c8f4f0a0b874..9c959caf8af8 100644
> > --- a/include/linux/cacheinfo.h
> > +++ b/include/linux/cacheinfo.h
> > @@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,  #endif
> >
> >  const struct attribute_group *cache_get_priv_group(struct cacheinfo
> > *this_leaf);
> > +unsigned long cache_of_get_id(struct device_node *np);
> >
> >  /*
> >   * Get the cacheinfo structure for the cache associated with @cpu at
> 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-23 14:48   ` Rob Herring
@ 2025-06-27 16:38     ` James Morse
  2025-06-30 19:43       ` Rob Herring
  0 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-06-27 16:38 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

Hi Rob,

On 23/06/2025 15:48, Rob Herring wrote:
> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
>> groups of CPUs. The value is also used for PCIe cache steering tags. On
>> DT platforms cache-id is not something that is described in the
>> device-tree, but instead generated from the smallest CPU h/w id of the
>> CPUs associated with that cache.
>>
>> CPU h/w ids may be larger than 32 bits.
>>
>> Add a hook to allow architectures to compress the value from the devicetree
>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
>> will stop if a value larger than 32 bits is seen.
>>
>> For example, on arm64 the value is the MPIDR affinity register, which only
>> has 32 bits of affinity data, but spread across the 64 bit field. An
>> arch-specific bit swizzle gives a 32 bit value.

> What's missing here is why do we need the cache id to be only 32-bits?
> I suppose it is because the sysfs 'id' file has been implicitly that?

Yup, and its too late to change.


> Why can't we just allow 64-bit values there? Obviously, you can't have
> a 64-bit value on x86 because that might break existing userspace.

It's the same user-space. Users of resctrl should be portable between architectures.
Resctrl isn't the only user, of the cache-id field.


> But for Arm, there is no existing userspace to break.

libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/


> Even with 32-bits,
> it is entirely possible that an existing userspace assumed values less
> than 32-bits and would be broken for Arm as-is.

Sure, but I've not found a project where that is broken yet.


> It is obviously nice
> if we can avoid modifying userspace, but I don't think that's a
> requirement and I'd be surprised if there's not other things that need
> to be adapted for MPAM support.

The whole multi-year effort has been to make existing user-space work without any ABI
changes. The effect is some platforms have features that can't be used because resctrl
expects things to be Xeon shaped.
But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
doesn't have an equivalent, so '1' is exposed as a safe value.


> Also, what if an architecture can't swizzle their value into 32-bits?
> They would be stuck with requiring userspace to deal with 64-bit
> values.

Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.


I'll add something about the libvirt example to the commit message.


Thanks,

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level
  2025-06-17 16:28   ` Jonathan Cameron
@ 2025-06-27 16:38     ` James Morse
  0 siblings, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:38 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

Hi Jonathan,

On 17/06/2025 17:28, Jonathan Cameron wrote:
> On Fri, 13 Jun 2025 13:03:56 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> The MPAM driver needs to know the size of a cache associated with a
>> particular CPU. The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>>
>> Add a helper to do this.

>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 9c959caf8af8..3f1b6b2e25b5 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -148,6 +148,26 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
>>  	return ci ? ci->id : -1;
>>  }
>>  
>> +/*
>> + * Get the size of the cache associated with @cpu at level @level.
>> + * cpuhp lock must be held.

> To me kernel-doc would be appropriate.  Particularly the return 0 thing.
> However there isn't any for existing cacheinfo interfaces so maybe
> fair enough to 'follow local style' on that.

Documentation should be the direction of travel right? I'll change this.


>> + */
>> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
>> +{
>> +	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
>> +	int i;
>> +
>> +	if (!ci->info_list)
>> +		return 0;
>> +
>> +	for (i = 0; i < ci->num_leaves; i++) {
>> +		if (ci->info_list[i].level == level)
>> +			return ci->info_list[i].size;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
> Why not
> 
> static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> {
> 	struct cpu_cachinfo *ci = get_cpu_cacheinfo_level(cpu, lev);
> 
> 	return ci ? ci->size; 0;
> }
> 
> Like existing get_cpu_cacheinfo_id()?

I've been dragging this around for years. Fixed.


Thanks!

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-06-17 16:21   ` Jonathan Cameron
  2025-06-27  5:54     ` Shaopeng Tan (Fujitsu)
@ 2025-06-27 16:38     ` James Morse
  1 sibling, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:38 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

Hi Jonathan,

On 17/06/2025 17:21, Jonathan Cameron wrote:
> On Fri, 13 Jun 2025 13:03:55 +0000
> James Morse <james.morse@arm.com> wrote:

>> The MPAM driver identifies caches by id for use with resctrl. It
>> needs to know the cache-id when probe-ing, but the value isn't set
>> in cacheinfo until the corresponding CPU comes online.
>>
>> Expose the code that generates the cache-id. This allows the MPAM
>> driver to determine the properties of the caches without waiting
>> for all CPUs to come online.

> Feels to me like this needs to come with the user.
> The earlier patches at least expose it via existing infrastructure
> this isn't used at all yet...

Yeah, I'm damned whatever I do here. I can move this (and patch 5) to the series with the
MPAM driver - this was me trying to reduce the number of people that get copied on that,
and the number of trees that it touches...


>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>> index d8e5b4c7156c..6316d80abab8 100644
>> --- a/drivers/base/cacheinfo.c
>> +++ b/drivers/base/cacheinfo.c
>> @@ -188,7 +188,7 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>>  #define arch_compact_of_hwid(_x)	(_x)
>>  #endif
>>  
>> -static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>> +unsigned long cache_of_get_id(struct device_node *np)

> Bit confusing to have cache_of_set_id() call cache_of_get_id() like this because
> they are in no way mirrors of each other.   Rename?
> (and naturally I'm providing no suggestions :)

I'll try cache_of_calculate_id() ...


>>  {
>>  	struct device_node *cpu;
>>  	u32 min_id = ~0;


Thanks,

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-23 15:05 ` [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data Rob Herring
@ 2025-06-27 16:38   ` James Morse
  0 siblings, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:38 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

Hi Rob,

On 23/06/2025 16:05, Rob Herring wrote:
> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>>
>> This series adds support for cache-ids to device-tree systems.
>> These values are exposed to user-space via
>> /sys/devices/system/cpu/cpuX/cache/indexY/id
>> and are used to identify caches and their associated CPUs by kernel interfaces
>> such as resctrl.
>>
>> Resctrl anticipates cache-ids are unique for a given cache level, but may
>> be sparse. See Documentation/filesystems/resctrl.rst's "Cache IDs" section.
>>
>> Another user is PCIe's cache-steering hints, where an id provided by the
>> hardware would be needed. Today this expects a platform specific ACPI hook
>> the program that value into the PCIe root port registers. If DT platforms
>> are ever supported, it will likely need a kernel driver to convert the
>> user-space cache-id to whatever hardware value is needed.
>>
>> Rob H previously preferred to generate a cache-id from the information DT
>> already has. (Rob: does the PCIe cache-steering use-case change this?)

> I don't think so because who knows what values the PCI root port
> needs.

Some hardware specific value, that would have to come from the DT...


> It's never going to be the cache id directly since that is per level.

Can re-used across levels, but because they can also be sparse its equally valid for them
to be unique. This is what is happening on arm64 ACPI platforms.


> So we'd need some sort of mapping. That's going to be something
> like this:
> 
> Userspace level+id -> DT cache node -> PCI RP value
> 
> So the first translation is the same as you have here. The 2nd
> translation might be something we put in DT or could be in PCI host
> bridge driver.

ACPI currently requires some kernel interaction too as the value gets written to some
platform specific register. Even if it did get standardised, I guess something like VFIO
would manage access to that register to fix up the values.


Thanks,

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] cacheinfo: Set cache 'id' based on DT data
  2025-06-23 14:18     ` Rob Herring
@ 2025-06-27 16:38       ` James Morse
  0 siblings, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:38 UTC (permalink / raw)
  To: Rob Herring, Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

Hi Jonathan, Rob,

On 23/06/2025 15:18, Rob Herring wrote:
> On Tue, Jun 17, 2025 at 11:03 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
>> On Fri, 13 Jun 2025 13:03:52 +0000
>> James Morse <james.morse@arm.com> wrote:
>>> From: Rob Herring <robh@kernel.org>
>>>
>>> Use the minimum CPU h/w id of the CPUs associated with the cache for the
>>> cache 'id'. This will provide a stable id value for a given system. As
>>> we need to check all possible CPUs, we can't use the shared_cpu_map
>>> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
>>> we have to walk all CPU nodes and then walk cache levels.
>>
>> Is it ok for these to match for different levels?  I've no idea for
>> these use cases.

> Yes. The 'id' is per level, not globally unique.

Documented here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/resctrl.rst#n482

The values are unique per-level, but also sparse, meaning they could be unique
system-wide. This is what will happen on arm64 ACPI platforms because the PPTT value gets
exposed directly.


>>> The cache_id exposed to user-space has historically been 32 bits, and
>>> is too late to change. Give up on assigning cache-id's if a CPU h/w
>>> id greater than 32 bits is found.

>> Mainly a couple of questions for Rob on the fun of scoped cleanup being
>> used for some of the iterators in a similar fashion to already
>> done for looping over child nodes etc.

This is mostly over my head!


>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>> index cf0d455209d7..9888d87840a2 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -183,6 +184,37 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>>>       return of_property_read_bool(np, "cache-unified");
>>>  }
>>>
>>> +static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>>> +{
>>> +     struct device_node *cpu;
>>> +     u32 min_id = ~0;
>>> +
>>> +     for_each_of_cpu_node(cpu) {
>>
>> Rob is it worth a scoped variant of this one?  I've come across
>> this a few times recently and it irritates me but I didn't feel
>> there were necessarily enough cases to bother.  With one more
>> maybe it is time to do it (maybe 10+ from a quick look)_.
> 
> My question on all of these (though more so for drivers), is why are
> we parsing CPU nodes again? Ideally, we'd parse the CPU and cache
> nodes only once and the kernel would provide the necessary info.
> 
> Take drivers/clk/mvebu/ap-cpu-clk.c for example. The code there really
> just needs to know if there are 2 or 4 possible CPUs or what is the
> max physical CPU id (If CPU #1 could be not present).
> 
>>> +             struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
>>> +             u64 id = of_get_cpu_hwid(cpu, 0);
>>> +
>>> +             if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
>>> +                     of_node_put(cpu);
>>> +                     return;
>>> +             }
>>> +             while (1) {
>>
>> for_each_of_cache_node_scoped() perhaps?  With the find already defined this would end
>> up something like the following.  Modeled on for_each_child_of_node_scoped.
> 
> That seems like an invitation for someone to parse the cache nodes
> themselves rather than use cacheinfo. Plus, there are multiple ways we
> could iterate over cache nodes. Is it just ones associated with a CPU
> or all cache nodes or all cache nodes at a level?
> 
> That being said, I do find the current loop a bit odd with setting
> 'prev' pointer which is then never explicitly used. We're still having
> to worry about refcounting, but handling it in a less obvious way.
> 
>>         #define for_each_of_cache_node_scoped(cpu, cache) \
>>                 for (struct device_node *cache __free(device_node) = \
>>                      of_find_next_cache_node(cpu); cache != NULL; \
>>                      cache = of_find_next_cache_node(cache))
>>
>>         for_each_of_cpu_node_scoped(cpu) {
>>                 u64 id = of_get_cpu_hwid(cpu, 0);
>>
>>                 if (FIELD_GET(GENMASK_ULL(63, 32), id))
>>                         return;
>>                 for_each_of_cache_node_scoped(cpu, cache_node) {
>>                         if (cache_node == np) {
>>                                 min_id = min(min_id, id);
>>                                 break;
>>                         }
>>                 }
>>         }
>>
>>> +                     if (!cache_node)
>>> +                             break;
>>> +                     if (cache_node == np && id < min_id) {
>>
>> Why do you carry on if id >= min_id?  Id isn't changing.  For that
>> matter why not do this check before iterating the caches at all?
> 
> You're right, no need. There's no need to handle the id in the loop at
> all, we just need to match the cache node. So perhaps just a helper:
> 
> static bool match_cache_node(struct device_node *cpu, const struct
> device_node *cache_node)
> {
>   for (struct device_node *cache __free(device_node) =
>         of_find_next_cache_node(cpu); cache != NULL;
>         cache = of_find_next_cache_node(cache)) {
>     if (cache == cache_node)
>       return true;
>   }
>   return false;
> }
> 
> And then the cpu loop would have:
> 
> if (match_cache_node(cpu, cache_node))
>   min_id = min(min_id, id);
Done,


Thanks!

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32
  2025-06-17 16:14   ` Jonathan Cameron
@ 2025-06-27 16:39     ` James Morse
  0 siblings, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:39 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Rob Herring, Ben Horgan

Hi Jonathan,

On 17/06/2025 17:14, Jonathan Cameron wrote:
> On Fri, 13 Jun 2025 13:03:54 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
>> groups of CPUs. The value is also used for PCIe cache steering tags. On
>> DT platforms cache-id is not something that is described in the
>> device-tree, but instead generated from the smallest MPIDR of the CPUs
>> associated with that cache. The cache-id exposed to user-space has
>> historically been 32 bits.
>>
>> MPIDR values may be larger than 32 bits.
>>
>> MPIDR only has 32 bits worth of affinity data, but the aff3 field lives
>> above 32bits. The corresponding lower bits are masked out by
>> MPIDR_HWID_BITMASK and contain an SMT flag and Uni-Processor flag.
>>
>> Swizzzle the aff3 field into the bottom 32 bits and using that.
>>
>> In case more affinity fields are added in the future, the upper RES0
>> area should be checked. Returning a value greater than 32 bits from
>> this helper will cause the caller to give up on allocating cache-ids.

> I'd mention that in the code via a comment, not just the commit message.

Sure!


>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Seems a few unrelated tiny things snuck in here.
> 
> Otherwise seems fine to me.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index 99cd6546e72e..f8798dc96364 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -42,6 +42,7 @@
>>  
>>  #include <asm/cputype.h>
>>  #include <asm/mte-def.h>
>> +#include <asm/suspend.h>

> That seems a little random?  Why?

That's a hangover from a much earlier version that tried to use the MPIDR 'hash' that
cpu-suspend already creates. But the advice was to avoid that as if we find platforms with
wildly sparse MPIDR values, we'd need to make that 'hash' smarter to save memory - and
having an ABI hanging from it would be a hindrance.

Hence the swizzle.

>>  #include <asm/sysreg.h>
>>  
>>  #ifdef CONFIG_KASAN_SW_TAGS
>> @@ -87,6 +88,19 @@ int cache_line_size(void);
>>  
>>  #define dma_get_cache_alignment	cache_line_size
>>  
>> +/* Compress a u64 MPIDR value into 32 bits. */
>> +static inline u64 arch_compact_of_hwid(u64 id)
>> +{
>> +	u64 aff3 = MPIDR_AFFINITY_LEVEL(id, 3);
>> +
>> +	/* These bits are expected to be RES0 */
>> +	if (FIELD_GET(GENMASK_ULL(63, 40), id))
>> +		return id;
> 
> I would add a comment that the way this fails is to ensure
> there are bits in the upper bits.  It is a little unusual
> as APIs go but matches the not defined variant so sort of
> makes sense.

Yup, subtle enough that its worth a comment!

|	/*
|	 * These bits are expected to be RES0. If not, return a value with
|	 * the upper 32 bits set to force the caller to give up on 32 bit
|	 * cache ids.
|	 */


>> +
>> +	return (aff3 << 24) | FIELD_GET(GENMASK_ULL(23, 0), id);
>> +}
>> +#define arch_compact_of_hwid	arch_compact_of_hwid
>> +
>>  /*
>>   * Read the effective value of CTR_EL0.
>>   *
>> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
>> index f093cdf71be1..ebc23304d430 100644
>> --- a/arch/arm64/kernel/sleep.S
>> +++ b/arch/arm64/kernel/sleep.S
>> @@ -50,6 +50,7 @@
>>  	lsr	\mask ,\mask, \rs3
>>  	orr	\dst, \dst, \mask		// dst|=(aff3>>rs3)
>>  	.endm
>> +
> 
> Stray change.

This is a left over from trying to use the above 'hash' directly.

All fixed,


Thanks!

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-06-27  5:54     ` Shaopeng Tan (Fujitsu)
@ 2025-06-27 16:39       ` James Morse
  0 siblings, 0 replies; 25+ messages in thread
From: James Morse @ 2025-06-27 16:39 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), 'Jonathan Cameron'
  Cc: 'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org',
	'Greg Kroah-Hartman', 'Rafael J . Wysocki',
	'sudeep.holla@arm.com', 'Rob Herring',
	'Ben Horgan'

Hi Shaopeng,

On 27/06/2025 06:54, Shaopeng Tan (Fujitsu) wrote:
>> On Fri, 13 Jun 2025 13:03:55 +0000
>> James Morse <james.morse@arm.com> wrote:
>>> The MPAM driver identifies caches by id for use with resctrl. It needs
>>> to know the cache-id when probe-ing, but the value isn't set in
>>> cacheinfo until the corresponding CPU comes online.
>>>
>>> Expose the code that generates the cache-id. This allows the MPAM
>>> driver to determine the properties of the caches without waiting for
>>> all CPUs to come online.

>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index
>>> d8e5b4c7156c..6316d80abab8 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -200,7 +200,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
>> struct device_node *np)
>>>  		id = arch_compact_of_hwid(id);
>>>  		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {

> Since "id" was compressed into 32bits by the function arch_compact_of_hwid(),
> is this required?

The need for this is clearer in the patch that introduces it - arch_compact_of_hwid() may
not be implemented by all architectures that use OF, and arch_compact_of_hwid() needs to
be able to fail if it can't produce a 32bit version of the hwid. (on arm64 this would
happen if an aff4 was allocated in the RES0 bits of MPIDR_EL1 - which is why the helper
checks those bits are all zero).
This check ensures that if any cache-id is greater than 32 bits, then the platform doesn't
expose any cache-id to user-space, which will let use fix it up in some way without
changing the values user-space saw for the 'other' caches.


Thanks,

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-27 16:38     ` James Morse
@ 2025-06-30 19:43       ` Rob Herring
  2025-07-04 17:39         ` James Morse
  0 siblings, 1 reply; 25+ messages in thread
From: Rob Herring @ 2025-06-30 19:43 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Rob,
>
> On 23/06/2025 15:48, Rob Herring wrote:
> > On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
> >> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> >> groups of CPUs. The value is also used for PCIe cache steering tags. On
> >> DT platforms cache-id is not something that is described in the
> >> device-tree, but instead generated from the smallest CPU h/w id of the
> >> CPUs associated with that cache.
> >>
> >> CPU h/w ids may be larger than 32 bits.
> >>
> >> Add a hook to allow architectures to compress the value from the devicetree
> >> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> >> will stop if a value larger than 32 bits is seen.
> >>
> >> For example, on arm64 the value is the MPIDR affinity register, which only
> >> has 32 bits of affinity data, but spread across the 64 bit field. An
> >> arch-specific bit swizzle gives a 32 bit value.
>
> > What's missing here is why do we need the cache id to be only 32-bits?
> > I suppose it is because the sysfs 'id' file has been implicitly that?
>
> Yup, and its too late to change.
>
>
> > Why can't we just allow 64-bit values there? Obviously, you can't have
> > a 64-bit value on x86 because that might break existing userspace.
>
> It's the same user-space. Users of resctrl should be portable between architectures.
> Resctrl isn't the only user, of the cache-id field.
>
>
> > But for Arm, there is no existing userspace to break.
>
> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588

Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].

> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/

Is that even applied yet?

> > Even with 32-bits,
> > it is entirely possible that an existing userspace assumed values less
> > than 32-bits and would be broken for Arm as-is.
>
> Sure, but I've not found a project where that is broken yet.
>
>
> > It is obviously nice
> > if we can avoid modifying userspace, but I don't think that's a
> > requirement and I'd be surprised if there's not other things that need
> > to be adapted for MPAM support.
>
> The whole multi-year effort has been to make existing user-space work without any ABI
> changes. The effect is some platforms have features that can't be used because resctrl
> expects things to be Xeon shaped.
> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
> doesn't have an equivalent, so '1' is exposed as a safe value.

Fair enough, but I'd be rather surprised if there doesn't end up being
changes to support Arm platforms.

> > Also, what if an architecture can't swizzle their value into 32-bits?
> > They would be stuck with requiring userspace to deal with 64-bit
> > values.
>
> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.

What about using the logical CPU number instead? That's stable for a
given machine and firmware. And then instead of having 3 sets of
numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
have 2. And logical CPU is what sysfs already exposes to userspace.

Rob

[1] https://libvirt.org/news.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-06-30 19:43       ` Rob Herring
@ 2025-07-04 17:39         ` James Morse
  2025-07-07 17:41           ` Rob Herring
  0 siblings, 1 reply; 25+ messages in thread
From: James Morse @ 2025-07-04 17:39 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

Hi Rob,

On 30/06/2025 20:43, Rob Herring wrote:
> On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
>> On 23/06/2025 15:48, Rob Herring wrote:
>>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
>>>> groups of CPUs. The value is also used for PCIe cache steering tags. On
>>>> DT platforms cache-id is not something that is described in the
>>>> device-tree, but instead generated from the smallest CPU h/w id of the
>>>> CPUs associated with that cache.
>>>>
>>>> CPU h/w ids may be larger than 32 bits.
>>>>
>>>> Add a hook to allow architectures to compress the value from the devicetree
>>>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
>>>> will stop if a value larger than 32 bits is seen.
>>>>
>>>> For example, on arm64 the value is the MPIDR affinity register, which only
>>>> has 32 bits of affinity data, but spread across the 64 bit field. An
>>>> arch-specific bit swizzle gives a 32 bit value.
>>
>>> What's missing here is why do we need the cache id to be only 32-bits?
>>> I suppose it is because the sysfs 'id' file has been implicitly that?
>>
>> Yup, and its too late to change.
>>
>>
>>> Why can't we just allow 64-bit values there? Obviously, you can't have
>>> a 64-bit value on x86 because that might break existing userspace.
>>
>> It's the same user-space. Users of resctrl should be portable between architectures.
>> Resctrl isn't the only user, of the cache-id field.
>>
>>
>>> But for Arm, there is no existing userspace to break.
>>
>> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> 
> Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].

'when mounted with [a particular option]'

AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on
an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V.


>> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> 
> Is that even applied yet?

No idea, but its equally likely that I haven't found all the places this gets parsed by
user-space. I don't think we have a way of telling people using stable-kernels that we
might change the size of that field. It's pretty clear people don't anticipate it changing!

This is just the downside of exposing anything to user-space!

[...]

>>> It is obviously nice
>>> if we can avoid modifying userspace, but I don't think that's a
>>> requirement and I'd be surprised if there's not other things that need
>>> to be adapted for MPAM support.
>>
>> The whole multi-year effort has been to make existing user-space work without any ABI
>> changes. The effect is some platforms have features that can't be used because resctrl
>> expects things to be Xeon shaped.
>> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
>> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
>> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
>> doesn't have an equivalent, so '1' is exposed as a safe value.
> 
> Fair enough, but I'd be rather surprised if there doesn't end up being
> changes to support Arm platforms.
> 
>>> Also, what if an architecture can't swizzle their value into 32-bits?
>>> They would be stuck with requiring userspace to deal with 64-bit
>>> values.
>>
>> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.

> What about using the logical CPU number instead? That's stable for a
> given machine and firmware.

Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers
get doled out differently.


> And then instead of having 3 sets of
> numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
> have 2. And logical CPU is what sysfs already exposes to userspace.

I don't think the linux allocated CPU number is robust enough.

We could use the CPU number as seen when walking through the DT to make it stable, but
that would still be a third type of number. It would save the arch hook to swizzle the
bits, but changing the DT would change the numbers which doesn't happen with this scheme.

Let me know if that is what you prefer.
(I'll summarise this on the cover-letter/patch-1 of the incoming series)


Thanks,

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
  2025-07-04 17:39         ` James Morse
@ 2025-07-07 17:41           ` Rob Herring
  0 siblings, 0 replies; 25+ messages in thread
From: Rob Herring @ 2025-07-07 17:41 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, Greg Kroah-Hartman,
	Rafael J . Wysocki, sudeep.holla, Ben Horgan

On Fri, Jul 4, 2025 at 12:39 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Rob,
>
> On 30/06/2025 20:43, Rob Herring wrote:
> > On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
> >> On 23/06/2025 15:48, Rob Herring wrote:
> >>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
> >>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> >>>> groups of CPUs. The value is also used for PCIe cache steering tags. On
> >>>> DT platforms cache-id is not something that is described in the
> >>>> device-tree, but instead generated from the smallest CPU h/w id of the
> >>>> CPUs associated with that cache.
> >>>>
> >>>> CPU h/w ids may be larger than 32 bits.
> >>>>
> >>>> Add a hook to allow architectures to compress the value from the devicetree
> >>>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> >>>> will stop if a value larger than 32 bits is seen.
> >>>>
> >>>> For example, on arm64 the value is the MPIDR affinity register, which only
> >>>> has 32 bits of affinity data, but spread across the 64 bit field. An
> >>>> arch-specific bit swizzle gives a 32 bit value.
> >>
> >>> What's missing here is why do we need the cache id to be only 32-bits?
> >>> I suppose it is because the sysfs 'id' file has been implicitly that?
> >>
> >> Yup, and its too late to change.
> >>
> >>
> >>> Why can't we just allow 64-bit values there? Obviously, you can't have
> >>> a 64-bit value on x86 because that might break existing userspace.
> >>
> >> It's the same user-space. Users of resctrl should be portable between architectures.
> >> Resctrl isn't the only user, of the cache-id field.
> >>
> >>
> >>> But for Arm, there is no existing userspace to break.
> >>
> >> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> >
> > Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].
>
> 'when mounted with [a particular option]'
>
> AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on
> an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V.
>
>
> >> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> >
> > Is that even applied yet?
>
> No idea, but its equally likely that I haven't found all the places this gets parsed by
> user-space. I don't think we have a way of telling people using stable-kernels that we
> might change the size of that field. It's pretty clear people don't anticipate it changing!
>
> This is just the downside of exposing anything to user-space!
>
> [...]
>
> >>> It is obviously nice
> >>> if we can avoid modifying userspace, but I don't think that's a
> >>> requirement and I'd be surprised if there's not other things that need
> >>> to be adapted for MPAM support.
> >>
> >> The whole multi-year effort has been to make existing user-space work without any ABI
> >> changes. The effect is some platforms have features that can't be used because resctrl
> >> expects things to be Xeon shaped.
> >> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
> >> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
> >> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
> >> doesn't have an equivalent, so '1' is exposed as a safe value.
> >
> > Fair enough, but I'd be rather surprised if there doesn't end up being
> > changes to support Arm platforms.
> >
> >>> Also, what if an architecture can't swizzle their value into 32-bits?
> >>> They would be stuck with requiring userspace to deal with 64-bit
> >>> values.
> >>
> >> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.
>
> > What about using the logical CPU number instead? That's stable for a
> > given machine and firmware.
>
> Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers
> get doled out differently.

Ah, I thought they were more stable than that, but indeed there's a
special case for the boot CPU.


> > And then instead of having 3 sets of
> > numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
> > have 2. And logical CPU is what sysfs already exposes to userspace.
>
> I don't think the linux allocated CPU number is robust enough.
>
> We could use the CPU number as seen when walking through the DT to make it stable, but
> that would still be a third type of number. It would save the arch hook to swizzle the
> bits, but changing the DT would change the numbers which doesn't happen with this scheme.
>
> Let me know if that is what you prefer.
> (I'll summarise this on the cover-letter/patch-1 of the incoming series)

Let's leave it with what you have...

Rob

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-07-07 17:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-13 13:03 [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data James Morse
2025-06-13 13:03 ` [PATCH 1/5] " James Morse
2025-06-17 16:03   ` Jonathan Cameron
2025-06-23 14:18     ` Rob Herring
2025-06-27 16:38       ` James Morse
2025-06-13 13:03 ` [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id James Morse
2025-06-17 16:05   ` Jonathan Cameron
2025-06-23 14:48   ` Rob Herring
2025-06-27 16:38     ` James Morse
2025-06-30 19:43       ` Rob Herring
2025-07-04 17:39         ` James Morse
2025-07-07 17:41           ` Rob Herring
2025-06-13 13:03 ` [PATCH 3/5] arm64: cacheinfo: Provide helper to compress MPIDR value into u32 James Morse
2025-06-17 16:14   ` Jonathan Cameron
2025-06-27 16:39     ` James Morse
2025-06-13 13:03 ` [PATCH 4/5] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
2025-06-17 16:21   ` Jonathan Cameron
2025-06-27  5:54     ` Shaopeng Tan (Fujitsu)
2025-06-27 16:39       ` James Morse
2025-06-27 16:38     ` James Morse
2025-06-13 13:03 ` [PATCH 5/5] cacheinfo: Add helper to find the cache size from cpu+level James Morse
2025-06-17 16:28   ` Jonathan Cameron
2025-06-27 16:38     ` James Morse
2025-06-23 15:05 ` [PATCH 0/5] cacheinfo: Set cache 'id' based on DT data Rob Herring
2025-06-27 16:38   ` James Morse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).