linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 00/29] arm_mpam: Add basic mpam driver
@ 2025-09-10 20:42 James Morse
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
                   ` (29 more replies)
  0 siblings, 30 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hello,

The major changes since v1 are:
 * DT got ripped out - see below.
 * The mon_sel locking was simplified - but that will come back.
 
 Otherwise the myriad of changes are noted on each patch.
 
~

This is just enough MPAM driver for ACPI. DT got ripped out. If you need DT
support - please share your DTS so the DT folk know the binding is what is
needed.

This doesn't contain any of the resctrl code, meaning you can't actually drive it
from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.

This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.

The locking got simplified, but is still strange - this is because of the 'mpam-fb'
firmware interface specification, which is still in alpha. That interface needs to wait for
an interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.

I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.

The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
 This document has the best overview.)

The expectation is this will go via the arm64 tree.


This series is based on v6.17-rc4, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v2

The rest of the driver can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.17-rc4

What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/

This series was previously posted here:
[v1] lore.kernel.org/r/20250822153048.2287-1-james.morse@arm.com
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com


Bugs welcome,
Thanks,

James Morse (27):
  ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
    levels
  ACPI / PPTT: Find cache level by cache-id
  ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  arm64: kconfig: Add Kconfig entry for MPAM
  ACPI / MPAM: Parse the MPAM table
  arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  arm_mpam: Add the class and component structures for firmware
    described ris
  arm_mpam: Add MPAM MSC register layout definitions
  arm_mpam: Add cpuhp callbacks to probe MSC hardware
  arm_mpam: Probe hardware to find the supported partid/pmg values
  arm_mpam: Add helpers for managing the locking around the mon_sel
    registers
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Merge supported features during mpam_enable() into
    mpam_class
  arm_mpam: Reset MSC controls from cpu hp callbacks
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Register and enable IRQs
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Allow configuration to be applied and restored during cpu
    online
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Track bandwidth counter state for overflow and power
    management
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add kunit tests for props_mismatch()

Rohit Mathew (2):
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Use long MBWU counters if supported

 arch/arm64/Kconfig                  |   25 +
 drivers/Kconfig                     |    2 +
 drivers/Makefile                    |    1 +
 drivers/acpi/arm64/Kconfig          |    3 +
 drivers/acpi/arm64/Makefile         |    1 +
 drivers/acpi/arm64/mpam.c           |  361 ++++
 drivers/acpi/pptt.c                 |  224 ++-
 drivers/acpi/tables.c               |    2 +-
 drivers/resctrl/Kconfig             |   24 +
 drivers/resctrl/Makefile            |    4 +
 drivers/resctrl/mpam_devices.c      | 2668 +++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h     |  676 +++++++
 drivers/resctrl/test_mpam_devices.c |  389 ++++
 include/linux/acpi.h                |   26 +
 include/linux/arm_mpam.h            |   58 +
 15 files changed, 4455 insertions(+), 9 deletions(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
 create mode 100644 drivers/resctrl/test_mpam_devices.c
 create mode 100644 include/linux/arm_mpam.h

-- 
2.39.5



^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 10:43   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
                   ` (28 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

The ACPI MPAM table uses the UID of a processor container specified in
the PPTT to indicate the subset of CPUs and cache topology that can
access each MPAM Memory System Component (MSC).

This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.

Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
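
A minimal usage sketch (hypothetical caller, not part of this patch) that
turns the UID named by the MPAM table into a cpumask:

  cpumask_t accessibility;

  /* container_uid would come from the MPAM table's linked PM device */
  acpi_pptt_get_cpus_from_container(container_uid, &accessibility);
  if (cpumask_empty(&accessibility))
          pr_debug("No CPUs found below container %u\n", container_uid);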

CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Replaced commit message with wording from Dave.
 * Fixed a stray plural.
 * Moved further down in the file to make use of get_pptt() helper.
 * Added a break to exit the loop early.

Changes since RFC:
 * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
 * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
 * Added missing : in kernel-doc
 * Made helper return void as this never actually returns an error.
---
 drivers/acpi/pptt.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  3 ++
 2 files changed, 86 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..1728545d90b2 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -817,3 +817,86 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr:		A reference to the PPTT table.
+ * @parent_node:	A pointer to the processor node in the @table_hdr.
+ * @cpus:		A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+				     struct acpi_pptt_processor *parent_node,
+				     cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_id;
+	int cpu;
+
+	cpumask_clear(cpus);
+
+	for_each_possible_cpu(cpu) {
+		acpi_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+		while (cpu_node) {
+			if (cpu_node == parent_node) {
+				cpumask_set_cpu(cpu, cpus);
+				break;
+			}
+			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+		}
+	}
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ *                                       processor container
+ * @acpi_cpu_id:	The UID of the processor container.
+ * @cpus:		The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not all 'Processor' entries in the PPTT are a CPU or a Processor
+ * Container; some exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf that has the
+ * 'ACPI Processor ID valid' flag set.
+ *
+ * If the Processor Container cannot be found, @cpus is left empty.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table_hdr;
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 proc_sz;
+
+	cpumask_clear(cpus);
+
+	table_hdr = acpi_get_pptt();
+	if (!table_hdr)
+		return;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+	proc_sz = sizeof(struct acpi_pptt_processor);
+	while ((unsigned long)entry + proc_sz <= table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
+		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
+			if (!acpi_pptt_leaf_node(table_hdr, cpu_node)) {
+				if (cpu_node->acpi_processor_id == acpi_cpu_id) {
+					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+					break;
+				}
+			}
+		}
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1c5bb1e887cd..f97a9ff678cc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 {
 	return -EINVAL;
 }
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+						     cpumask_t *cpus) { }
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 10:46   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
                   ` (27 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter.  The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.

Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.

Get rid of the levels parameter, which has no remaining purpose.

Fix acpi_get_cache_info() to match.
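
The change to the calling convention, mirroring the diff below, is roughly:

  /* Before: the caller had to know to clear *levels first */
  *levels = 0;
  acpi_count_levels(table, cpu_node, levels, split_levels);

  /* After: the level count is the return value */
  *levels = acpi_count_levels(table, cpu_node, split_levels);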

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
---
Changes since v1:
 * Rewritten commit message from Dave.
 * Minor changes to kernel doc comment.
 * Keep the much loved typo.

Changes since RFC:
 * Made acpi_count_levels() return the levels value.
---
 drivers/acpi/pptt.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 1728545d90b2..7af7d62597df 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -177,14 +177,14 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
 }
 
 /**
- * acpi_count_levels() - Given a PPTT table, and a CPU node, count the cache
- * levels and split cache levels (data/instruction).
+ * acpi_count_levels() - Given a PPTT table, and a CPU node, count the
+ * total number of levels and split cache levels (data/instruction).
  * @table_hdr: Pointer to the head of the PPTT table
  * @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
  * @split_levels:	Number of split cache levels (data/instruction) if
  *			success. Can by NULL.
  *
+ * Return: number of levels.
  * Given a processor node containing a processing unit, walk into it and count
  * how many levels exist solely for it, and then walk up each level until we hit
  * the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * split cache levels (data/instruction) that exist at each level on the way
  * up.
  */
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
-			      struct acpi_pptt_processor *cpu_node,
-			      unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node,
+			     unsigned int *split_levels)
 {
+	int starting_level = 0;
+
 	do {
-		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
 		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
 	} while (cpu_node);
+
+	return starting_level;
 }
 
 /**
@@ -645,7 +649,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 	if (!cpu_node)
 		return -ENOENT;
 
-	acpi_count_levels(table, cpu_node, levels, split_levels);
+	*levels = acpi_count_levels(table, cpu_node, split_levels);
 
 	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
 		 *levels, split_levels ? *split_levels : -1);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 10:59   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
                   ` (26 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.

Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.

Add a helper that walks every possible cache until it finds the one
identified by cache-id, then returns the level.
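
A hypothetical caller (the MPAM ACPI parser later in this series) would use it
along these lines:

  int level = find_acpi_cache_level_from_id(cache_id);

  if (level <= 0)
          return -EINVAL;        /* cache not described by a rev 3+ PPTT */
  /* 'level' can now be used to group unified caches that are peers */
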

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Dropped the cleanup-based table freeing, use acpi_get_pptt() instead.
 * Removed a confusing comment.
 * Clarified the kernel doc.

Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  5 ++++
 2 files changed, 67 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 7af7d62597df..c5f2a51d280b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
 				     entry->length);
 	}
 }
+
+/*
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another CPU as its L3, this function will
+ * return an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
+ * the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_table_header *table;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+
+	table = acpi_get_pptt();
+	if (!table)
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return -ENOENT;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				return level;
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..5bdca5546697 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1542,6 +1542,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (2 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 11:06   ` Jonathan Cameron
  2025-10-02  5:03   ` Fenghua Yu
  2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
                   ` (25 subsequent siblings)
  29 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew

MPAM identifies CPUs by the cache_id in the PPTT cache structure.

The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.

Add a helper to pull this information out of the PPTT.
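
As an illustration (hypothetical caller, not part of this patch):

  cpumask_t affinity;

  /* cache_id comes from the MPAM table's cache locator */
  if (acpi_pptt_get_cpumask_from_cache_id(cache_id, &affinity))
          pr_debug("No PPTT description for cache %u\n", cache_id);
  else
          pr_debug("Cache %u spans CPUs %*pbl\n", cache_id,
                   cpumask_pr_args(&affinity));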

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
Changes since v1:
 * Added punctuation to the commit message.
 * Removed a comment about an alternative implementation.
 * Made the loop continue with a warning if a CPU is missing from the PPTT.

Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  6 +++++
 2 files changed, 65 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index c5f2a51d280b..c379a9952b00 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -966,3 +966,62 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 
 	return -ENOENT;
 }
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ *					   specified cache
+ * @cache_id: The id field of the unified cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	cpumask_clear(cpus);
+
+	if (IS_ERR(table))
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (WARN_ON_ONCE(!cpu_node))
+			continue;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				cpumask_set_cpu(cpu, cpus);
+		}
+	}
+
+	return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 5bdca5546697..c5fd92cda487 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
 {
 	return -EINVAL;
 }
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+						      cpumask_t *cpus)
+{
+	return -ENOENT;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (3 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 10:14   ` Ben Horgan
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
                   ` (24 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.

This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.
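
For reference, a config fragment for an ACPI system is expected to look
something like this (ARM64_MPAM_DRIVER is only introduced by a later patch in
this series, and sits behind EXPERT):

  CONFIG_EXPERT=y
  CONFIG_ARM64_MPAM=y
  CONFIG_ARM64_MPAM_DRIVER=y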

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
CC: Dave Martin <dave.martin@arm.com>
---
Changes since v1:
 * Help text rewritten by Dave.
---
 arch/arm64/Kconfig | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..4be8a13505bf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2060,6 +2060,29 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 
+config ARM64_MPAM
+	bool "Enable support for MPAM"
+	help
+	  Memory System Resource Partitioning and Monitoring (MPAM) is an
+	  optional extension to the Arm architecture that allows each
+	  transaction issued to the memory system to be labelled with a
+	  Partition identifier (PARTID) and Performance Monitoring Group
+	  identifier (PMG).
+
+	  Memory system components, such as the caches, can be configured with
+	  policies to control how much of various physical resources (such as
+	  memory bandwidth or cache memory) the transactions labelled with each
+	  PARTID can consume.  Depending on the capabilities of the hardware,
+	  the PARTID and PMG can also be used as filtering criteria to measure
+	  the memory system resource consumption of different parts of a
+	  workload.
+
+	  Use of this extension requires CPU support, support in the
+	  Memory System Components (MSC), and a description from firmware
+	  of where the MSCs are in the address space.
+
+	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (4 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 13:17   ` Jonathan Cameron
                     ` (4 more replies)
  2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                   ` (23 subsequent siblings)
  29 siblings, 5 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.

For now the MPAM hook mpam_ris_create() is stubbed out, but it will later
update the MPAM driver with the discovered data.
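
For a cache-type resource node, the end result is roughly (sketch based on
the parser added below):

  cache_id = res->locator.cache_locator.cache_reference;
  level = find_acpi_cache_level_from_id(cache_id);
  if (level > 0)
          mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
                          level, cache_id);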

CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Whitespace.
 * Gave GLOBAL_AFFINITY a pre-processor'd name.
 * Fixed assumption that there are zero functional dependencies.
 * Bounds check walking of the MSC RIS.
 * More bounds checking in the main table walk.
 * Check for nonsense numbers of function dependencies.
 * Smattering of pr_debug() to help folk feeding line-noise to the parser.
 * Changed the comment flavour on the SPDX string.
 * Removed additional table check.
 * More comment wrangling.

Changes since RFC:
 * Used DEFINE_RES_IRQ_NAMED() and friends macros.
 * Additional error handling.
 * Check for zero sized MSC.
 * Allow table revisions greater than 1. (no spec for revision 0!)
 * Use cleanup helpers to retrieve ACPI tables, which allows some functions
   to be folded together.
---
 arch/arm64/Kconfig          |   1 +
 drivers/acpi/arm64/Kconfig  |   3 +
 drivers/acpi/arm64/Makefile |   1 +
 drivers/acpi/arm64/mpam.c   | 361 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c       |   2 +-
 include/linux/acpi.h        |  12 ++
 include/linux/arm_mpam.h    |  48 +++++
 7 files changed, 427 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 include/linux/arm_mpam.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4be8a13505bf..6487c511bdc6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
 	  optional extension to the Arm architecture that allows each
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
 
 config ACPI_APMT
 	bool
+
+config ACPI_MPAM
+	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
 obj-$(CONFIG_ACPI_FFH)		+= ffh.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
 obj-y				+= dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..fd9cfa143676
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,361 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
+
+static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
+				   u32 flags, int *irq,
+				   u32 processor_container_uid)
+{
+	int sense;
+
+	if (!intid)
+		return false;
+
+	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
+	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+		return false;
+
+	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
+
+	if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
+		pr_err_once("Partitioned interrupts not supported\n");
+		return false;
+	}
+
+	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+	if (*irq <= 0) {
+		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+			    intid);
+		return false;
+	}
+
+	return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+				 struct acpi_mpam_msc_node *tbl_msc,
+				 struct resource *res, int *res_idx)
+{
+	u32 flags, aff;
+	int irq;
+
+	flags = tbl_msc->overflow_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->overflow_interrupt_affinity;
+	else
+		aff = GLOBAL_AFFINITY;
+	if (acpi_mpam_register_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+	flags = tbl_msc->error_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->error_interrupt_affinity;
+	else
+		aff = GLOBAL_AFFINITY;
+	if (acpi_mpam_register_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+				    struct acpi_mpam_resource_node *res)
+{
+	int level, nid;
+	u32 cache_id;
+
+	switch (res->locator_type) {
+	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+		cache_id = res->locator.cache_locator.cache_reference;
+		level = find_acpi_cache_level_from_id(cache_id);
+		if (level <= 0) {
+			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
+			return -EINVAL;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+				       level, cache_id);
+	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+		if (nid == NUMA_NO_NODE)
+			nid = 0;
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+				       255, nid);
+	default:
+		/* These get discovered later and treated as unknown */
+		return 0;
+	}
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc)
+{
+	int i, err;
+	char *ptr, *table_end;
+	struct acpi_mpam_resource_node *resource;
+
+	ptr = (char *)(tbl_msc + 1);
+	table_end = (char *)tbl_msc + tbl_msc->length;
+	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+		u64 max_deps, remaining_table;
+
+		if (ptr + sizeof(*resource) > table_end)
+			return -EINVAL;
+
+		resource = (struct acpi_mpam_resource_node *)ptr;
+
+		remaining_table = table_end - ptr;
+		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
+		if (resource->num_functional_deps > max_deps) {
+			pr_debug("MSC has impossible number of functional dependencies\n");
+			return -EINVAL;
+		}
+
+		err = acpi_mpam_parse_resource(msc, resource);
+		if (err)
+			return err;
+
+		ptr += sizeof(*resource);
+		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
+	}
+
+	return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+				     struct platform_device *pdev,
+				     u32 *acpi_id)
+{
+	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
+	bool acpi_id_valid = false;
+	struct acpi_device *buddy;
+	char uid[11];
+	int err;
+
+	memset(&hid, 0, sizeof(hid));
+	memcpy(hid, &tbl_msc->hardware_id_linked_device,
+	       sizeof(tbl_msc->hardware_id_linked_device));
+
+	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+		*acpi_id = tbl_msc->instance_id_linked_device;
+		acpi_id_valid = true;
+	}
+
+	err = snprintf(uid, sizeof(uid), "%u",
+		       tbl_msc->instance_id_linked_device);
+	if (err >= sizeof(uid)) {
+		pr_debug("Failed to convert uid of device for power management.");
+		return acpi_id_valid;
+	}
+
+	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+	if (buddy)
+		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+	return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+				 enum mpam_msc_iface *iface)
+{
+	switch (tbl_msc->interface_type) {
+	case 0:
+		*iface = MPAM_IFACE_MMIO;
+		return 0;
+	case 0xa:
+		*iface = MPAM_IFACE_PCC;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int __init acpi_mpam_parse(void)
+{
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct property_entry props[4]; /* needs a sentinel */
+	struct acpi_mpam_msc_node *tbl_msc;
+	int next_res, next_prop, err = 0;
+	struct acpi_device *companion;
+	struct platform_device *pdev;
+	enum mpam_msc_iface iface;
+	struct resource res[3];
+	char uid[16];
+	u32 acpi_id;
+
+	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		table_offset += tbl_msc->length;
+
+		if (table_offset > table_end) {
+			pr_debug("MSC entry overlaps end of ACPI table\n");
+			break;
+		}
+
+		/*
+		 * If any of the reserved fields are set, make no attempt to
+		 * parse the MSC structure. This MSC will still be counted,
+		 * meaning the MPAM driver can't probe against all MSC, and
+		 * will never be enabled. There is no way to enable it safely,
+		 * because we cannot determine safe system-wide partid and pmg
+		 * ranges in this situation.
+		 */
+		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
+			pr_err_once("Unrecognised MSC, MPAM not usable\n");
+			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
+			continue;
+		}
+
+		if (!tbl_msc->mmio_size) {
+			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
+			continue;
+		}
+
+		if (decode_interface_type(tbl_msc, &iface)) {
+			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
+			continue;
+		}
+
+		next_res = 0;
+		next_prop = 0;
+		memset(res, 0, sizeof(res));
+		memset(props, 0, sizeof(props));
+
+		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+		if (!pdev) {
+			err = -ENOMEM;
+			break;
+		}
+
+		if (tbl_msc->length < sizeof(*tbl_msc)) {
+			err = -EINVAL;
+			break;
+		}
+
+		/* Some power management is described in the namespace: */
+		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+		if (err > 0 && err < sizeof(uid)) {
+			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+			if (companion)
+				ACPI_COMPANION_SET(&pdev->dev, companion);
+			else
+				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);
+		}
+
+		if (iface == MPAM_IFACE_MMIO) {
+			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+							       tbl_msc->mmio_size,
+							       "MPAM:MSC");
+		} else if (iface == MPAM_IFACE_PCC) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+								tbl_msc->base_address);
+		}
+
+		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+		err = platform_device_add_resources(pdev, res, next_res);
+		if (err)
+			break;
+
+		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+							tbl_msc->max_nrdy_usec);
+
+		/*
+		 * The MSC's CPU affinity is described via its linked power
+		 * management device, but only if it points at a Processor or
+		 * Processor Container.
+		 */
+		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
+								acpi_id);
+		}
+
+		err = device_create_managed_software_node(&pdev->dev, props,
+							  NULL);
+		if (err)
+			break;
+
+		/* Come back later if you want the RIS too */
+		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+		if (err)
+			break;
+
+		err = platform_device_add(pdev);
+		if (err)
+			break;
+	}
+
+	if (err)
+		platform_device_put(pdev);
+
+	return err;
+}
+
+int acpi_mpam_count_msc(void)
+{
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct acpi_mpam_msc_node *tbl_msc;
+	int count = 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		if (tbl_msc->length < sizeof(*tbl_msc))
+			return -EINVAL;
+		if (tbl_msc->length > table_end - table_offset)
+			return -EINVAL;
+		table_offset += tbl_msc->length;
+
+		/* Skip disabled MSC, but keep walking the table */
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		count++;
+	}
+
+	return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index fa9bb8c8ce95..835e3795ede3 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
 	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
-	ACPI_SIG_NBFT };
+	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c5fd92cda487..af449964426b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	int status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..3d6c39c667c3
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+#define GLOBAL_AFFINITY		~0
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
+	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
+	MPAM_CLASS_MEMORY,      /* Main memory */
+	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+					    struct acpi_mpam_msc_node *tbl_msc)
+{
+	return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (5 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 13:35   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
                   ` (22 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.

Start with driver probe/remove and mapping the MSC.
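
As a sketch of the init path added below: the driver first asks firmware how
many MSC to expect, and each successful probe is counted against that total
so a later patch can enable MPAM once every MSC has been seen:

  fw_num_msc = acpi_mpam_count_msc();
  if (fw_num_msc <= 0)
          return -EINVAL;

  return platform_driver_register(&mpam_msc_driver);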

CC: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Avoid selecting the driver on other architectures.
 * Removed PCC support stub.
 * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
 * Clarified a comment.
 * Stopped using mpam_num_msc as an id, and made it atomic.
 * Size of -1 returned from cache_of_calculate_id()
 * Renamed some struct members.
 * Made a bunch of pr_err() calls dev_err_once().
 * Used more cleanup magic.
 * Inlined a print message.
 * Fixed error propagation from mpam_dt_parse_resources().
 * Moved cache accessibility checks earlier.

Changes since RFC:
 * Check for status=broken DT devices.
 * Moved all the files around.
 * Made Kconfig symbols depend on EXPERT
---
 arch/arm64/Kconfig              |   1 +
 drivers/Kconfig                 |   2 +
 drivers/Makefile                |   1 +
 drivers/resctrl/Kconfig         |  14 +++
 drivers/resctrl/Makefile        |   4 +
 drivers/resctrl/mpam_devices.c  | 180 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  65 ++++++++++++
 7 files changed, 267 insertions(+)
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6487c511bdc6..93e563e1cce4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ARM64_MPAM_DRIVER if EXPERT
 	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
 
 source "drivers/cdx/Kconfig"
 
+source "drivers/resctrl/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index b5749cf67044..f41cf4eddeba 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
 obj-$(CONFIG_DRM_ACCEL)		+= accel/
 obj-$(CONFIG_CDX_BUS)		+= cdx/
 obj-$(CONFIG_DPLL)		+= dpll/
+obj-y				+= resctrl/
 
 obj-$(CONFIG_S390)		+= s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..c30532a3a3a4
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,14 @@
+menuconfig ARM64_MPAM_DRIVER
+	bool "MPAM driver"
+	depends on ARM64 && ARM64_MPAM && EXPERT
+	help
+	  MPAM driver for system IP, e.g. caches and memory controllers.
+
+if ARM64_MPAM_DRIVER
+config ARM64_MPAM_DRIVER_DEBUG
+	bool "Enable debug messages from the MPAM driver"
+	depends on ARM64_MPAM_DRIVER
+	help
+	  Say yes here to enable debug messages from the MPAM driver.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..92b48fa20108
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
+mpam-y						+= mpam_devices.o
+
+cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..efc4738e3b4d
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/*
+ * Number of MSCs that have been probed. Once all MSC have been probed MPAM
+ * can be enabled.
+ */
+static atomic_t mpam_num_msc;
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+	u32 affinity_id;
+	int err;
+
+	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+				       &affinity_id);
+	if (err)
+		cpumask_copy(&msc->accessibility, cpu_possible_mask);
+	else
+		acpi_pptt_get_cpus_from_container(affinity_id,
+						  &msc->accessibility);
+
+	return 0;
+}
+
+static int fw_num_msc;
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+	struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+	if (!msc)
+		return;
+
+	mutex_lock(&mpam_list_lock);
+	platform_set_drvdata(pdev, NULL);
+	list_del_rcu(&msc->all_msc_list);
+	synchronize_srcu(&mpam_srcu);
+	mutex_unlock(&mpam_list_lock);
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	struct mpam_msc *msc;
+	struct resource *msc_res;
+	struct device *dev = &pdev->dev;
+	void *plat_data = pdev->dev.platform_data;
+
+	mutex_lock(&mpam_list_lock);
+	do {
+		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+		if (!msc) {
+			err = -ENOMEM;
+			break;
+		}
+
+		mutex_init(&msc->probe_lock);
+		mutex_init(&msc->part_sel_lock);
+		mutex_init(&msc->outer_mon_sel_lock);
+		raw_spin_lock_init(&msc->inner_mon_sel_lock);
+		msc->id = pdev->id;
+		msc->pdev = pdev;
+		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
+		INIT_LIST_HEAD_RCU(&msc->ris);
+
+		err = update_msc_accessibility(msc);
+		if (err)
+			break;
+		if (cpumask_empty(&msc->accessibility)) {
+			dev_err_once(dev, "MSC is not accessible from any CPU!");
+			err = -EINVAL;
+			break;
+		}
+
+		if (device_property_read_u32(&pdev->dev, "pcc-channel",
+					     &msc->pcc_subspace_id))
+			msc->iface = MPAM_IFACE_MMIO;
+		else
+			msc->iface = MPAM_IFACE_PCC;
+
+		if (msc->iface == MPAM_IFACE_MMIO) {
+			void __iomem *io;
+
+			io = devm_platform_get_and_ioremap_resource(pdev, 0,
+								    &msc_res);
+			if (IS_ERR(io)) {
+				dev_err_once(dev, "Failed to map MSC base address\n");
+				err = PTR_ERR(io);
+				break;
+			}
+			msc->mapped_hwpage_sz = resource_size(msc_res);
+			msc->mapped_hwpage = io;
+		}
+
+		list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
+		platform_set_drvdata(pdev, msc);
+	} while (0);
+	mutex_unlock(&mpam_list_lock);
+
+	if (!err) {
+		/* Create RIS entries described by firmware */
+		err = acpi_mpam_parse_resources(msc, plat_data);
+	}
+
+	if (err && msc)
+		mpam_msc_drv_remove(pdev);
+
+	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
+		pr_info("Discovered all MSC\n");
+
+	return err;
+}
+
+static struct platform_driver mpam_msc_driver = {
+	.driver = {
+		.name = "mpam_msc",
+	},
+	.probe = mpam_msc_drv_probe,
+	.remove = mpam_msc_drv_remove,
+};
+
+static int __init mpam_msc_driver_init(void)
+{
+	if (!system_supports_mpam())
+		return -EOPNOTSUPP;
+
+	init_srcu_struct(&mpam_srcu);
+
+	fw_num_msc = acpi_mpam_count_msc();
+
+	if (fw_num_msc <= 0) {
+		pr_err("No MSC devices found in firmware\n");
+		return -EINVAL;
+	}
+
+	return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..7c63d590fc98
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2025 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/resctrl.h>
+#include <linux/sizes.h>
+
+struct mpam_msc {
+	/* member of mpam_all_msc */
+	struct list_head        all_msc_list;
+
+	int			id;
+	struct platform_device *pdev;
+
+	/* Not modified after mpam_is_enabled() becomes true */
+	enum mpam_msc_iface	iface;
+	u32			pcc_subspace_id;
+	struct mbox_client	pcc_cl;
+	struct pcc_mbox_chan	*pcc_chan;
+	u32			nrdy_usec;
+	cpumask_t		accessibility;
+
+	/*
+	 * probe_lock is only taken during discovery. After discovery these
+	 * properties become read-only and the lists are protected by SRCU.
+	 */
+	struct mutex		probe_lock;
+	unsigned long		ris_idxs;
+	u32			ris_max;
+
+	/* mpam_msc_ris of this component */
+	struct list_head	ris;
+
+	/*
+	 * part_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_PART_SEL (including the ID registers that vary
+	 * by RIS).
+	 * If needed, take msc->probe_lock first.
+	 */
+	struct mutex		part_sel_lock;
+
+	/*
+	 * mon_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_MON_SEL.
+	 * If needed, take msc->probe_lock first.
+	 */
+	struct mutex		outer_mon_sel_lock;
+	raw_spinlock_t		inner_mon_sel_lock;
+	unsigned long		inner_mon_sel_flags;
+
+	void __iomem		*mapped_hwpage;
+	size_t			mapped_hwpage_sz;
+};
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity);
+
+#endif /* MPAM_INTERNAL_H */
-- 
2.39.5




* [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (6 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 14:22   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
                   ` (21 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.

To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).

Add support for creating and destroying structures to allow a hierarchy
of resources to be created.
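
For illustration only - this sketch is not part of the patch, but shows
how the resulting class -> component -> vMSC -> RIS hierarchy can be
walked under the SRCU protection added here (the pr_debug() is just a
placeholder):

	static void example_dump_hierarchy(void)
	{
		struct mpam_class *class;
		struct mpam_component *comp;
		struct mpam_vmsc *vmsc;
		struct mpam_msc_ris *ris;
		int idx;

		idx = srcu_read_lock(&mpam_srcu);
		list_for_each_entry_srcu(class, &mpam_classes, classes_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			list_for_each_entry_srcu(comp, &class->components, class_list,
						 srcu_read_lock_held(&mpam_srcu)) {
				list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
							 srcu_read_lock_held(&mpam_srcu)) {
					list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
								 srcu_read_lock_held(&mpam_srcu))
						pr_debug("class %u comp %u msc %d ris %u\n",
							 class->level, comp->comp_id,
							 vmsc->msc->id, ris->ris_idx);
				}
			}
		}
		srcu_read_unlock(&mpam_srcu, idx);
	}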

CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Fixed a comp/vmsc typo.
 * Removed duplicate description from the commit message.
 * Moved parenthesis in the add_to_garbage() macro.
 * Check for out of range ris_idx when creating ris.
 * Removed GFP as probe_lock is no longer a spin lock.
 * Removed alloc flag as the code ended up searching the lists itself.
 * Added a comment about affinity masks not overlapping.

Changes since RFC:
 * removed a pr_err() debug message that crept in.
---
 drivers/resctrl/mpam_devices.c  | 406 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  90 +++++++
 include/linux/arm_mpam.h        |   8 +-
 3 files changed, 493 insertions(+), 11 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index efc4738e3b4d..c7f4981b3545 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -18,7 +18,6 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/srcu.h>
 #include <linux/types.h>
 
 #include "mpam_internal.h"
@@ -31,7 +30,7 @@
 static DEFINE_MUTEX(mpam_list_lock);
 static LIST_HEAD(mpam_all_msc);
 
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
 
 /*
  * Number of MSCs that have been probed. Once all MSC have been probed MPAM
@@ -39,6 +38,402 @@ static struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of the features of the RIS it
+ * contains. The features of a Class and of a Component are the common
+ * subset of the vMSCs they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronize_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
+	if (!vmsc)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(vmsc);
+
+	INIT_LIST_HEAD_RCU(&vmsc->ris);
+	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+	vmsc->comp = comp;
+	vmsc->msc = msc;
+
+	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+	return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
+				       struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (vmsc->msc->id == msc->id)
+			return vmsc;
+	}
+
+	return mpam_vmsc_alloc(comp, msc);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(comp);
+
+	comp->comp_id = id;
+	INIT_LIST_HEAD_RCU(&comp->vmsc);
+	/* affinity is updated when ris are added */
+	INIT_LIST_HEAD_RCU(&comp->class_list);
+	comp->class = class;
+
+	list_add_rcu(&comp->class_list, &class->components);
+
+	return comp;
+}
+
+static struct mpam_component *
+mpam_component_get(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(comp, &class->components, class_list) {
+		if (comp->comp_id == id)
+			return comp;
+	}
+
+	return mpam_component_alloc(class, id);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	class = kzalloc(sizeof(*class), GFP_KERNEL);
+	if (!class)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(class);
+
+	INIT_LIST_HEAD_RCU(&class->components);
+	/* affinity is updated when ris are added */
+	class->level = level_idx;
+	class->type = type;
+	INIT_LIST_HEAD_RCU(&class->classes_list);
+
+	list_add_rcu(&class->classes_list, &mpam_classes);
+
+	return class;
+}
+
+static struct mpam_class *
+mpam_class_get(u8 level_idx, enum mpam_class_types type)
+{
+	bool found = false;
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		if (class->type == type && class->level == level_idx) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found)
+		return class;
+
+	return mpam_class_alloc(level_idx, type);
+}
+
+#define add_to_garbage(x)				\
+do {							\
+	__typeof__(x) _x = (x);				\
+	_x->garbage.to_free = _x;			\
+	llist_add(&_x->garbage.llist, &mpam_garbage);	\
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&class->classes_list);
+	add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&comp->class_list);
+	add_to_garbage(comp);
+
+	if (list_empty(&class->components))
+		mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+	struct mpam_component *comp = vmsc->comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&vmsc->comp_list);
+	add_to_garbage(vmsc);
+
+	if (list_empty(&comp->vmsc))
+		mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+	struct mpam_vmsc *vmsc = ris->vmsc;
+	struct mpam_msc *msc = vmsc->msc;
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_component *comp = vmsc->comp;
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	/*
+	 * It is assumed affinities don't overlap. If they do the class becomes
+	 * unusable immediately.
+	 */
+	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+	clear_bit(ris->ris_idx, &msc->ris_idxs);
+	list_del_rcu(&ris->vmsc_list);
+	list_del_rcu(&ris->msc_list);
+	add_to_garbage(ris);
+	ris->garbage.pdev = pdev;
+
+	if (list_empty(&vmsc->ris))
+		mpam_vmsc_destroy(vmsc);
+}
+
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_msc_ris *ris, *tmp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+		mpam_ris_destroy(ris);
+
+	list_del_rcu(&msc->all_msc_list);
+	platform_set_drvdata(pdev, NULL);
+
+	add_to_garbage(msc);
+	msc->garbage.pdev = pdev;
+}
+
+static void mpam_free_garbage(void)
+{
+	struct mpam_garbage *iter, *tmp;
+	struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+	if (!to_free)
+		return;
+
+	synchronize_srcu(&mpam_srcu);
+
+	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+		if (iter->pdev)
+			devm_kfree(&iter->pdev->dev, iter->to_free);
+		else
+			kfree(iter->to_free);
+	}
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online, so
+ * use the ACPI PPTT to find the CPUs associated with this cache.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity)
+{
+	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (node_id == cpu_to_node(cpu))
+			cpumask_set_cpu(cpu, affinity);
+	}
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+				 enum mpam_class_types type,
+				 struct mpam_class *class,
+				 struct mpam_component *comp)
+{
+	int err;
+
+	switch (type) {
+	case MPAM_CLASS_CACHE:
+		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+						     affinity);
+		if (err)
+			return err;
+
+		if (cpumask_empty(affinity))
+			pr_warn_once("%s: no CPUs associated with cache node\n",
+				     dev_name(&msc->pdev->dev));
+
+		break;
+	case MPAM_CLASS_MEMORY:
+		get_cpumask_from_node_id(comp->comp_id, affinity);
+		/* affinity may be empty for CPU-less memory nodes */
+		break;
+	case MPAM_CLASS_UNKNOWN:
+		return 0;
+	}
+
+	cpumask_and(affinity, affinity, &msc->accessibility);
+
+	return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	int err;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (ris_idx >= MPAM_MSC_MAX_NUM_RIS)
+		return -EINVAL;
+
+	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
+		return -EBUSY;
+
+	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
+	if (!ris)
+		return -ENOMEM;
+	init_garbage(ris);
+
+	class = mpam_class_get(class_id, type);
+	if (IS_ERR(class))
+		return PTR_ERR(class);
+
+	comp = mpam_component_get(class, component_id);
+	if (IS_ERR(comp)) {
+		if (list_empty(&class->components))
+			mpam_class_destroy(class);
+		return PTR_ERR(comp);
+	}
+
+	vmsc = mpam_vmsc_get(comp, msc);
+	if (IS_ERR(vmsc)) {
+		if (list_empty(&comp->vmsc))
+			mpam_comp_destroy(comp);
+		return PTR_ERR(vmsc);
+	}
+
+	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+	if (err) {
+		if (list_empty(&vmsc->ris))
+			mpam_vmsc_destroy(vmsc);
+		return err;
+	}
+
+	ris->ris_idx = ris_idx;
+	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+	ris->vmsc = vmsc;
+
+	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+	return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id)
+{
+	int err;
+
+	mutex_lock(&mpam_list_lock);
+	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+				     component_id);
+	mutex_unlock(&mpam_list_lock);
+	if (err)
+		mpam_free_garbage();
+
+	return err;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
 		return;
 
 	mutex_lock(&mpam_list_lock);
-	platform_set_drvdata(pdev, NULL);
-	list_del_rcu(&msc->all_msc_list);
-	synchronize_srcu(&mpam_srcu);
+	mpam_msc_destroy(msc);
 	mutex_unlock(&mpam_list_lock);
+
+	mpam_free_garbage();
 }
 
 static int mpam_msc_drv_probe(struct platform_device *pdev)
@@ -95,6 +490,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			err = -ENOMEM;
 			break;
 		}
+		init_garbage(msc);
 
 		mutex_init(&msc->probe_lock);
 		mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 7c63d590fc98..02e9576ece6b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,10 +7,29 @@
 #include <linux/arm_mpam.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
 #include <linux/resctrl.h>
 #include <linux/sizes.h>
+#include <linux/srcu.h>
+
+#define MPAM_MSC_MAX_NUM_RIS	16
+
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+	/* member of mpam_garbage */
+	struct llist_node	llist;
+
+	void			*to_free;
+	struct platform_device	*pdev;
+};
 
 struct mpam_msc {
 	/* member of mpam_all_msc */
@@ -57,8 +76,79 @@ struct mpam_msc {
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
+
+	struct mpam_garbage	garbage;
 };
 
+struct mpam_class {
+	/* mpam_components in this class */
+	struct list_head	components;
+
+	cpumask_t		affinity;
+
+	u8			level;
+	enum mpam_class_types	type;
+
+	/* member of mpam_classes */
+	struct list_head	classes_list;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_component {
+	u32			comp_id;
+
+	/* mpam_vmsc in this component */
+	struct list_head	vmsc;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_class:components */
+	struct list_head	class_list;
+
+	/* parent: */
+	struct mpam_class	*class;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_vmsc {
+	/* member of mpam_component:vmsc_list */
+	struct list_head	comp_list;
+
+	/* mpam_msc_ris in this vmsc */
+	struct list_head	ris;
+
+	/* All RIS in this vMSC are members of this MSC */
+	struct mpam_msc		*msc;
+
+	/* parent: */
+	struct mpam_component	*comp;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_msc_ris {
+	u8			ris_idx;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_vmsc:ris */
+	struct list_head	vmsc_list;
+
+	/* member of mpam_msc:ris */
+	struct list_head	msc_list;
+
+	/* parent: */
+	struct mpam_vmsc	*vmsc;
+
+	struct mpam_garbage	garbage;
+};
+
+/* List of all classes - protected by SRCU */
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 3d6c39c667c3..3206f5ddc147 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -38,11 +38,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
 static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 #endif
 
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
-				  enum mpam_class_types type, u8 class_id,
-				  int component_id)
-{
-	return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id);
 
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.39.5




* [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (7 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 15:00   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
                   ` (20 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.

Add the definitions for these registers as offsets within the page(s).
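
For illustration only, not part of this patch: the *_IDR masks below are
meant to be used with FIELD_GET() (linux/bitfield.h) on the value read
from the memory mapped page, along these lines:

	static void example_decode_idr(struct mpam_msc *msc)
	{
		/* Sketch: the driver reads MPAMF_IDR as one or two 32-bit accesses */
		u32 idr_lo = readl_relaxed(msc->mapped_hwpage + MPAMF_IDR);
		u16 partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr_lo);
		u8 pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr_lo);

		pr_debug("partid_max %u pmg_max %u msmon %lu\n",
			 partid_max, pmg_max,
			 FIELD_GET(MPAMF_IDR_HAS_MSMON, idr_lo));
	}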

Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v1:
 * Whitespace.
 * Added constants for CASSOC and XCL.
 * Merged FLT/CTL defines.
 * Fixed MSMON_CFG_CSU_CTL_TYPE_CSU definition.

Changes since RFC:
 * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
 * Whitespace churn.
 * Cite a more recent document.
 * Removed some stale feature, fixed some names etc.
---
 drivers/resctrl/mpam_internal.h | 267 ++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 02e9576ece6b..109f03df46c2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -152,4 +152,271 @@ extern struct list_head mpam_classes;
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1    0x10
+
+/* Memory mapped control pages: */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR		0x0000  /* features id register */
+#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
+#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
+#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
+#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
+#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
+#define MPAMF_IIDR		0x0018  /* implementer id register */
+#define MPAMF_AIDR		0x0020  /* architectural id register */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */
+#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
+#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
+#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
+#define MPAMCFG_CASSOC		0x0118  /* cache-associativity config */
+#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
+#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
+#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
+#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
+#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
+#define MSMON_CSU		0x0840  /* current cache-usage */
+#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
+#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
+#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
+#define MPAMF_ESR		0x00F8  /* error status register */
+#define MPAMF_ECR		0x00F0  /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
+#define MPAMF_IDR_EXT			BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
+#define MPAMF_IDR_HAS_MSMON		BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
+#define MPAMF_IDR_HAS_RIS		BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
+#define MPAMF_IDR_HAS_ESR		BIT(39)
+#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
+#define MPAMF_MBW_IDR_WINDWR		BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX	GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
+#define MPAMF_IIDR_PRODUCTID_SHIFT	20
+#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
+#define MPAMF_IIDR_VARIANT_SHIFT	16
+#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
+#define MPAMF_IIDR_REVISON_SHIFT	12
+#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
+#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
+#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMCFG_CASSOC - MPAM cache maximum associativity partition configuration register */
+#define MPAMCFG_CASSOC_CASSOC		GENMASK(15, 0)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
+#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ *                     register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ *                    configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN		BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
+#define MPAMF_ESR_PMG		GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR		BIT(31)
+#define MPAMF_ESR_RIS		GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN		BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE			0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
+#define MPAM_ERRCODE_REQ_PMG_RANGE		4
+#define MPAM_ERRCODE_MONITOR_RANGE		5
+#define MPAM_ERRCODE_INTPARTID_RANGE		6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ *                    usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
+#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN			BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU			0x43
+
+/*
+ * MSMON_CFG_CSU_FLT -  Memory system performance monitor configure cache storage
+ *                      usage monitor filter register
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ *                      bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_x_FLT_PARTID			GENMASK(15, 0)
+#define MSMON_CFG_x_FLT_PMG			GENMASK(23, 16)
+
+#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
+#define MSMON_CFG_CSU_FLT_XCL			BIT(31)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ *            register
+ * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
+ *                     capture register
+ * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
+ *               monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ *                     capture register
+ */
+#define MSMON___VALUE		GENMASK(30, 0)
+#define MSMON___NRDY		BIT(31)
+#define MSMON___NRDY_L		BIT(63)
+#define MSMON___L_VALUE		GENMASK(43, 0)
+#define MSMON___LWD_VALUE	GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ *                  generation register
+ */
+#define MSMON_CAPT_EVNT_NOW	BIT(0)
+
 #endif /* MPAM_INTERNAL_H */
-- 
2.39.5




* [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (8 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 15:07   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
                   ` (19 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen

Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.

Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled: as each CPU's online call is made, walk all the
MSCs and probe those that are reachable and haven't already been probed.

This adds the low-level MSC register accessors.

Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.

The architecture's context switch code will be enabled by a static-key.
This can be set by mpam_enable(), but must be done from process context,
not a cpuhp callback, because enabling a static-key and the cpuhp
machinery both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.
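
For illustration only (the key name here is hypothetical, not defined by
this series): enabling a static-key from the scheduled work rather than
from a cpuhp callback avoids the lock recursion described above, because
static_branch_enable() takes cpus_read_lock() internally:

	/* Hypothetical key, for the sketch only */
	static DEFINE_STATIC_KEY_FALSE(mpam_example_ctxsw_key);

	static void mpam_example_enable_ctxsw(void)
	{
		/*
		 * Runs from process context (e.g. the mpam_enable_work
		 * worker), never from a cpuhp callback, as both
		 * static_branch_enable() and the cpuhp machinery take the
		 * cpuhp lock.
		 */
		static_branch_enable(&mpam_example_ctxsw_key);
	}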

CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Removed register bounds check. If the firmware tables are wrong the
   resulting translation fault should be enough to debug this.
 * Removed '&' in front of a function pointer.
 * Pulled mpam_disable() into this patch.
 * Disable mpam when probing fails to avoid extra work on broken platforms.
 * Added mpam_disable_reason as there are now two non-debug reasons for this
   to happen.
---
 drivers/resctrl/mpam_devices.c  | 173 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   5 +
 2 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c7f4981b3545..c265376d936b 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
 #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
@@ -19,6 +20,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include "mpam_internal.h"
 
@@ -38,6 +40,22 @@ struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -77,6 +95,24 @@ LIST_HEAD(mpam_classes);
 /* List of all objects that can be free()d after synchronize_srcu() */
 static LLIST_HEAD(mpam_garbage);
 
+/* The reason mpam was disabled, printed to aid debugging */
+static char *mpam_disable_reason;
+
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -434,6 +470,86 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+	u64 idr;
+	struct device *dev = &msc->pdev->dev;
+
+	lockdep_assert_held(&msc->probe_lock);
+
+	idr = __mpam_read_reg(msc, MPAMF_AIDR);
+	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+		dev_err_once(dev, "MSC does not match MPAM architecture v1.x\n");
+		return -EIO;
+	}
+
+	msc->probed = true;
+
+	return 0;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
+{
+	return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+	int err = 0;
+	struct mpam_msc *msc;
+	bool new_device_probed = false;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			err = mpam_msc_hw_probe(msc);
+		mutex_unlock(&msc->probe_lock);
+
+		if (!err)
+			new_device_probed = true;
+		else
+			break;
+	}
+
+	if (new_device_probed && !err)
+		schedule_work(&mpam_enable_work);
+	if (err) {
+		mpam_disable_reason = "error during probing";
+		schedule_work(&mpam_broken_work);
+	}
+
+	return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+	return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+					  int (*offline)(unsigned int offline))
+{
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+
+	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
+					     online, offline);
+	if (mpam_cpuhp_state <= 0) {
+		pr_err("Failed to register cpuhp callbacks\n");
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -544,7 +660,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 		mpam_msc_drv_remove(pdev);
 
 	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
-		pr_info("Discovered all MSC\n");
+		mpam_register_cpuhp_callbacks(mpam_discovery_cpu_online, NULL);
 
 	return err;
 }
@@ -557,6 +673,61 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+static void mpam_enable_once(void)
+{
+	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
+
+	pr_info("MPAM enabled\n");
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+	struct mpam_msc *msc, *tmp;
+
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
+		mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+	mpam_free_garbage();
+
+	pr_err_once("MPAM disabled due to %s\n", mpam_disable_reason);
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+	static atomic_t once;
+	struct mpam_msc *msc;
+	bool all_devices_probed = true;
+
+	/* Have we probed all the hw devices? */
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			all_devices_probed = false;
+		mutex_unlock(&msc->probe_lock);
+
+		if (!all_devices_probed)
+			break;
+	}
+
+	if (all_devices_probed && !atomic_fetch_inc(&once))
+		mpam_enable_once();
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 109f03df46c2..d4f3febc7a50 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -51,6 +51,7 @@ struct mpam_msc {
 	 * properties become read-only and the lists are protected by SRCU.
 	 */
 	struct mutex		probe_lock;
+	bool			probed;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -149,6 +150,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+void mpam_disable(struct work_struct *work);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5




* [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (9 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 15:18   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
                   ` (18 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need to be taken into account.

While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.

While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
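
For illustration only - the MPAMIDR_EL1 field helpers below are
assumptions for this sketch, not something this patch adds: an arch-side
caller would report the CPU limits and back off if registration fails:

	static void example_register_cpu_limits(void)
	{
		/* Field mask names are illustrative */
		u64 idr = read_sysreg_s(SYS_MPAMIDR_EL1);
		u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX_MASK, idr);
		u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX_MASK, idr);

		/* Too late: user-space may already see larger values */
		if (mpam_register_requestor(partid_max, pmg_max))
			pr_warn("MPAM: CPU requestor registered too late\n");
	}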

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Changed the lock ordering now that the list-lock mutex isn't held from
   the cpuhp call.
 * Removed irq-unmasked assert in requestor register.
 * Changed capitalisation in print message.
---
 drivers/resctrl/mpam_devices.c  | 150 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   6 ++
 include/linux/arm_mpam.h        |  14 +++
 3 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c265376d936b..24dc81c15ec8 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -43,6 +44,15 @@ static atomic_t mpam_num_msc;
 static int mpam_cpuhp_state;
 static DEFINE_MUTEX(mpam_cpuhp_state_lock);
 
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
 /*
  * mpam is enabled once all devices have been probed from CPU online callbacks,
  * scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -113,6 +123,72 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
 
 #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+	u64 idr_high = 0, idr_low;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	idr_low = mpam_read_partsel_reg(msc, IDR);
+	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+	return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+	int err = 0;
+
+	spin_lock(&partid_max_lock);
+	if (!partid_max_init) {
+		mpam_partid_max = partid_max;
+		mpam_pmg_max = pmg_max;
+		partid_max_init = true;
+	} else if (!partid_max_published) {
+		mpam_partid_max = min(mpam_partid_max, partid_max);
+		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+	} else {
+		/* New requestors can't lower the values */
+		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+			err = -EBUSY;
+	}
+	spin_unlock(&partid_max_lock);
+
+	return err;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -451,6 +527,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
 	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
 	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
 	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+	list_add_rcu(&ris->msc_list, &msc->ris);
 
 	return 0;
 }
@@ -470,9 +547,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+						   u8 ris_idx)
+{
+	int err;
+	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (!test_bit(ris_idx, &msc->ris_idxs)) {
+		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+					     0, 0);
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	list_for_each_entry(ris, &msc->ris, msc_list) {
+		if (ris->ris_idx == ris_idx) {
+			found = ris;
+			break;
+		}
+	}
+
+	return found;
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
+	u16 partid_max;
+	u8 ris_idx, pmg_max;
+	struct mpam_msc_ris *ris;
 	struct device *dev = &msc->pdev->dev;
 
 	lockdep_assert_held(&msc->probe_lock);
@@ -483,6 +588,39 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		return -EIO;
 	}
 
+	/* Grab an IDR value to find out how many RIS there are */
+	mutex_lock(&msc->part_sel_lock);
+	idr = mpam_msc_read_idr(msc);
+	mutex_unlock(&msc->part_sel_lock);
+	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+	/* Use these values so partid/pmg always starts with a valid value */
+	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		idr = mpam_msc_read_idr(msc);
+		mutex_unlock(&msc->part_sel_lock);
+
+		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+		msc->partid_max = min(msc->partid_max, partid_max);
+		msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+		mutex_lock(&mpam_list_lock);
+		ris = mpam_get_or_create_ris(msc, ris_idx);
+		mutex_unlock(&mpam_list_lock);
+		if (IS_ERR(ris))
+			return PTR_ERR(ris);
+	}
+
+	spin_lock(&partid_max_lock);
+	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+	spin_unlock(&partid_max_lock);
+
 	msc->probed = true;
 
 	return 0;
@@ -675,9 +813,18 @@ static struct platform_driver mpam_msc_driver = {
 
 static void mpam_enable_once(void)
 {
+	/*
+	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+	 * longer change.
+	 */
+	spin_lock(&partid_max_lock);
+	partid_max_published = true;
+	spin_unlock(&partid_max_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
-	pr_info("MPAM enabled\n");
+	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
+	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
 void mpam_disable(struct work_struct *ignored)
@@ -744,4 +891,5 @@ static int __init mpam_msc_driver_init(void)
 
 	return platform_driver_register(&mpam_msc_driver);
 }
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d4f3febc7a50..828ce93c95d5 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -52,6 +52,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	u16			partid_max;
+	u8			pmg_max;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -150,6 +152,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 3206f5ddc147..cb6e6cfbea0b 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -41,4 +41,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 		    enum mpam_class_types type, u8 class_id, int component_id);
 
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max:		The maximum PARTID value the requestor can generate.
+ * @pmg_max:		The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestor's features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.39.5




* [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (10 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 15:24   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
                   ` (17 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

The MSC MON_SEL register needs to be accessed from hardirq context for the
overflow interrupt, and when taking an IPI to access these registers on
platforms where an MSC is not accessible from every CPU. This makes an
irqsave spinlock the obvious lock to protect these registers. On systems
with SCMI mailboxes the register accesses must be able to sleep, meaning a
mutex must be used. The SCMI platforms can't support an overflow interrupt.

Clearly these two can't exist for one MSC at the same time.

Add helpers for the MON_SEL locking. The outer lock must be taken in a
pre-emptible context before the inner lock can be taken. On systems with
SCMI mailboxes, where the MON_SEL accesses must sleep, the inner lock
will fail to be 'taken' if the caller is unable to sleep. This allows
callers to fail without having to explicitly check the interface type of
each MSC.
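
For illustration only, not part of this patch: a caller that may be in
the wrong context simply bails out when the lock can't be taken, rather
than checking the interface type of the MSC:

	static int example_read_monitor(struct mpam_msc *msc)
	{
		if (!mpam_mon_sel_lock(msc))
			return -EIO;	/* e.g. unable to sleep for a firmware-backed MSC */

		/* ... program MSMON_CFG_MON_SEL and read the counter here ... */

		mpam_mon_sel_unlock(msc);

		return 0;
	}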

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Made accesses to outer_lock_held use READ_ONCE() to avoid torn values in
   the failure case.
---
 drivers/resctrl/mpam_devices.c  |  3 +--
 drivers/resctrl/mpam_internal.h | 37 +++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 24dc81c15ec8..a26b012452e2 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -748,8 +748,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 
 		mutex_init(&msc->probe_lock);
 		mutex_init(&msc->part_sel_lock);
-		mutex_init(&msc->outer_mon_sel_lock);
-		raw_spin_lock_init(&msc->inner_mon_sel_lock);
+		mpam_mon_sel_lock_init(msc);
 		msc->id = pdev->id;
 		msc->pdev = pdev;
 		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 828ce93c95d5..4cc44d4e21c4 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -70,12 +70,17 @@ struct mpam_msc {
 
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
-	 * affected by MPAMCFG_MON_SEL.
+	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+	 * Access to mon_sel is needed from both process and interrupt contexts,
+	 * but is complicated by firmware-backed platforms that can't make any
+	 * access unless they can sleep.
+	 * Always use the mpam_mon_sel_lock() helpers.
+	 * Accesses to mon_sel need to be able to fail if they occur in the wrong
+	 * context.
 	 * If needed, take msc->probe_lock first.
 	 */
-	struct mutex		outer_mon_sel_lock;
-	raw_spinlock_t		inner_mon_sel_lock;
-	unsigned long		inner_mon_sel_flags;
+	raw_spinlock_t		_mon_sel_lock;
+	unsigned long		_mon_sel_flags;
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
@@ -83,6 +88,30 @@ struct mpam_msc {
 	struct mpam_garbage	garbage;
 };
 
+/* Returning false here means accesses to mon_sel must fail and report an error. */
+static inline bool __must_check mpam_mon_sel_lock(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(msc->iface != MPAM_IFACE_MMIO);
+
+	raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+	return true;
+}
+
+static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
+{
+	raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+	lockdep_assert_held_once(&msc->_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
+{
+	raw_spin_lock_init(&msc->_mon_sel_lock);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (11 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-11 15:29   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
                   ` (16 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin

Expand the probing support with the control and monitor types
we can use with resctrl.
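
As a sketch of the pattern used below, the probed features end up as bits
in a u16 (example_feature_bits() is illustrative only, not part of the patch):

  static void example_feature_bits(struct mpam_props *props)
  {
  	mpam_set_feature(mpam_feat_cpor_part, props);	/* sets bit 1 */
  	mpam_set_feature(mpam_feat_msmon_csu, props);	/* sets bit 7 */

  	if (mpam_has_feature(mpam_feat_cpor_part, props))
  		pr_debug("cache portion partitioning supported\n");
  }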

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * added an underscore to a variable name.

Changes since RFC:
 * Made mpam_ris_hw_probe_hw_nrdy() more idiomatic C.
 * Added static assert on features bitmap size.
---
 drivers/resctrl/mpam_devices.c  | 151 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  53 +++++++++++
 2 files changed, 204 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a26b012452e2..ba8e8839cdc4 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -138,6 +138,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
 }
 #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
 
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	mpam_mon_sel_lock_held(msc);
+	return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	mpam_mon_sel_lock_held(msc);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
 static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 {
 	u64 idr_high = 0, idr_low;
@@ -572,6 +586,136 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
 	return found;
 }
 
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris *ris, u32 mon_reg)
+{
+	u32 now;
+	u64 mon_sel;
+	bool can_set, can_clear;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+		return false;
+
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_set = now & MSMON___NRDY;
+
+	_mpam_write_monsel_reg(msc, mon_reg, 0);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_clear = !(now & MSMON___NRDY);
+	mpam_mon_sel_unlock(msc);
+
+	return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
+	_mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+	int err;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct device *dev = &msc->pdev->dev;
+	struct mpam_props *props = &ris->props;
+
+	lockdep_assert_held(&msc->probe_lock);
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	/* Cache Portion partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+		if (props->cpbm_wd)
+			mpam_set_feature(mpam_feat_cpor_part, props);
+	}
+
+	/* Memory bandwidth partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+		/* portion bitmap resolution */
+		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+		if (props->mbw_pbm_bits &&
+		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_part, props);
+
+		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_max, props);
+	}
+
+	/* Performance Monitoring */
+	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+		/*
+		 * If the firmware max-nrdy-us property is missing, the
+		 * CSU counters can't be used. Should we wait forever?
+		 */
+		err = device_property_read_u32(&msc->pdev->dev,
+					       "arm,not-ready-us",
+					       &msc->nrdy_usec);
+
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+			u32 csumonidr;
+
+			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+			if (props->num_csu_mon) {
+				bool hw_managed;
+
+				mpam_set_feature(mpam_feat_msmon_csu, props);
+
+				/* Is NRDY hardware managed? */
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+			}
+
+			/*
+			 * Accept the missing firmware property if NRDY appears
+			 * un-implemented.
+			 */
+			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.\n");
+		}
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+			bool hw_managed;
+			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
+			if (props->num_mbwu_mon)
+				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
+				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
+			/* Is NRDY hardware managed? */
+			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+			if (hw_managed)
+				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+			/*
+			 * Don't warn about any missing firmware property for
+			 * MBWU NRDY - it doesn't make any sense!
+			 */
+		}
+	}
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
@@ -592,6 +736,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	mutex_lock(&msc->part_sel_lock);
 	idr = mpam_msc_read_idr(msc);
 	mutex_unlock(&msc->part_sel_lock);
+
 	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
 
 	/* Use these values so partid/pmg always starts with a valid value */
@@ -614,6 +759,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&mpam_list_lock);
 		if (IS_ERR(ris))
 			return PTR_ERR(ris);
+		ris->idr = idr;
+
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		mpam_ris_hw_probe(ris);
+		mutex_unlock(&msc->part_sel_lock);
 	}
 
 	spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4cc44d4e21c4..5ae5d4eee8ec 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
 	raw_spin_lock_init(&msc->_mon_sel_lock);
 }
 
+/*
+ * When we compact the supported features, we don't care what they are.
+ * Storing them as a bitmap makes life easy.
+ */
+typedef u16 mpam_features_t;
+
+/* Bits for mpam_features_t */
+enum mpam_device_features {
+	mpam_feat_ccap_part = 0,
+	mpam_feat_cpor_part,
+	mpam_feat_mbw_part,
+	mpam_feat_mbw_min,
+	mpam_feat_mbw_max,
+	mpam_feat_mbw_prop,
+	mpam_feat_msmon,
+	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_hw_nrdy,
+	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_capture,
+	mpam_feat_msmon_mbwu_rwbw,
+	mpam_feat_msmon_mbwu_hw_nrdy,
+	mpam_feat_msmon_capt,
+	MPAM_FEATURE_LAST,
+};
+static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
+
+struct mpam_props {
+	mpam_features_t		features;
+
+	u16			cpbm_wd;
+	u16			mbw_pbm_bits;
+	u16			bwa_wd;
+	u16			num_csu_mon;
+	u16			num_mbwu_mon;
+};
+
+static inline bool mpam_has_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	return (1 << feat) & props->features;
+}
+
+static inline void mpam_set_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	props->features |= (1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -151,6 +200,8 @@ struct mpam_vmsc {
 	/* mpam_msc_ris in this vmsc */
 	struct list_head	ris;
 
+	struct mpam_props	props;
+
 	/* All RIS in this vMSC are members of this MSC */
 	struct mpam_msc		*msc;
 
@@ -162,6 +213,8 @@ struct mpam_vmsc {
 
 struct mpam_msc_ris {
 	u8			ris_idx;
+	u64			idr;
+	struct mpam_props	props;
 
 	cpumask_t		affinity;
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (12 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 11:49   ` Jonathan Cameron
  2025-10-05  1:28   ` Fenghua Yu
  2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
                   ` (15 subsequent siblings)
  29 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

To decide whether to expose an MPAM class as a resctrl resource we
need to know its overall supported features and properties.

Once all the resources have been probed, we can walk the tree and
produce the overall values by merging the bitmaps. This eliminates
features that are only supported by some of the MSC that make up a
component or class.

If bitmap properties are mismatched within a component, the
mismatched feature cannot be supported.

Care has to be taken as vMSC may hold mismatched RIS.
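
A sketch of the two merge rules used by __props_mismatch(), assuming only
the feature bitmaps are considered (the real code also merges the widths
and counter numbers):

  static void example_merge(mpam_features_t *parent, mpam_features_t child,
  			    bool alias)
  {
  	if (alias)
  		*parent |= child;	/* RIS in a vMSC alias one resource */
  	else
  		*parent &= child;	/* vMSC in a class are separate resources */
  }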

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 215 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 ++
 2 files changed, 223 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index ba8e8839cdc4..cd8e95fa5fd6 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -961,8 +961,223 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_mbw_min, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_max, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_prop, props))
+		return true;
+	return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
+	helper(parent) &&						\
+	((helper(child) && (parent)->field != (child)->field) ||	\
+	 (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias)		     \
+	mpam_has_feature((feat), (parent)) &&				     \
+	((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+	 (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias)			\
+	(alias) && !mpam_has_feature((feat), (parent)) &&		\
+	mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+			     struct mpam_props *child, bool alias)
+{
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+		parent->cpbm_wd = child->cpbm_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+				   cpbm_wd, alias)) {
+		pr_debug("%s cleared cpor_part\n", __func__);
+		mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
+		parent->cpbm_wd = 0;
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+		parent->mbw_pbm_bits = child->mbw_pbm_bits;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+				   mbw_pbm_bits, alias)) {
+		pr_debug("%s cleared mbw_part\n", __func__);
+		mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
+		parent->mbw_pbm_bits = 0;
+	}
+
+	/* bwa_wd is a count of bits, fewer bits means less precision */
+	if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
+		parent->bwa_wd = child->bwa_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+				     bwa_wd, alias)) {
+		pr_debug("%s took the min bwa_wd\n", __func__);
+		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+	}
+
+	/* For num properties, take the minimum */
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+		parent->num_csu_mon = child->num_csu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+				   num_csu_mon, alias)) {
+		pr_debug("%s took the min num_csu_mon\n", __func__);
+		parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+		parent->num_mbwu_mon = child->num_mbwu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+				   num_mbwu_mon, alias)) {
+		pr_debug("%s took the min num_mbwu_mon\n", __func__);
+		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
+	}
+
+	if (alias) {
+		/* Merge features for aliased resources */
+		parent->features |= child->features;
+	} else {
+		/* Clear missing features for non aliasing */
+		parent->features &= child->features;
+	}
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+	struct mpam_props *cprops = &class->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+	pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)cprops->features, (long)vprops->features);
+
+	/* Take the safe value for any common features */
+	__props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+	pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)vprops->features, (long)rprops->features);
+
+	/*
+	 * Merge mismatched features - Copy any features that aren't common,
+	 * but take the safe value for any common features.
+	 */
+	__props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_component *comp;
+
+	comp = list_first_entry_or_null(&class->components,
+					struct mpam_component, class_list);
+	if (WARN_ON(!comp))
+		return;
+
+	vmsc = list_first_entry_or_null(&comp->vmsc,
+					struct mpam_vmsc, comp_list);
+	if (WARN_ON(!vmsc))
+		return;
+
+	class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			__vmsc_props_mismatch(vmsc, ris);
+			class->nrdy_usec = max(class->nrdy_usec,
+					       vmsc->msc->nrdy_usec);
+		}
+	}
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+		__class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together, this must be done first.
+ * Next the class features are the bitwise-and of all the vmsc features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, all_classes_list, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_vmsc_features(comp);
+
+		mpam_enable_init_class_features(class);
+
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_class_features(comp);
+	}
+}
+
 static void mpam_enable_once(void)
 {
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+	mutex_unlock(&mpam_list_lock);
+
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
 	 * longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 5ae5d4eee8ec..eace5ba871f3 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -161,12 +161,20 @@ static inline void mpam_set_feature(enum mpam_device_features feat,
 	props->features |= (1 << feat);
 }
 
+static inline void mpam_clear_feature(enum mpam_device_features feat,
+				      mpam_features_t *supported)
+{
+	*supported &= ~(1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
 
 	cpumask_t		affinity;
 
+	struct mpam_props	props;
+	u32			nrdy_usec;
 	u8			level;
 	enum mpam_class_types	type;
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (13 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 11:25   ` Ben Horgan
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
                   ` (14 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew

When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its configuration
was previously dirtied, e.g. by kexec.

Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.

MSC are also reset when CPUs are taken offline, to cover cases where
the firmware doesn't reset the MSC over a reboot via UEFI, or over
kexec where there is no firmware involvement.

If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.

To reset, write the maximum values for all discovered controls.
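
For the portion bitmaps this means writing all-ones; the only subtlety is
the (possibly partial) last 32-bit word. A worked example with made-up
widths (the helper below is illustrative only):

  /*
   * wd = 45 gives msb = (45 - 1) % 32 = 12, so the last word is
   * GENMASK(12, 0); wd = 64 gives msb = 31, a full 32-bit word.
   */
  static u32 example_last_word_mask(u16 wd)
  {
  	return GENMASK((wd - 1) % 32, 0);
  }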

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Last bitmap write will always be non-zero.
 * Dropped READ_ONCE() - the value can no longer change.
 * Write 0 to the proportional stride, remove the bwa_fract variable.
 * Removed nested srcu lock, the assert should cover it.
---
 drivers/resctrl/mpam_devices.c  | 117 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 +++
 2 files changed, 125 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index cd8e95fa5fd6..0353313cf284 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/bitfield.h>
+#include <linux/bitmap.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -777,8 +778,110 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+	u32 num_words, msb;
+	u32 bm = ~0;
+	int i;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	if (wd == 0)
+		return;
+
+	/*
+	 * Write all ~0 to all but the last 32bit-word, which may
+	 * have fewer bits...
+	 */
+	num_words = DIV_ROUND_UP(wd, 32);
+	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+		__mpam_write_reg(msc, reg, bm);
+
+	/*
+	 * ....and then the last (maybe) partial 32bit word. When wd is a
+	 * multiple of 32, msb should be 31 to write a full 32bit word.
+	 */
+	msb = (wd - 1) % 32;
+	bm = GENMASK(msb, 0);
+	__mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *rprops = &ris->props;
+
+	mpam_assert_srcu_read_lock_held();
+
+	mutex_lock(&msc->part_sel_lock);
+	__mpam_part_sel(ris->ris_idx, partid, msc);
+
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+		mpam_write_partsel_reg(msc, MBW_PROP, 0);
+	mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+	u16 partid, partid_max;
+
+	mpam_assert_srcu_read_lock_held();
+
+	if (ris->in_reset_state)
+		return;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid < partid_max; partid++)
+		mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+	struct mpam_msc_ris *ris;
+
+	mpam_assert_srcu_read_lock_held();
+
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+		mpam_reset_ris(ris);
+
+		/*
+		 * Set in_reset_state when coming online. The reset state
+		 * for non-zero partid may be lost while the CPUs are offline.
+		 */
+		ris->in_reset_state = online;
+	}
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_fetch_inc(&msc->online_refs) == 0)
+			mpam_reset_msc(msc, true);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
@@ -818,6 +921,20 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 
 static int mpam_cpu_offline(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_dec_and_test(&msc->online_refs))
+			mpam_reset_msc(msc, false);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index eace5ba871f3..6e047fbd3512 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
 #include <linux/llist.h>
@@ -45,6 +46,7 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	atomic_t		online_refs;
 
 	/*
 	 * probe_lock is only taken during discovery. After discovery these
@@ -223,6 +225,7 @@ struct mpam_msc_ris {
 	u8			ris_idx;
 	u64			idr;
 	struct mpam_props	props;
+	bool			in_reset_state;
 
 	cpumask_t		affinity;
 
@@ -242,6 +245,11 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+static inline void mpam_assert_srcu_read_lock_held(void)
+{
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+}
+
 /* System wide partid/pmg values */
 extern u16 mpam_partid_max;
 extern u8 mpam_pmg_max;
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (14 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 11:57   ` Jonathan Cameron
  2025-10-05 21:08   ` Fenghua Yu
  2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
                   ` (13 subsequent siblings)
  29 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.

Add a helper that schedules the provided function if necessary.

Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.
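
A sketch of the expected calling pattern, assuming the caller already holds
the SRCU read lock (example_reset_one() is not part of this patch):

  static void example_reset_one(struct mpam_msc *msc, struct mpam_msc_ris *ris)
  {
  	cpus_read_lock();	/* stop the cpuhp callbacks changing MSC state */
  	mpam_touch_msc(msc, &mpam_reset_ris, ris);
  	cpus_read_unlock();
  }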

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0353313cf284..e7faf453b5d7 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -833,20 +833,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 	mutex_unlock(&msc->part_sel_lock);
 }
 
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_msc_ris *ris = arg;
 
 	mpam_assert_srcu_read_lock_held();
 
 	if (ris->in_reset_state)
-		return;
+		return 0;
 
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid < partid_max; partid++)
 		mpam_reset_ris_partid(ris, partid);
+
+	return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated; it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, &msc->accessibility))
+		return cpu;
+
+	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+	lockdep_assert_irqs_enabled();
+	lockdep_assert_cpus_held();
+	mpam_assert_srcu_read_lock_held();
+
+	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
 }
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -856,7 +887,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	mpam_assert_srcu_read_lock_held();
 
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
-		mpam_reset_ris(ris);
+		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
 		/*
 		 * Set in_reset_state when coming online. The reset state
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (15 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 11:42   ` Ben Horgan
                     ` (2 more replies)
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
                   ` (12 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.

Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
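
A sketch of how mpam_disable() is expected to be invoked; example_broken_work
stands in for the driver's own work item:

  static DECLARE_WORK(example_broken_work, &mpam_disable);

  static irqreturn_t example_error_handler(int irq, void *dev_id)
  {
  	/* mpam_disable() takes mutexes, so run it in process context */
  	schedule_work(&example_broken_work);

  	return IRQ_HANDLED;
  }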

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * more complete use of _srcu helpers.
 * Use guard macro for srcu.
 * Dropped a might_sleep() - something else will bark.
---
 drivers/resctrl/mpam_devices.c | 56 ++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7faf453b5d7..a9d3c4b09976 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -842,8 +842,6 @@ static int mpam_reset_ris(void *arg)
 	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
 
-	mpam_assert_srcu_read_lock_held();
-
 	if (ris->in_reset_state)
 		return 0;
 
@@ -1340,8 +1338,56 @@ static void mpam_enable_once(void)
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			if (!ris->in_reset_state)
+				mpam_touch_msc(msc, mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+		}
+	}
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(comp, &class->components, class_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_component_locked(comp);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+	cpus_read_lock();
+	mpam_reset_class_locked(class);
+	cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAM's errors indicate a software bug; restore any modified
+ * controls to their reset values.
+ */
 void mpam_disable(struct work_struct *ignored)
 {
+	int idx;
+	struct mpam_class *class;
 	struct mpam_msc *msc, *tmp;
 
 	mutex_lock(&mpam_cpuhp_state_lock);
@@ -1351,6 +1397,12 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_class(class);
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	mutex_lock(&mpam_list_lock);
 	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
 		mpam_msc_destroy(msc);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (16 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 12:12   ` Jonathan Cameron
                     ` (3 more replies)
  2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
                   ` (11 subsequent siblings)
  29 siblings, 4 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.

Only the irq handler accesses the ESR register, so no locking is needed.
The work to disable MPAM after an error needs to happen in process
context as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.

Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.

Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
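
A condensed sketch of the ordering mpam_enable_once() relies on (feature
merging and error handling trimmed; illustrative only):

  static void example_enable_order(void)
  {
  	/* Fix the PARTID space: the error path walks arrays sized by it */
  	spin_lock(&partid_max_lock);
  	partid_max_published = true;
  	spin_unlock(&partid_max_lock);

  	/* Only now is it safe to request and enable the error interrupts */
  	cpus_read_lock();	/* needed for the cross-calls to each MSC */
  	mutex_lock(&mpam_list_lock);
  	if (mpam_register_irqs())
  		pr_warn("Failed to register irqs\n");
  	mutex_unlock(&mpam_list_lock);
  	cpus_read_unlock();
  }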

CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Made mpam_unregister_irqs() safe to race with itself.
 * Removed threaded interrupts.
 * Schedule mpam_disable() from cpuhp callback in the case of an error.
 * Added mpam_disable_reason.
 * Use alloc_percpu()

Changes since RFC:
 * Use guard macro when walking the srcu list.
 * Use INTEN macro for enabling interrupts.
 * Move partid_max_published up earlier in mpam_enable_once().
---
 drivers/resctrl/mpam_devices.c  | 277 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  10 ++
 2 files changed, 284 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a9d3c4b09976..e7e4afc1ea95 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -166,6 +169,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 	return (idr_high << 32) | idr_low;
 }
 
+static void mpam_msc_zero_esr(struct mpam_msc *msc)
+{
+	__mpam_write_reg(msc, MPAMF_ESR, 0);
+	if (msc->has_extd_esr)
+		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+	u64 esr_high = 0, esr_low;
+
+	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (msc->has_extd_esr)
+		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+	return (esr_high << 32) | esr_low;
+}
+
 static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
 {
 	lockdep_assert_held(&msc->part_sel_lock);
@@ -754,6 +775,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
 		msc->partid_max = min(msc->partid_max, partid_max);
 		msc->pmg_max = min(msc->pmg_max, pmg_max);
+		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
 
 		mutex_lock(&mpam_list_lock);
 		ris = mpam_get_or_create_ris(msc, ris_idx);
@@ -768,6 +790,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&msc->part_sel_lock);
 	}
 
+	/* Clear any stale errors */
+	mpam_msc_zero_esr(msc);
+
 	spin_lock(&partid_max_lock);
 	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
 	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -895,6 +920,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	}
 }
 
+static void _enable_percpu_irq(void *_irq)
+{
+	int *irq = _irq;
+
+	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
 	int idx;
@@ -906,6 +938,9 @@ static int mpam_cpu_online(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			_enable_percpu_irq(&msc->reenable_error_ppi);
+
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
 			mpam_reset_msc(msc, true);
 	}
@@ -959,6 +994,9 @@ static int mpam_cpu_offline(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			disable_percpu_irq(msc->reenable_error_ppi);
+
 		if (atomic_dec_and_test(&msc->online_refs))
 			mpam_reset_msc(msc, false);
 	}
@@ -985,6 +1023,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
 	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
+static int __setup_ppi(struct mpam_msc *msc)
+{
+	int cpu;
+	struct device *dev = &msc->pdev->dev;
+
+	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
+	if (!msc->error_dev_id)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &msc->accessibility) {
+		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
+
+		if (empty) {
+			dev_err_once(dev, "MSC shares PPI with %s!\n",
+				     dev_name(&empty->pdev->dev));
+			return -EBUSY;
+		}
+		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+	}
+
+	return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+	int irq;
+
+	irq = platform_get_irq_byname_optional(msc->pdev, "error");
+	if (irq <= 0)
+		return 0;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(irq))
+		return __setup_ppi(msc);
+
+	/* sanity check: shared interrupts can be routed anywhere? */
+	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+		pr_err_once("msc:%u is a private resource with a shared error interrupt\n",
+			    msc->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -1060,6 +1143,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			break;
 		}
 
+		err = mpam_msc_setup_error_irq(msc);
+		if (err)
+			break;
+
 		if (device_property_read_u32(&pdev->dev, "pcc-channel",
 					     &msc->pcc_subspace_id))
 			msc->iface = MPAM_IFACE_MMIO;
@@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
 	}
 }
 
+static char *mpam_errcode_names[16] = {
+	[0] = "No error",
+	[1] = "PARTID_SEL_Range",
+	[2] = "Req_PARTID_Range",
+	[3] = "MSMONCFG_ID_RANGE",
+	[4] = "Req_PMG_Range",
+	[5] = "Monitor_Range",
+	[6] = "intPARTID_Range",
+	[7] = "Unexpected_INTERNAL",
+	[8] = "Undefined_RIS_PART_SEL",
+	[9] = "RIS_No_Control",
+	[10] = "Undefined_RIS_MON_SEL",
+	[11] = "RIS_No_Monitor",
+	[12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+	return 0;
+}
+
+/* This can run in mpam_disable(), and the interrupt handler on the same CPU */
+static int mpam_disable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, 0);
+
+	return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+	u64 reg;
+	u16 partid;
+	u8 errcode, pmg, ris;
+
+	if (WARN_ON_ONCE(!msc) ||
+	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &msc->accessibility)))
+		return IRQ_NONE;
+
+	reg = mpam_msc_read_esr(msc);
+
+	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+	if (!errcode)
+		return IRQ_NONE;
+
+	/* Clear level triggered irq */
+	mpam_msc_zero_esr(msc);
+
+	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+	pr_err_ratelimited("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+			   msc->id, mpam_errcode_names[errcode], partid, pmg,
+			   ris);
+
+	/* Disable this interrupt. */
+	mpam_disable_msc_ecr(msc);
+
+	/*
+	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
+	 * unregister the interrupt from the threaded part of the handler.
+	 */
+	mpam_disable_reason = "hardware error interrupt";
+	schedule_work(&mpam_broken_work);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static int mpam_register_irqs(void)
+{
+	int err, irq;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+		/* We anticipate sharing the interrupt with other MSCs */
+		if (irq_is_percpu(irq)) {
+			err = request_percpu_irq(irq, &mpam_ppi_handler,
+						 "mpam:msc:error",
+						 msc->error_dev_id);
+			if (err)
+				return err;
+
+			msc->reenable_error_ppi = irq;
+			smp_call_function_many(&msc->accessibility,
+					       &_enable_percpu_irq, &irq,
+					       true);
+		} else {
+			err = devm_request_irq(&msc->pdev->dev, irq,
+					       &mpam_spi_handler, IRQF_SHARED,
+					       "mpam:msc:error", msc);
+			if (err)
+				return err;
+		}
+
+		set_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags);
+		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+		set_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags);
+	}
+
+	return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+	int irq, idx;
+	struct mpam_msc *msc;
+
+	cpus_read_lock();
+	/* take the lock as free_irq() can sleep */
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
+			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+
+		if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
+			if (irq_is_percpu(irq)) {
+				msc->reenable_error_ppi = 0;
+				free_percpu_irq(irq, msc->error_dev_id);
+			} else {
+				devm_free_irq(&msc->pdev->dev, irq, msc);
+			}
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+	cpus_read_unlock();
+}
+
 static void mpam_enable_once(void)
 {
-	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
-	mutex_unlock(&mpam_list_lock);
+	int err;
 
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1332,6 +1580,27 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	/*
+	 * If all the MSC have been probed, enabling the IRQs happens next.
+	 * That involves cross-calling to a CPU that can reach the MSC, and
+	 * the locks must be taken in this order:
+	 */
+	cpus_read_lock();
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+
+	err = mpam_register_irqs();
+	if (err)
+		pr_warn("Failed to register irqs: %d\n", err);
+
+	mutex_unlock(&mpam_list_lock);
+	cpus_read_unlock();
+
+	if (err) {
+		schedule_work(&mpam_broken_work);
+		return;
+	}
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
@@ -1397,6 +1666,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	mpam_unregister_irqs();
+
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
 				 srcu_read_lock_held(&mpam_srcu))
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6e047fbd3512..f04a9ef189cf 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -32,6 +32,10 @@ struct mpam_garbage {
 	struct platform_device	*pdev;
 };
 
+/* Bit positions for error_irq_flags */
+#define	MPAM_ERROR_IRQ_REQUESTED  0
+#define	MPAM_ERROR_IRQ_HW_ENABLED 1
+
 struct mpam_msc {
 	/* member of mpam_all_msc */
 	struct list_head        all_msc_list;
@@ -46,6 +50,11 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	bool			has_extd_esr;
+
+	int				reenable_error_ppi;
+	struct mpam_msc * __percpu	*error_dev_id;
+
 	atomic_t		online_refs;
 
 	/*
@@ -54,6 +63,7 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	unsigned long		error_irq_flags;
 	u16			partid_max;
 	u8			pmg_max;
 	unsigned long		ris_idxs;
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (17 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
@ 2025-09-10 20:42 ` James Morse
  2025-09-12 12:13   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
                   ` (10 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:42 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.

After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.

Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
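
A sketch of how the key is expected to be used on a fast path; the
context-switch hook named here is hypothetical and not part of this patch:

  static inline void example_thread_switch(struct task_struct *next)
  {
  	if (!mpam_is_enabled())
  		return;		/* patched to a NOP until the driver is ready */

  	/* ... context switch the MPAM system registers for 'next' ... */
  }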

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 12 ++++++++++++
 drivers/resctrl/mpam_internal.h |  8 ++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7e4afc1ea95..ec1db5f8b05c 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,6 +29,8 @@
 
 #include "mpam_internal.h"
 
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* TODO: move to arch code */
+
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
@@ -956,6 +958,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 	struct mpam_msc *msc;
 	bool new_device_probed = false;
 
+	if (mpam_is_enabled())
+		return 0;
+
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
 				 srcu_read_lock_held(&mpam_srcu)) {
@@ -1471,6 +1476,10 @@ static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
 	/* Disable this interrupt. */
 	mpam_disable_msc_ecr(msc);
 
+	/* Are we racing with the thread disabling MPAM? */
+	if (!mpam_is_enabled())
+		return IRQ_HANDLED;
+
 	/*
 	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
 	 * unregister the interrupt from the threaded part of the handler.
@@ -1601,6 +1610,7 @@ static void mpam_enable_once(void)
 		return;
 	}
 
+	static_branch_enable(&mpam_enabled);
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
@@ -1666,6 +1676,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	static_branch_disable(&mpam_enabled);
+
 	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f04a9ef189cf..b69fa9199cb4 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -8,6 +8,7 @@
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/jump_label.h>
 #include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
@@ -17,6 +18,13 @@
 
 #define MPAM_MSC_MAX_NUM_RIS	16
 
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+	return static_branch_likely(&mpam_enabled);
+}
+
 /*
  * Structures protected by SRCU may not be freed for a surprising amount of
  * time (especially if perf is running). To ensure the MPAM error interrupt can
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (18 preceding siblings ...)
  2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 12:22   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:43 ` [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features James Morse
                   ` (9 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin

When CPUs come online the MSC's original configuration should be restored.

Add struct mpam_config to hold the configuration. This has a bitmap of
features that were modified. Once the maximum partid is known, allocate
a configuration array for each component, and reprogram each RIS
configuration from this.
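
Roughly, one struct mpam_config is needed per partid, per component. A
minimal allocation sketch; the cfg member name is assumed here, and the
real allocation only happens once mpam_partid_max can no longer change:

  static int example_alloc_comp_cfg(struct mpam_component *comp)
  {
  	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg),
  			    GFP_KERNEL);
  	if (!comp->cfg)
  		return -ENOMEM;

  	return 0;
  }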

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Switched entry_rcu to srcu versions.

Changes since RFC:
 * Added a comment about the ordering around max_partid.
 * Allocate configurations after interrupts are registered to reduce churn.
 * Added mpam_assert_partid_sizes_fixed();
 * Make reset use an all-ones instead of zero config.
---
 drivers/resctrl/mpam_devices.c  | 269 +++++++++++++++++++++++++++++---
 drivers/resctrl/mpam_internal.h |  29 +++-
 2 files changed, 271 insertions(+), 27 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index ec1db5f8b05c..7fd149109c75 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -114,6 +114,16 @@ static LLIST_HEAD(mpam_garbage);
 /* When mpam is disabled, the printed reason to aid debugging */
 static char *mpam_disable_reason;
 
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+	WARN_ON_ONCE(!partid_max_published);
+}
+
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
 	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
@@ -363,12 +373,16 @@ static void mpam_class_destroy(struct mpam_class *class)
 	add_to_garbage(class);
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp);
+
 static void mpam_comp_destroy(struct mpam_component *comp)
 {
 	struct mpam_class *class = comp->class;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	__destroy_component_cfg(comp);
+
 	list_del_rcu(&comp->class_list);
 	add_to_garbage(comp);
 
@@ -833,50 +847,105 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 	__mpam_write_reg(msc, reg, bm);
 }
 
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+				      struct mpam_config *cfg)
 {
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
 
-	mpam_assert_srcu_read_lock_held();
-
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
-	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
+	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
+		if (cfg->reset_cpbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+					      rprops->cpbm_wd);
+		else
+			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
+		if (cfg->reset_mbw_pbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+					      rprops->mbw_pbm_bits);
+		else
+			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_min, cfg))
 		mpam_write_partsel_reg(msc, MBW_MIN, 0);
 
-	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
-		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_max, cfg))
+		mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
 
-	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_prop, cfg))
 		mpam_write_partsel_reg(msc, MBW_PROP, 0);
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+struct reprogram_ris {
+	struct mpam_msc_ris *ris;
+	struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+	u16 partid, partid_max;
+	struct reprogram_ris *arg = _arg;
+	struct mpam_msc_ris *ris = arg->ris;
+	struct mpam_config *cfg = arg->cfg;
+
+	if (ris->in_reset_state)
+		return 0;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid <= partid_max; partid++)
+		mpam_reprogram_ris_partid(ris, partid, cfg);
+
+	return 0;
+}
+
+static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
+{
+	memset(reset_cfg, 0, sizeof(*reset_cfg));
+
+	reset_cfg->features = ~0;
+	reset_cfg->cpbm = ~0;
+	reset_cfg->mbw_pbm = ~0;
+	reset_cfg->mbw_max = MPAMCFG_MBW_MAX_MAX;
+
+	reset_cfg->reset_cpbm = true;
+	reset_cfg->reset_mbw_pbm = true;
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible.
  */
 static int mpam_reset_ris(void *arg)
 {
-	u16 partid, partid_max;
+	struct mpam_config reset_cfg;
 	struct mpam_msc_ris *ris = arg;
+	struct reprogram_ris reprogram_arg;
 
 	if (ris->in_reset_state)
 		return 0;
 
-	spin_lock(&partid_max_lock);
-	partid_max = mpam_partid_max;
-	spin_unlock(&partid_max_lock);
-	for (partid = 0; partid < partid_max; partid++)
-		mpam_reset_ris_partid(ris, partid);
+	mpam_init_reset_cfg(&reset_cfg);
+
+	reprogram_arg.ris = ris;
+	reprogram_arg.cfg = &reset_cfg;
+
+	mpam_reprogram_ris(&reprogram_arg);
 
 	return 0;
 }
@@ -922,6 +991,40 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	}
 }
 
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+
+	/*
+	 * No lock for mpam_partid_max as partid_max_published has been
+	 * set by mpam_enabled(), so the values can no longer change.
+	 */
+	mpam_assert_partid_sizes_fixed();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (cfg->features)
+				reset = false;
+
+			mpam_reprogram_ris_partid(ris, partid, cfg);
+		}
+		ris->in_reset_state = reset;
+	}
+}
+
 static void _enable_percpu_irq(void *_irq)
 {
 	int *irq = _irq;
@@ -944,7 +1047,7 @@ static int mpam_cpu_online(unsigned int cpu)
 			_enable_percpu_irq(&msc->reenable_error_ppi);
 
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
-			mpam_reset_msc(msc, true);
+			mpam_reprogram_msc(msc);
 	}
 	srcu_read_unlock(&mpam_srcu, idx);
 
@@ -1577,6 +1680,45 @@ static void mpam_unregister_irqs(void)
 	cpus_read_unlock();
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+	add_to_garbage(comp->cfg);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	mpam_assert_partid_sizes_fixed();
+
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+	if (!comp->cfg)
+		return -ENOMEM;
+	init_garbage(comp->cfg);
+
+	return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+	int err = 0;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list) {
+			err = __allocate_component_cfg(comp);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void mpam_enable_once(void)
 {
 	int err;
@@ -1596,12 +1738,21 @@ static void mpam_enable_once(void)
 	 */
 	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
+	do {
+		mpam_enable_merge_features(&mpam_classes);
 
-	err = mpam_register_irqs();
-	if (err)
-		pr_warn("Failed to register irqs: %d\n", err);
+		err = mpam_register_irqs();
+		if (err) {
+			pr_warn("Failed to register irqs: %d\n", err);
+			break;
+		}
 
+		err = mpam_allocate_config();
+		if (err) {
+			pr_err("Failed to allocate configuration arrays.\n");
+			break;
+		}
+	} while (0);
 	mutex_unlock(&mpam_list_lock);
 	cpus_read_unlock();
 
@@ -1624,6 +1775,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 	struct mpam_msc_ris *ris;
 
 	lockdep_assert_cpus_held();
+	mpam_assert_partid_sizes_fixed();
+
+	memset(comp->cfg, 0, (mpam_partid_max * sizeof(*comp->cfg)));
 
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
@@ -1723,6 +1877,77 @@ void mpam_enable(struct work_struct *work)
 		mpam_enable_once();
 }
 
+struct mpam_write_config_arg {
+	struct mpam_msc_ris *ris;
+	struct mpam_component *comp;
+	u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+	struct mpam_write_config_arg *c = arg;
+
+	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+	return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+	if (mpam_has_feature(feature, newcfg) &&			\
+	    (newcfg)->member != (cfg)->member) {			\
+		(cfg)->member = (newcfg)->member;			\
+		cfg->features |= (1 << feature);			\
+									\
+		(changes) |= (1 << feature);				\
+	}								\
+} while (0)
+
+static mpam_features_t mpam_update_config(struct mpam_config *cfg,
+					  const struct mpam_config *newcfg)
+{
+	mpam_features_t changes = 0;
+
+	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
+
+	return changes;
+}
+
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg)
+{
+	struct mpam_write_config_arg arg;
+	struct mpam_msc_ris *ris;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	/* Don't pass in the current config! */
+	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+	if (!mpam_update_config(&comp->cfg[partid], cfg))
+		return 0;
+
+	arg.comp = comp;
+	arg.partid = partid;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			arg.ris = ris;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+	}
+
+	return 0;
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b69fa9199cb4..17570d9aae9b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -169,11 +169,7 @@ struct mpam_props {
 	u16			num_mbwu_mon;
 };
 
-static inline bool mpam_has_feature(enum mpam_device_features feat,
-				    struct mpam_props *props)
-{
-	return (1 << feat) & props->features;
-}
+#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
 static inline void mpam_set_feature(enum mpam_device_features feat,
 				    struct mpam_props *props)
@@ -204,6 +200,20 @@ struct mpam_class {
 	struct mpam_garbage	garbage;
 };
 
+struct mpam_config {
+	/* Which configuration values are valid. */
+	mpam_features_t		features;
+
+	u32	cpbm;
+	u32	mbw_pbm;
+	u16	mbw_max;
+
+	bool	reset_cpbm;
+	bool	reset_mbw_pbm;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_component {
 	u32			comp_id;
 
@@ -212,6 +222,12 @@ struct mpam_component {
 
 	cpumask_t		affinity;
 
+	/*
+	 * Array of configuration values, indexed by partid.
+	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+	 */
+	struct mpam_config	*cfg;
+
 	/* member of mpam_class:components */
 	struct list_head	class_list;
 
@@ -276,6 +292,9 @@ extern u8 mpam_pmg_max;
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
 
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (19 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:07   ` Jonathan Cameron
  2025-09-10 20:43 ` [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors James Morse
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew,
	Zeng Heng, Dave Martin

MPAM supports more features than are going to be exposed to resctrl.
For PARTIDs other than 0, the reset values of these controls aren't
known.

Discover the rest of the features so they can be reset, to avoid any
side effects when resctrl is in use.

PARTID narrowing means an MSC/RIS doesn't have enough storage for one
configuration per PARTID. If this feature is found on a class of device
we are likely to use, reduce partid_max to match the number of internal
PARTIDs, which allows each PARTID to be mapped to itself.
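
As a hedged illustration of the narrowing decision (the values below
are invented), an MSC that only holds a small number of internal
configurations drags the usable PARTID range down so that the identity
mapping always fits:

  u16 partid_max = 63;    /* supported by the CPUs and the other MSCs */
  u16 intpartid_max = 15; /* this MSC's MPAMF_PARTID_NRW_IDR.INTPARTID_MAX */

  partid_max = min(partid_max, intpartid_max);  /* now 15: partid N -> intpartid N */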

CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Added reset for cassoc.
 * Added detection of CSU XCL.
---
 drivers/resctrl/mpam_devices.c  | 181 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  17 ++-
 2 files changed, 196 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 7fd149109c75..f536ebbcf94e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -214,6 +214,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
 	__mpam_part_sel_raw(partsel, msc);
 }
 
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+		      MPAMCFG_PART_SEL_INTERNAL;
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
 int mpam_register_requestor(u16 partid_max, u8 pmg_max)
 {
 	int err = 0;
@@ -667,10 +676,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct device *dev = &msc->pdev->dev;
 	struct mpam_props *props = &ris->props;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	lockdep_assert_held(&msc->probe_lock);
 	lockdep_assert_held(&msc->part_sel_lock);
 
+	/* Cache Capacity Partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+		if (props->cmax_wd &&
+		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+
+		if (props->cassoc_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cassoc, props);
+	}
+
 	/* Cache Portion partitioning */
 	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
 		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -693,6 +727,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
 		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
 			mpam_set_feature(mpam_feat_mbw_max, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_min, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_prop, props);
+	}
+
+	/* Priority partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_intpri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+		}
+
+		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_dspri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+		}
 	}
 
 	/* Performance Monitoring */
@@ -717,6 +776,9 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 
 				mpam_set_feature(mpam_feat_msmon_csu, props);
 
+				if (FIELD_GET(MPAMF_CSUMON_IDR_HAS_XCL, csumonidr))
+					mpam_set_feature(mpam_feat_msmon_csu_xcl, props);
+
 				/* Is NRDY hardware managed? */
 				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
 				if (hw_managed)
@@ -752,6 +814,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			 */
 		}
 	}
+
+	/*
+	 * RIS with PARTID narrowing don't have enough storage for one
+	 * configuration per PARTID. If these are in a class we could use,
+	 * reduce the supported partid_max to match the number of intpartid.
+	 * If the class is unknown, just ignore it.
+	 */
+	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+	    class->type != MPAM_CLASS_UNKNOWN) {
+		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+		mpam_set_feature(mpam_feat_partid_nrw, props);
+		msc->partid_max = min(msc->partid_max, partid_max);
+	}
 }
 
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -851,12 +928,28 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
 {
+	u32 pri_val = 0;
+	u16 cmax = MPAMCFG_CMAX_CMAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
+	u16 dspri = GENMASK(rprops->dspri_wd, 0);
+	u16 intpri = GENMASK(rprops->intpri_wd, 0);
 
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
+	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+		/* Update the intpartid mapping */
+		mpam_write_partsel_reg(msc, INTPARTID,
+				       MPAMCFG_INTPARTID_INTERNAL | partid);
+
+		/*
+		 * Then switch to the 'internal' partid to update the
+		 * configuration.
+		 */
+		__mpam_intpart_sel(ris->ris_idx, partid, msc);
+	}
+
 	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
 	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
 		if (cfg->reset_cpbm)
@@ -886,6 +979,32 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 	if (mpam_has_feature(mpam_feat_mbw_prop, rprops) &&
 	    mpam_has_feature(mpam_feat_mbw_prop, cfg))
 		mpam_write_partsel_reg(msc, MBW_PROP, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+		mpam_write_partsel_reg(msc, CMAX, cmax);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+		mpam_write_partsel_reg(msc, CMIN, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cassoc, rprops))
+		mpam_write_partsel_reg(msc, CASSOC, MPAMCFG_CASSOC_CASSOC);
+
+	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+		/* aces high? */
+		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+			intpri = 0;
+		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+			dspri = 0;
+
+		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+		mpam_write_partsel_reg(msc, PRI, pri_val);
+	}
+
 	mutex_unlock(&msc->part_sel_lock);
 }
 
@@ -1314,6 +1433,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
 	return false;
 }
 
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+		return true;
+	return false;
+}
+
 #define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
 	helper(parent) &&						\
 	((helper(child) && (parent)->field != (child)->field) ||	\
@@ -1368,6 +1497,23 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
 	}
 
+	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+		parent->cmax_wd = child->cmax_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+				     cmax_wd, alias)) {
+		pr_debug("%s took the min cmax_wd\n", __func__);
+		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+		parent->cassoc_wd = child->cassoc_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+				   cassoc_wd, alias)) {
+		pr_debug("%s cleared cassoc_wd\n", __func__);
+		mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
+		parent->cassoc_wd = 0;
+	}
+
 	/* For num properties, take the minimum */
 	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
 		parent->num_csu_mon = child->num_csu_mon;
@@ -1385,6 +1531,41 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
 	}
 
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+		parent->intpri_wd = child->intpri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+				   intpri_wd, alias)) {
+		pr_debug("%s took the min intpri_wd\n", __func__);
+		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+		parent->dspri_wd = child->dspri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+				   dspri_wd, alias)) {
+		pr_debug("%s took the min dspri_wd\n", __func__);
+		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+	}
+
+	/* TODO: alias support for these two */
+	/* {int,ds}pri may not have differing 0-low behaviour */
+	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+		pr_debug("%s cleared intpri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
+	}
+	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+		pr_debug("%s cleared dspri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
+	}
+
 	if (alias) {
 		/* Merge features for aliased resources */
 		parent->features |= child->features;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 17570d9aae9b..326ba9114d70 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -136,25 +136,34 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
  * When we compact the supported features, we don't care what they are.
  * Storing them as a bitmap makes life easy.
  */
-typedef u16 mpam_features_t;
+typedef u32 mpam_features_t;
 
 /* Bits for mpam_features_t */
 enum mpam_device_features {
-	mpam_feat_ccap_part = 0,
+	mpam_feat_cmax_softlim,
+	mpam_feat_cmax_cmax,
+	mpam_feat_cmax_cmin,
+	mpam_feat_cmax_cassoc,
 	mpam_feat_cpor_part,
 	mpam_feat_mbw_part,
 	mpam_feat_mbw_min,
 	mpam_feat_mbw_max,
 	mpam_feat_mbw_prop,
+	mpam_feat_intpri_part,
+	mpam_feat_intpri_part_0_low,
+	mpam_feat_dspri_part,
+	mpam_feat_dspri_part_0_low,
 	mpam_feat_msmon,
 	mpam_feat_msmon_csu,
 	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
 	mpam_feat_msmon_mbwu,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
 	mpam_feat_msmon_capt,
+	mpam_feat_partid_nrw,
 	MPAM_FEATURE_LAST,
 };
 static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
@@ -165,6 +174,10 @@ struct mpam_props {
 	u16			cpbm_wd;
 	u16			mbw_pbm_bits;
 	u16			bwa_wd;
+	u16			cmax_wd;
+	u16			cassoc_wd;
+	u16			intpri_wd;
+	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
 };
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (20 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:11   ` Jonathan Cameron
  2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

MPAM's MSCs support a number of monitors, each of which provides
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
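
For illustration (example_use_csu_mon() is an invented name, not part
of this patch), allocation and release pair up like this:

  static int example_use_csu_mon(struct mpam_class *class)
  {
          int mon = mpam_alloc_csu_mon(class);

          if (mon < 0)
                  return mon;     /* -EOPNOTSUPP if the class has no CSU monitors */

          /* ... configure and read the counter (added by later patches) ... */

          mpam_free_csu_mon(class, mon);
          return 0;
  }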

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f536ebbcf94e..cf190f896de1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -340,6 +340,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
 	class->level = level_idx;
 	class->type = type;
 	INIT_LIST_HEAD_RCU(&class->classes_list);
+	ida_init(&class->ida_csu_mon);
+	ida_init(&class->ida_mbwu_mon);
 
 	list_add_rcu(&class->classes_list, &mpam_classes);
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 326ba9114d70..81c4c2bfea3d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -210,6 +210,9 @@ struct mpam_class {
 	/* member of mpam_classes */
 	struct list_head	classes_list;
 
+	struct ida		ida_csu_mon;
+	struct ida		ida_mbwu_mon;
+
 	struct mpam_garbage	garbage;
 };
 
@@ -288,6 +291,38 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
+			       GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+	ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_mbwu_mon, 0,
+			       cprops->num_mbwu_mon - 1, GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+	ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
 /* List of all classes - protected by srcu*/
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (21 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-11 15:46   ` Ben Horgan
                     ` (2 more replies)
  2025-09-10 20:43 ` [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
                   ` (6 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Reading a monitor involves configuring what you want to monitor, and
then reading the value. Components made up of multiple MSCs may need
values from each MSC. MSCs may take time to configure, returning
'not ready'. The maximum 'not ready' time should have been provided
by firmware.

Add mpam_msmon_read() to hide all this. If (one of) the MSCs returns
not ready, wait the full timeout value before trying again.
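
As a hedged sketch (example_read_mbwu() is an invented caller), reading
the bandwidth seen by one PARTID/PMG pair through a previously
allocated monitor looks roughly like this:

  static int example_read_mbwu(struct mpam_component *comp, u16 partid,
                               u8 pmg, u16 mon, u64 *val)
  {
          struct mon_cfg ctx = {
                  .mon            = mon,
                  .partid         = partid,
                  .pmg            = pmg,
                  .match_pmg      = true,
                  .opts           = COUNT_BOTH,
          };

          /* May sleep for the firmware-provided not-ready timeout */
          return mpam_msmon_read(comp, &ctx, mpam_feat_msmon_mbwu, val);
  }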

CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Added XCL support.
 * Merged FLT/CTL constants.
 * Fixed a spelling mistake in a comment.
 * Moved structures around.
---
 drivers/resctrl/mpam_devices.c  | 226 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  19 +++
 2 files changed, 245 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index cf190f896de1..1543c33c5d6a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -898,6 +898,232 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+struct mon_read {
+	struct mpam_msc_ris		*ris;
+	struct mon_cfg			*ctx;
+	enum mpam_device_features	type;
+	u64				*val;
+	int				err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				   u32 *flt_val)
+{
+	struct mon_cfg *ctx = m->ctx;
+
+	/*
+	 * For CSU counters its implementation-defined what happens when not
+	 * filtering by partid.
+	 */
+	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
+
+	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
+	if (m->ctx->match_pmg) {
+		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
+	}
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
+
+		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
+					       ctx->csu_exclude_clean);
+
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+
+		break;
+	default:
+		return;
+	}
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				    u32 *flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+				     u32 flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	/*
+	 * Write the ctl_val with the enable bit cleared, reset the counter,
+	 * then enable counter.
+	 */
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CSU, 0);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	case mpam_feat_msmon_mbwu:
+		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+	u64 now;
+	bool nrdy = false;
+	struct mon_read *m = arg;
+	struct mon_cfg *ctx = m->ctx;
+	struct mpam_msc_ris *ris = m->ris;
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+	if (!mpam_mon_sel_lock(msc)) {
+		m->err = -EIO;
+		return;
+	}
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+	/*
+	 * Read the existing configuration to avoid re-writing the same values.
+	 * This saves waiting for 'nrdy' on subsequent reads.
+	 */
+	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	clean_msmon_ctl_val(&cur_ctl);
+	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		now = mpam_read_monsel_reg(msc, CSU);
+		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	case mpam_feat_msmon_mbwu:
+		now = mpam_read_monsel_reg(msc, MBWU);
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	default:
+		m->err = -EINVAL;
+		break;
+	}
+	mpam_mon_sel_unlock(msc);
+
+	if (nrdy) {
+		m->err = -EBUSY;
+		return;
+	}
+
+	now = FIELD_GET(MSMON___VALUE, now);
+	*m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+	int err, idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			arg->ris = ris;
+
+			err = smp_call_function_any(&msc->accessibility,
+						    __ris_msmon_read, arg,
+						    true);
+			if (!err && arg->err)
+				err = arg->err;
+			if (err)
+				break;
+		}
+		if (err)
+			break;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
+	return err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features type, u64 *val)
+{
+	int err;
+	struct mon_read arg;
+	u64 wait_jiffies = 0;
+	struct mpam_props *cprops = &comp->class->props;
+
+	might_sleep();
+
+	if (!mpam_is_enabled())
+		return -EIO;
+
+	if (!mpam_has_feature(type, cprops))
+		return -EOPNOTSUPP;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.ctx = ctx;
+	arg.type = type;
+	arg.val = val;
+	*val = 0;
+
+	err = _msmon_read(comp, &arg);
+	if (err == -EBUSY && comp->class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+	while (wait_jiffies)
+		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+	if (err == -EBUSY) {
+		memset(&arg, 0, sizeof(arg));
+		arg.ctx = ctx;
+		arg.type = type;
+		arg.val = val;
+		*val = 0;
+
+		err = _msmon_read(comp, &arg);
+	}
+
+	return err;
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 81c4c2bfea3d..bb01e7dbde40 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -196,6 +196,22 @@ static inline void mpam_clear_feature(enum mpam_device_features feat,
 	*supported &= ~(1 << feat);
 }
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	u16                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	bool			csu_exclude_clean;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -343,6 +359,9 @@ void mpam_disable(struct work_struct *work);
 int mpam_apply_config(struct mpam_component *comp, u16 partid,
 		      struct mpam_config *cfg);
 
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features, u64 *val);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (22 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:24   ` Jonathan Cameron
  2025-09-12 15:55   ` Ben Horgan
  2025-09-10 20:43 ` [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
                   ` (5 subsequent siblings)
  29 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Bandwidth counters need to run continuously to correctly reflect the
bandwidth.

The value read may be lower than the previously read value when the
counter overflows, or when the hardware is reset due to CPU hotplug.

Add struct msmon_mbwu_state to track each bandwidth counter, so that
overflow and power management can be handled.
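
A hedged worked example of the correction arithmetic (numbers invented,
31 bit counter, assuming at most one wrap between two reads):

  u64 counter_max = GENMASK_ULL(30, 0);         /* 0x7fffffff */
  u64 prev_val = 0x7ffffff0, now = 0x10, correction = 0;

  if (prev_val > now)                           /* the counter wrapped */
          correction += counter_max - prev_val; /* += 0xf, counts lost to the wrap */

  /* value reported to the caller: now + correction = 0x1f, plus earlier corrections */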

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Fixed lock/unlock typo.
---
 drivers/resctrl/mpam_devices.c  | 154 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  23 +++++
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 1543c33c5d6a..eeb62ed94520 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -918,6 +918,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
 
 	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
+
 	if (m->ctx->match_pmg) {
 		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
 		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
@@ -972,6 +973,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 				     u32 flt_val)
 {
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 
 	/*
@@ -990,20 +992,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, MBWU, 0);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+		if (mbwu_state)
+			mbwu_state->prev_val = 0;
+
 		break;
 	default:
 		return;
 	}
 }
 
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+	/* TODO: scaling, and long counters */
+	return GENMASK_ULL(30, 0);
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
-	u64 now;
 	bool nrdy = false;
 	struct mon_read *m = arg;
+	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1031,11 +1045,30 @@ static void __ris_msmon_read(void *arg)
 		now = mpam_read_monsel_reg(msc, CSU);
 		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
 		now = mpam_read_monsel_reg(msc, MBWU);
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
+
+		if (nrdy)
+			break;
+
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (!mbwu_state)
+			break;
+
+		/* Add any pre-overflow value to the mbwu_state->val */
+		if (mbwu_state->prev_val > now)
+			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+		mbwu_state->prev_val = now;
+		mbwu_state->correction += overflow_val;
+
+		/* Include bandwidth consumed before the last hardware reset */
+		now += mbwu_state->correction;
 		break;
 	default:
 		m->err = -EINVAL;
@@ -1048,7 +1081,6 @@ static void __ris_msmon_read(void *arg)
 		return;
 	}
 
-	now = FIELD_GET(MSMON___VALUE, now);
 	*m->val += now;
 }
 
@@ -1261,6 +1293,67 @@ static int mpam_reprogram_ris(void *_arg)
 	return 0;
 }
 
+/* Call with MSC lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+	int i;
+	struct mon_read mwbu_arg;
+	struct mpam_msc_ris *ris = _ris;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		if (ris->mbwu_state[i].enabled) {
+			mwbu_arg.ris = ris;
+			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+			mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+			__ris_msmon_read(&mwbu_arg);
+		}
+	}
+
+	return 0;
+}
+
+/* Call with MSC lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+	int i;
+	u64 val;
+	struct mon_cfg *cfg;
+	u32 cur_flt, cur_ctl, mon_sel;
+	struct mpam_msc_ris *ris = arg;
+	struct msmon_mbwu_state *mbwu_state;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		mbwu_state = &ris->mbwu_state[i];
+		cfg = &mbwu_state->cfg;
+
+		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+			return -EIO;
+
+		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+		val = mpam_read_monsel_reg(msc, MBWU);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+
+		cfg->mon = i;
+		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
+		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
+		mbwu_state->correction += val;
+		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+		mpam_mon_sel_unlock(msc);
+	}
+
+	return 0;
+}
+
 static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
 {
 	memset(reset_cfg, 0, sizeof(*reset_cfg));
@@ -1335,6 +1428,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 * for non-zero partid may be lost while the CPUs are offline.
 		 */
 		ris->in_reset_state = online;
+
+		if (mpam_is_enabled() && !online)
+			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
 	}
 }
 
@@ -1369,6 +1465,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
 			mpam_reprogram_ris_partid(ris, partid, cfg);
 		}
 		ris->in_reset_state = reset;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
 	}
 }
 
@@ -2091,11 +2190,33 @@ static void mpam_unregister_irqs(void)
 
 static void __destroy_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
 	add_to_garbage(comp->cfg);
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		if (mpam_mon_sel_lock(msc)) {
+			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+				add_to_garbage(ris->mbwu_state);
+			mpam_mon_sel_unlock(msc);
+		}
+	}
 }
 
 static int __allocate_component_cfg(struct mpam_component *comp)
 {
+	int err = 0;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct msmon_mbwu_state *mbwu_state;
+
+	lockdep_assert_held(&mpam_list_lock);
 	mpam_assert_partid_sizes_fixed();
 
 	if (comp->cfg)
@@ -2106,6 +2227,35 @@ static int __allocate_component_cfg(struct mpam_component *comp)
 		return -ENOMEM;
 	init_garbage(comp->cfg);
 
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (!vmsc->props.num_mbwu_mon)
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->props.num_mbwu_mon)
+				continue;
+
+			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+					     sizeof(*ris->mbwu_state),
+					     GFP_KERNEL);
+			if (!mbwu_state) {
+				__destroy_component_cfg(comp);
+				err = -ENOMEM;
+				break;
+			}
+
+			if (mpam_mon_sel_lock(msc)) {
+				init_garbage(mbwu_state);
+				ris->mbwu_state = mbwu_state;
+				mpam_mon_sel_unlock(msc);
+			}
+		}
+
+		if (err)
+			break;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index bb01e7dbde40..725c2aefa8a2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -212,6 +212,26 @@ struct mon_cfg {
 	enum mon_filter_options opts;
 };
 
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+	bool		enabled;
+	struct mon_cfg	cfg;
+
+	/* The value last read from the hardware. Used to detect overflow. */
+	u64		prev_val;
+
+	/*
+	 * The value to add to the new reading to account for power management,
+	 * and shifts to trigger the overflow interrupt.
+	 */
+	u64		correction;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -304,6 +324,9 @@ struct mpam_msc_ris {
 	/* parent: */
 	struct mpam_vmsc	*vmsc;
 
+	/* msmon mbwu configuration is preserved over reset */
+	struct msmon_mbwu_state	*mbwu_state;
+
 	struct mpam_garbage	garbage;
 };
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (23 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:27   ` Jonathan Cameron
  2025-09-10 20:43 ` [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported James Morse
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

From: Rohit Mathew <rohit.mathew@arm.com>

MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters: a 44 bit counter indicated by the
HAS_LONG field (bit 30), and a 63 bit counter indicated by LWD (bit 29).
Probe for these counters and set the corresponding feature bits if
either is present.
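
In summary (a hedged reading of the above; the register names are those
used by the following patch), the value ends up being read as:

  HAS_LONG == 0              -> 31 bit value in MSMON_MBWU
  HAS_LONG == 1 && LWD == 0  -> 44 bit value in MSMON_MBWU_L
  HAS_LONG == 1 && LWD == 1  -> 63 bit value in MSMON_MBWU_L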

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 23 ++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  9 +++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index eeb62ed94520..bae9fa9441dc 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -795,7 +795,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
 		}
 		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
-			bool hw_managed;
+			bool has_long, hw_managed;
 			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
 
 			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
@@ -805,6 +805,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
 				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
 
+			/*
+			 * Treat long counter and its extension, lwd as mutually
+			 * exclusive feature bits. Though these are dependent
+			 * fields at the implementation level, there would never
+			 * be a need for mpam_feat_msmon_mbwu_44counter (long
+			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
+			 * bits to be set together.
+			 *
+			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
+			 * bit as this feature bit would be used as the "front
+			 * facing feature bit" for any checks related to mbwu
+			 * monitors.
+			 */
+			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumon_idr);
+			if (props->num_mbwu_mon && has_long) {
+				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumon_idr))
+					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+				else
+					mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+			}
+
 			/* Is NRDY hardware managed? */
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
 			if (hw_managed)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 725c2aefa8a2..c190826dfbda 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -158,7 +158,16 @@ enum mpam_device_features {
 	mpam_feat_msmon_csu_capture,
 	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
+
+	/*
+	 * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
+	 * counter would be used. The exact counter used is decided based on the
+	 * status of mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter
+	 * as well.
+	 */
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_44counter,
+	mpam_feat_msmon_mbwu_63counter,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (24 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:29   ` Jonathan Cameron
  2025-09-26  4:51   ` Fenghua Yu
  2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
                   ` (3 subsequent siblings)
  29 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

From: Rohit Mathew <rohit.mathew@arm.com>

If a 44 bit (long) or 63 bit (LWD) counter was detected when probing
the RIS, use the long/LWD counter instead of the regular 31 bit MBWU
counter.

Only 32 bit accesses to the MSC are required to be supported by the
spec, but these registers are 64 bits. The lower half may overflow
into the upper half between two 32 bit reads. To avoid this, use a
helper that re-reads the upper half until it is stable across the read
of the lower half.
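
As a generic, hedged sketch of that read pattern (read_hi_lo_hi() is an
invented name; the driver's mpam_msc_read_mbwu_l() below instead returns
MSMON___NRDY_L when the halves never agree):

  static bool read_hi_lo_hi(void __iomem *base, u64 *val)
  {
          int retry = 3;
          u32 lo;
          u64 hi1, hi2;

          hi2 = readl_relaxed(base + 4);
          do {
                  hi1 = hi2;
                  lo  = readl_relaxed(base);
                  hi2 = readl_relaxed(base + 4);
          } while (hi1 != hi2 && --retry > 0);

          if (hi1 != hi2)
                  return false;   /* the low half kept carrying; let the caller retry */

          *val = (hi1 << 32) | lo;
          return true;
  }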

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v1:
 * Only clear OFLOW_STATUS_L on MBWU counters.

Changes since RFC:
 * Commit message wrangling.
 * Refer to 31 bit counters as opposed to 32 bit (registers).
---
 drivers/resctrl/mpam_devices.c | 91 ++++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 9 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bae9fa9441dc..3080a81f0845 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -927,6 +927,48 @@ struct mon_read {
 	int				err;
 };
 
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+	int retry = 3;
+	u32 mbwu_l_low;
+	u64 mbwu_l_high1, mbwu_l_high2;
+
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+	do {
+		mbwu_l_high1 = mbwu_l_high2;
+		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+		retry--;
+	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+	if (mbwu_l_high1 == mbwu_l_high2)
+		return (mbwu_l_high1 << 32) | mbwu_l_low;
+	return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
+	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
 static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 				   u32 *flt_val)
 {
@@ -989,6 +1031,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 static void clean_msmon_ctl_val(u32 *cur_ctl)
 {
 	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+
+	if (FIELD_GET(MSMON_CFG_x_CTL_TYPE, *cur_ctl) == MSMON_CFG_MBWU_CTL_TYPE_MBWU)
+		*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 }
 
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -1011,7 +1056,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 	case mpam_feat_msmon_mbwu:
 		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(m->ris))
+			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+		else
+			mpam_write_monsel_reg(msc, MBWU, 0);
+
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 
 		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -1026,8 +1075,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 
 static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 {
-	/* TODO: scaling, and long counters */
-	return GENMASK_ULL(30, 0);
+	/* TODO: implement scaling counters */
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
+		return GENMASK_ULL(62, 0);
+	else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
+		return GENMASK_ULL(43, 0);
+	else
+		return GENMASK_ULL(30, 0);
 }
 
 /* Call with MSC lock held */
@@ -1069,10 +1123,24 @@ static void __ris_msmon_read(void *arg)
 		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
-		now = mpam_read_monsel_reg(msc, MBWU);
-		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
-			nrdy = now & MSMON___NRDY;
-		now = FIELD_GET(MSMON___VALUE, now);
+		/*
+		 * If long or lwd counters are supported, use them, else revert
+		 * to the 31 bit counter.
+		 */
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			now = mpam_msc_read_mbwu_l(msc);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY_L;
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
+				now = FIELD_GET(MSMON___LWD_VALUE, now);
+			else
+				now = FIELD_GET(MSMON___L_VALUE, now);
+		} else {
+			now = mpam_read_monsel_reg(msc, MBWU);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY;
+			now = FIELD_GET(MSMON___VALUE, now);
+		}
 
 		if (nrdy)
 			break;
@@ -1360,8 +1428,13 @@ static int mpam_save_mbwu_state(void *arg)
 		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
 
-		val = mpam_read_monsel_reg(msc, MBWU);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			val = mpam_msc_read_mbwu_l(msc);
+			mpam_msc_zero_mbwu_l(msc);
+		} else {
+			val = mpam_read_monsel_reg(msc, MBWU);
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		}
 
 		cfg->mon = i;
 		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (25 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:33   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
                   ` (2 subsequent siblings)
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

resctrl expects to reset the bandwidth counters when the filesystem
is mounted.

To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
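
For illustration only, a caller on the (future) resctrl mount path is
expected to look something like the sketch below; the function and loop
around the call are placeholders, not part of this series:

	/* Hypothetical: reset every MBWU monitor of a component at mount time */
	static void example_reset_component_mbwu(struct mpam_component *comp)
	{
		struct mon_cfg ctx = { 0 };
		u16 mon;

		for (mon = 0; mon < comp->class->props.num_mbwu_mon; mon++) {
			ctx.mon = mon;
			mpam_msmon_reset_mbwu(comp, &ctx);
		}
	}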

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 47 +++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  5 +++-
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3080a81f0845..8254d6190ca2 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1088,9 +1088,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 static void __ris_msmon_read(void *arg)
 {
 	bool nrdy = false;
+	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
+	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
 	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
@@ -1105,6 +1107,14 @@ static void __ris_msmon_read(void *arg)
 		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
 	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
 
+	if (m->type == mpam_feat_msmon_mbwu) {
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (mbwu_state) {
+			reset_on_next_read = mbwu_state->reset_on_next_read;
+			mbwu_state->reset_on_next_read = false;
+		}
+	}
+
 	/*
 	 * Read the existing configuration to avoid re-writing the same values.
 	 * This saves waiting for 'nrdy' on subsequent reads.
@@ -1112,7 +1122,10 @@ static void __ris_msmon_read(void *arg)
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
-	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+	config_mismatch = cur_flt != flt_val ||
+			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+	if (config_mismatch || reset_on_next_read)
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
 
 	switch (m->type) {
@@ -1145,7 +1158,6 @@ static void __ris_msmon_read(void *arg)
 		if (nrdy)
 			break;
 
-		mbwu_state = &ris->mbwu_state[ctx->mon];
 		if (!mbwu_state)
 			break;
 
@@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	return err;
 }
 
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+	int idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	if (!mpam_is_enabled())
+		return;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+				continue;
+
+			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+				continue;
+
+			ris->mbwu_state[ctx->mon].correction = 0;
+			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+			mpam_mon_sel_unlock(msc);
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c190826dfbda..7cbcafe8294a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -223,10 +223,12 @@ struct mon_cfg {
 
 /*
  * Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * Changes to reset_on_next_read, prev_val and correction are protected by the
+ * msc's mon_sel_lock.
  */
 struct msmon_mbwu_state {
 	bool		enabled;
+	bool		reset_on_next_read;
 	struct mon_cfg	cfg;
 
 	/* The value last read from the hardware. Used to detect overflow. */
@@ -393,6 +395,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
 
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (26 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:37   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
  2025-09-25  7:18 ` [PATCH v2 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich,
	Jonathan Cameron

The bitmap reset code has been a source of bugs. Add a unit test.

The test currently has to be built in, as the rest of the driver is
built in.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/Kconfig             | 10 +++++
 drivers/resctrl/mpam_devices.c      |  4 ++
 drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 drivers/resctrl/test_mpam_devices.c

diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index c30532a3a3a4..ef59b3057d5d 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -5,10 +5,20 @@ menuconfig ARM64_MPAM_DRIVER
 	  MPAM driver for System IP, e,g. caches and memory controllers.
 
 if ARM64_MPAM_DRIVER
+
 config ARM64_MPAM_DRIVER_DEBUG
 	bool "Enable debug messages from the MPAM driver"
 	depends on ARM64_MPAM_DRIVER
 	help
 	  Say yes here to enable debug messages from the MPAM driver.
 
+config MPAM_KUNIT_TEST
+	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
+	depends on KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to run tests in the MPAM driver.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8254d6190ca2..2962cd018207 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2662,3 +2662,7 @@ static int __init mpam_msc_driver_init(void)
 }
 /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..3e7058f7601c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2024 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+	struct mpam_msc fake_msc = {0};
+	u32 *test_result;
+
+	if (!buf)
+		return;
+
+	fake_msc.mapped_hwpage = buf;
+	fake_msc.mapped_hwpage_sz = SZ_16K;
+	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+	mutex_init(&fake_msc.part_sel_lock);
+	mutex_lock(&fake_msc.part_sel_lock);
+
+	test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+	KUNIT_EXPECT_EQ(test, test_result[0], 1);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 1);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	{}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+	.name = "mpam_devices_test_suite",
+	.test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (27 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-09-10 20:43 ` James Morse
  2025-09-12 13:41   ` Jonathan Cameron
                     ` (2 more replies)
  2025-09-25  7:18 ` [PATCH v2 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
  29 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 20:43 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.

Add some tests to illustrate the sort of thing that is expected to
work, and those that must be removed.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Waggled some words in comments.
 * Moved a bunch of variables to be global - shuts up a compiler warning.
---
 drivers/resctrl/mpam_internal.h     |   8 +-
 drivers/resctrl/test_mpam_devices.c | 321 ++++++++++++++++++++++++++++
 2 files changed, 328 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 7cbcafe8294a..6119e4573187 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -20,6 +20,12 @@
 
 DECLARE_STATIC_KEY_FALSE(mpam_enabled);
 
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
 static inline bool mpam_is_enabled(void)
 {
 	return static_branch_likely(&mpam_enabled);
@@ -189,7 +195,7 @@ struct mpam_props {
 	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
-};
+} PACKED_FOR_KUNIT;
 
 #define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 3e7058f7601c..4eca8590c691 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,325 @@
 
 #include <kunit/test.h>
 
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+	struct mpam_props parent = { 0 };
+	struct mpam_props child;
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, false);
+
+	memset(&child, 0, sizeof(child));
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, true);
+
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static struct list_head fake_classes_list;
+static struct mpam_class fake_class = { 0 };
+static struct mpam_component fake_comp1 = { 0 };
+static struct mpam_component fake_comp2 = { 0 };
+static struct mpam_vmsc fake_vmsc1 = { 0 };
+static struct mpam_vmsc fake_vmsc2 = { 0 };
+static struct mpam_msc fake_msc1 = { 0 };
+static struct mpam_msc fake_msc2 = { 0 };
+static struct mpam_msc_ris fake_ris1 = { 0 };
+static struct mpam_msc_ris fake_ris2 = { 0 };
+static struct platform_device fake_pdev = { 0 };
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+#define RESET_FAKE_HIEARCHY()	do {				\
+	INIT_LIST_HEAD(&fake_classes_list);			\
+								\
+	memset(&fake_class, 0, sizeof(fake_class));		\
+	fake_class.level = 3;					\
+	fake_class.type = MPAM_CLASS_CACHE;			\
+	INIT_LIST_HEAD_RCU(&fake_class.components);		\
+	INIT_LIST_HEAD(&fake_class.classes_list);		\
+								\
+	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
+	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
+	fake_comp1.comp_id = 1;					\
+	fake_comp2.comp_id = 2;					\
+	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp1.class_list);			\
+	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp2.class_list);			\
+								\
+	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
+	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
+	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
+	fake_vmsc1.msc = &fake_msc1;				\
+	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
+	fake_vmsc2.msc = &fake_msc2;				\
+								\
+	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
+	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
+	fake_ris1.ris_idx = 1;					\
+	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
+	fake_ris2.ris_idx = 2;					\
+	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
+								\
+	fake_msc1.pdev = &fake_pdev;				\
+	fake_msc2.pdev = &fake_pdev;				\
+								\
+	list_add(&fake_class.classes_list, &fake_classes_list);	\
+} while (0)
+
+	RESET_FAKE_HIEARCHY();
+
+	mutex_lock(&mpam_list_lock);
+
+	/* One Class+Comp, two RIS in one vMSC with common features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with incompatible overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 5;
+	fake_ris2.props.cpbm_wd = 3;
+	fake_ris1.props.mbw_pbm_bits = 5;
+	fake_ris2.props.mbw_pbm_bits = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features that need tweaking */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+	fake_ris1.props.bwa_wd = 5;
+	fake_ris2.props.bwa_wd = 3;
+	fake_ris1.props.cmax_wd = 5;
+	fake_ris2.props.cmax_wd = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * RIS with different control properties need to be sanitised so the
+	 * class has the common set of properties.
+	 */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple components can't control the same resource, mismatched features can
+	 * not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	mutex_unlock(&mpam_list_lock);
+
+#undef RESET_FAKE_HIEARCHY
+}
+
 static void test_mpam_reset_msc_bitmap(struct kunit *test)
 {
 	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -57,6 +376,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
 
 static struct kunit_case mpam_devices_test_cases[] = {
 	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	KUNIT_CASE(test_mpam_enable_merge_features),
+	KUNIT_CASE(test__props_mismatch),
 	{}
 };
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-09-11 10:43   ` Jonathan Cameron
  2025-09-11 10:48     ` Jonathan Cameron
  2025-09-19 16:10     ` James Morse
  2025-09-25  9:32   ` Stanimir Varbanov
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 10:43 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:41 +0000
James Morse <james.morse@arm.com> wrote:

> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Hi James,

Sorry I missed v1.  Busy few weeks.

I think one resource leak plus a few suggested changes that
I'm not that bothered about.

Jonathan


> ---
> Changes since v1:
>  * Replaced commit message with wording from Dave.
>  * Fixed a stray plural.
>  * Moved further down in the file to make use of get_pptt() helper.
>  * Added a break to exit the loop early.
> 
> Changes since RFC:
>  * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
> 
> Changes since RFC:
>  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>  * Added missing : in kernel-doc
>  * Made helper return void as this never actually returns an error.
> ---
>  drivers/acpi/pptt.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  3 ++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..1728545d90b2 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -817,3 +817,86 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }

> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor container
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	table_hdr = acpi_get_pptt();

This calls acpi_get_table() so you need to put it again or every call
to this leaks a reference count.  I messed around with DEFINE_FREE() for this
but it doesn't fit that well as the underlying call doesn't return the table.
This one does though so you could do a pptt specific one.  

Or just acpi_put_table(table_hdr); at exit path from this function.


> +	if (!table_hdr)
> +		return;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
Hmm. Not related to this patch but I have no idea why acpi_get_pptt()
doesn't return a struct acpi_table_pptt as if it did this would be a simple
+ 1 and not require those who only sometimes deal with ACPI code to go
check what that macro actually does!


> +	proc_sz = sizeof(struct acpi_pptt_processor);
Maybe sizeof(*cpu_node) is more helpful to the reader.
Also shorter so you could do
	while ((unsigned long)entry + sizeof(*cpu_node) <= table_end)

> +	while ((unsigned long)entry + proc_sz <= table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;

For me, assigning this before checking the type is inelegant.
but the nesting does get deep without it so I guess this is ok maybe, though
I wonder if better reorganized to combine a different bunch of conditions.
I think this is functionally identical.

		if (entry->type == ACPI_PTT_TYPE_PROCESSOR) {
			struct acpi_pptt_processor *cpu_node = 
				(struct acpi_pptt_processor *)entry;
			if ((cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) &&
			    (!acpi_pptt_leaf_node(table_hdr, cpu_node) &&
			    (cpu_node->acpi_processor_id == acpi_cpu_id)) {
				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
				break;
		
			}
		}
		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
				     entry->length);

More generally I wonder if it is worth adding a for_each_acpi_pptt_entry() macro.
There is some precedent in drivers/acpi, such as for_each_nhlt_endpoint().

That's probably material for another day though unless you think it brings
enough benefits to do it here.
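
A rough, untested sketch of what I had in mind, mirroring the walk above
(callers would still need the type-specific length check):

	#define for_each_acpi_pptt_entry(entry, table_hdr)			\
		for (entry = ACPI_ADD_PTR(struct acpi_subtable_header,		\
					  table_hdr,				\
					  sizeof(struct acpi_table_pptt));	\
		     (unsigned long)(entry) + sizeof(*(entry)) <=		\
			(unsigned long)(table_hdr) + (table_hdr)->length;	\
		     entry = ACPI_ADD_PTR(struct acpi_subtable_header,		\
					  entry, (entry)->length))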


> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> +			if (!acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +				if (cpu_node->acpi_processor_id == acpi_cpu_id) {
> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> +					break;
> +				}
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 1c5bb1e887cd..f97a9ff678cc 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>  int find_acpi_cpu_topology_cluster(unsigned int cpu);
>  int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  {
>  	return -EINVAL;
>  }
> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> +						     cpumask_t *cpus) { }
>  #endif
>  
>  void acpi_arch_init(void);



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-09-11 10:46   ` Jonathan Cameron
  2025-09-19 16:10     ` James Morse
  2025-09-11 14:08   ` Ben Horgan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 10:46 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:42 +0000
James Morse <james.morse@arm.com> wrote:

> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
> 
> Gid rid of the levels parameter, which has no remaining purpose.
> 
> Fix acpi_get_cache_info() to match.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-11 10:43   ` Jonathan Cameron
@ 2025-09-11 10:48     ` Jonathan Cameron
  2025-09-19 16:10     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 10:48 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Thu, 11 Sep 2025 11:43:37 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Wed, 10 Sep 2025 20:42:41 +0000
> James Morse <james.morse@arm.com> wrote:
> 
> > The ACPI MPAM table uses the UID of a processor container specified in
> > the PPTT to indicate the subset of CPUs and cache topology that can
> > access each MPAM System Component (MSC).
> > 
> > This information is not directly useful to the kernel. The equivalent
> > cpumask is needed instead.
> > 
> > Add a helper to find the processor container by its id, then walk
> > the possible CPUs to fill a cpumask with the CPUs that have this
> > processor container as a parent.
> > 
> > CC: Dave Martin <dave.martin@arm.com>
> > Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: James Morse <james.morse@arm.com>  
> 
> Hi James,
> 
> Sorry I missed v1.  Busy few weeks.
> 
> I think one resource leak plus a few suggested changes that
> I'm not that bothered about.
Ignore the resource leak. I didn't read acpi_get_pptt() properly.  No bug there.
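
For reference, acpi_get_pptt() follows the usual cache-the-table-once
pattern - roughly the below, from memory - so callers don't take an
extra reference per call:

	static struct acpi_table_header *acpi_get_pptt(void)
	{
		static struct acpi_table_header *pptt;

		if (acpi_disabled)
			return NULL;

		if (!pptt)
			acpi_get_table(ACPI_SIG_PPTT, 0, &pptt);

		return pptt;
	}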

So consider the comments below, but I'm fine with this as is.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> 
> Jonathan
> 
> 
> > ---
> > Changes since v1:
> >  * Replaced commit message with wording from Dave.
> >  * Fixed a stray plural.
> >  * Moved further down in the file to make use of get_pptt() helper.
> >  * Added a break to exit the loop early.
> > 
> > Changes since RFC:
> >  * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
> > 
> > Changes since RFC:
> >  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
> >  * Added missing : in kernel-doc
> >  * Made helper return void as this never actually returns an error.
> > ---
> >  drivers/acpi/pptt.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/acpi.h |  3 ++
> >  2 files changed, 86 insertions(+)
> > 
> > diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> > index 54676e3d82dd..1728545d90b2 100644
> > --- a/drivers/acpi/pptt.c
> > +++ b/drivers/acpi/pptt.c
> > @@ -817,3 +817,86 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> >  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
> >  					  ACPI_PPTT_ACPI_IDENTICAL);
> >  }  
> 
> > +/**
> > + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> > + *                                       processor container
> > + * @acpi_cpu_id:	The UID of the processor container.
> > + * @cpus:		The resulting CPU mask.
> > + *
> > + * Find the specified Processor Container, and fill @cpus with all the cpus
> > + * below it.
> > + *
> > + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> > + * Container, they may exist purely to describe a Private resource. CPUs
> > + * have to be leaves, so a Processor Container is a non-leaf that has the
> > + * 'ACPI Processor ID valid' flag set.
> > + *
> > + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> > + */
> > +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> > +{
> > +	struct acpi_pptt_processor *cpu_node;
> > +	struct acpi_table_header *table_hdr;
> > +	struct acpi_subtable_header *entry;
> > +	unsigned long table_end;
> > +	u32 proc_sz;
> > +
> > +	cpumask_clear(cpus);
> > +
> > +	table_hdr = acpi_get_pptt();  
> 
> This calls acpi_get_table() so you need to put it again or every call
> to this leaks a reference count.  I messed around with DEFINE_FREE() for this
> but it doesn't fit that well as the underlying call doesn't return the table.
> This one does though so you could do a pptt specific one.  
> 
> Or just acpi_put_table(table_hdr); at exit path from this function.
> 
> 
> > +	if (!table_hdr)
> > +		return;
> > +
> > +	table_end = (unsigned long)table_hdr + table_hdr->length;
> > +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> > +			     sizeof(struct acpi_table_pptt));  
> Hmm. Not related to this patch but I have no idea why acpi_get_pptt()
> doesn't return a struct acpi_table_pptt as if it did this would be a simple
> + 1 and not require those who only sometimes deal with ACPI code to go
> check what that macro actually does!
> 
> 
> > +	proc_sz = sizeof(struct acpi_pptt_processor);  
> Maybe sizeof(*cpu_node) is more helpful to the reader.
> Also shorter so you could do
> 	while ((unsigned long)entry + sizeof(*cpu_node) <= table_end)
> 
> > +	while ((unsigned long)entry + proc_sz <= table_end) {
> > +		cpu_node = (struct acpi_pptt_processor *)entry;  
> 
> For me, assigning this before checking the type is inelegant.
> but the nesting does get deep without it so I guess this is ok maybe, though
> I wonder if better reorganized to combine a different bunch of conditions.
> I think this is functionally identical.
> 
> 		if (entry->type == ACPI_PTT_TYPE_PROCESSOR) {
> 			struct acpi_pptt_processor *cpu_node = 
> 				(struct acpi_pptt_processor *)entry;
> 			if ((cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) &&
> 			    (!acpi_pptt_leaf_node(table_hdr, cpu_node) &&
> 			    (cpu_node->acpi_processor_id == acpi_cpu_id)) {
> 				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> 				break;
> 		
> 			}
> 		}
> 		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> 				     entry->length);
> 
> More generally I wonder if it is worth adding a for_each_acpi_pptt_entry() macro.
> There is some precedent in drivers/acpi, such as for_each_nhlt_endpoint().
> 
> That's probably material for another day though unless you think it brings
> enough benefits to do it here.
> 
> 
> > +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> > +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> > +			if (!acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> > +				if (cpu_node->acpi_processor_id == acpi_cpu_id) {
> > +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> > +					break;
> > +				}
> > +			}
> > +		}
> > +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> > +				     entry->length);
> > +	}
> > +}
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index 1c5bb1e887cd..f97a9ff678cc 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
> >  int find_acpi_cpu_topology_cluster(unsigned int cpu);
> >  int find_acpi_cpu_topology_package(unsigned int cpu);
> >  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> > +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> >  #else
> >  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
> >  {
> > @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> >  {
> >  	return -EINVAL;
> >  }
> > +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> > +						     cpumask_t *cpus) { }
> >  #endif
> >  
> >  void acpi_arch_init(void);  
> 



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-09-11 10:59   ` Jonathan Cameron
  2025-09-19 16:10     ` James Morse
  2025-09-11 15:27   ` Lorenzo Pieralisi
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 10:59 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:43 +0000
James Morse <james.morse@arm.com> wrote:

> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Dropped the cleanup-based table freeing, use acpi_get_pptt() instead.
>  * Removed a confusing comment.
>  * Clarified the kernel doc.
> 
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  5 ++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 7af7d62597df..c5f2a51d280b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>  				     entry->length);
>  	}
>  }
> +
> +/*
/**

It's an exposed interface so nice to have formal kernel-doc and automatic
checks that brings.

> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.

Silly question, but why do we care if this is a unified cache?
It's a bit odd to have a general sounding function fail for split caches.
The handling would have to be more complex but if we really don't want
to do it maybe rename the function to find_acpi_unifiedcache_level_from_id()
and if the general version gets added later we can switch to that.

> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));

sizeof(*cache) to me makes this more obvious.

> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-10 20:42 ` [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-09-11 11:06   ` Jonathan Cameron
  2025-09-19 16:10     ` James Morse
  2025-10-02  5:03   ` Fenghua Yu
  1 sibling, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 11:06 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:44 +0000
James Morse <james.morse@arm.com> wrote:

> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
> Changes since v1:
>  * Added punctuation to the commit message.
>  * Removed a comment about an alternative implementation.
>  * Made the loop continue with a warning if a CPU is missing from the PPTT.
> 
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.

Why for this case does it makes sense to not just use acpi_get_pptt()?

Also you don't introduce the acpi_get_table_reg() helper until patch 6.


>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  6 +++++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index c5f2a51d280b..c379a9952b00 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -966,3 +966,62 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>  
>  	return -ENOENT;
>  }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the unified cache

Similar comment to previous patch. If we are going to make this unified only
can we reflect that in the function name.  I worry this will get reused
and that restriction will surprise.


> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	cpumask_clear(cpus);
> +
> +	if (IS_ERR(table))
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (WARN_ON_ONCE(!cpu_node))
> +			continue;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));

sizeof(*cache) makes more sense to me.

> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);
> +		}
> +	}
> +
> +	return 0;
> +}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-09-11 13:17   ` Jonathan Cameron
  2025-09-19 16:11     ` James Morse
  2025-09-11 14:56   ` Lorenzo Pieralisi
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 13:17 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:46 +0000
James Morse <james.morse@arm.com> wrote:

> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
> 
Hi James,

A few comments inline.  Note I've more or less completely forgotten
what was discussed in RFC 1 so I might well be repeating stuff that
you replied to then.  Always a problem for me with big complex patch sets!

Jonathan

> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..fd9cfa143676
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c


> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and treated as unknown */

are treated?


> +		return 0;
> +	}
> +}

> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));

 = {}; above and skip the memset.
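i.e. something like this, so the buffer starts out zeroed without the extra call:

	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = {};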

> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}

> +
> +static int __init acpi_mpam_parse(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct property_entry props[4]; /* needs a sentinel */
Perhaps move this and res into the loop and use = {};

> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int next_res, next_prop, err = 0;
> +	struct acpi_device *companion;
> +	struct platform_device *pdev;
> +	enum mpam_msc_iface iface;
> +	struct resource res[3];

Add a comment here or a check later on why this is large enough.
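A rough sketch of both of the above; the comment on res[] is my reading of why
three entries are enough (one MMIO region plus the overflow and error interrupts):

	while (table_offset < table_end) {
		/* One MMIO region plus the overflow and error interrupts */
		struct resource res[3] = {};
		struct property_entry props[4] = {};	/* needs a sentinel */
		...
	}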

> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		if (table_offset > table_end) {
> +			pr_debug("MSC entry overlaps end of ACPI table\n");
> +			break;

That this isn't considered an error is a bit subtle and made me wonder
if there was a use of uninitialized pdev (there isn't because err == 0)
Why not return here?
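i.e. something like this; whether it deserves an error code (or should stay a
plain 'return 0') is the open question:

		if (table_offset > table_end) {
			pr_debug("MSC entry overlaps end of ACPI table\n");
			return -EINVAL;
		}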

> +		}
> +
> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the MSC structure. This MSC will still be counted,
> +		 * meaning the MPAM driver can't probe against all MSC, and
> +		 * will never be enabled. There is no way to enable it safely,
> +		 * because we cannot determine safe system-wide partid and pmg
> +		 * ranges in this situation.
> +		 */

This is decidedly paranoid. I'd normally expect the architecture to be based
on the assumption that it is fine for old software to ignore new fields.  ACPI itself
has fairly firm rules on this (though it goes wrong sometimes :)
I'm guessing there is something out there that made this necessary though, so
keep it if you actually need it.

> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (!tbl_msc->mmio_size) {
> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (decode_interface_type(tbl_msc, &iface)) {
> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		next_res = 0;
> +		next_prop = 0;
> +		memset(res, 0, sizeof(res));
> +		memset(props, 0, sizeof(props));
> +
> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);

https://lore.kernel.org/all/20241009124120.1124-13-shiju.jose@huawei.com/
was a proposal to add a DEFINE_FREE() to clean these up.  Might be worth a revisit.
Then Greg was against the use it was put to, and asked for an example of where
it helped.  Maybe this is that example.

If you do want to do that, I'd factor out a bunch of the stuff here as a helper
so we can have the clean ownership pass of a return_ptr().  
Similar to what Shiju did here (this is the usecase for platform device that
Greg didn't like).
https://lore.kernel.org/all/20241009124120.1124-14-shiju.jose@huawei.com/

Even without that I think factoring some of this out and hence being able to
do returns on errors and put the if (err) into the loop would be a nice
improvement to readability.
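A rough sketch of that shape; the cleanup class and helper name below are
invented for illustration, with the DEFINE_FREE() coming from something like
the linked proposal:

DEFINE_FREE(platform_device_put, struct platform_device *,
	    if (_T) platform_device_put(_T))

static struct platform_device * __init
acpi_mpam_build_msc_pdev(struct acpi_mpam_msc_node *tbl_msc)
{
	struct platform_device *pdev __free(platform_device_put) =
		platform_device_alloc("mpam_msc", tbl_msc->identifier);

	if (!pdev)
		return ERR_PTR(-ENOMEM);

	/* ... add the resources, properties and platform data as above ... */

	return_ptr(pdev);	/* ownership passes cleanly to the caller */
}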

> +		if (!pdev) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		/* Some power management is described in the namespace: */
> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +		if (err > 0 && err < sizeof(uid)) {
> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> +			if (companion)
> +				ACPI_COMPANION_SET(&pdev->dev, companion);
> +			else
> +				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);
> +		}
> +
> +		if (iface == MPAM_IFACE_MMIO) {
> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +							       tbl_msc->mmio_size,
> +							       "MPAM:MSC");
> +		} else if (iface == MPAM_IFACE_PCC) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +								tbl_msc->base_address);
> +			next_prop++;

Why the double increment? Needs a comment if that is the right thing to do.

> +		}
> +
> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +		err = platform_device_add_resources(pdev, res, next_res);
> +		if (err)
> +			break;
> +
> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +							tbl_msc->max_nrdy_usec);
> +
> +		/*
> +		 * The MSC's CPU affinity is described via its linked power
> +		 * management device, but only if it points at a Processor or
> +		 * Processor Container.
> +		 */
> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
> +								acpi_id);
> +		}
> +
> +		err = device_create_managed_software_node(&pdev->dev, props,
> +							  NULL);
> +		if (err)
> +			break;
> +
> +		/* Come back later if you want the RIS too */
> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +		if (err)
> +			break;
> +
> +		err = platform_device_add(pdev);
> +		if (err)
> +			break;
> +	}
> +
> +	if (err)
> +		platform_device_put(pdev);
> +
> +	return err;
> +}
> +
> +int acpi_mpam_count_msc(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
Trivial, so feel free to ignore.
Perhaps we should aim for consistency.  Whilst I prefer pointers for this stuff,
PPTT did use unsigned longs.

> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
> +

Could reorder to put acpi_mpam_parse and this use of it together?

> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);

> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index c5fd92cda487..af449964426b 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

I'd use if (!IS_ERR_OR_NULL(_T)) not because it is functionally necessary but
because it will let the compiler optimize this out if it can tell that in a given
path _T is NULL (I think it was Peter Z who pointed this out in a similar interface
a while back).
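i.e. just widening the check in the cleanup expression:

DEFINE_FREE(acpi_table, struct acpi_table_header *,
	    if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))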


I'd like an opinion from Rafael on this in general.



> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-09-11 13:35   ` Jonathan Cameron
  2025-09-23 16:41     ` James Morse
  2025-09-17 11:03   ` Ben Horgan
  2025-10-03  3:53   ` Gavin Shan
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 13:35 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:47 +0000
James Morse <james.morse@arm.com> wrote:

> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Hi James,

Various comments inline.  You can ignore the do/while(0)
one but I'll probably forget and send more grumpy comments about it ;)

Jonathan
> ---
> Changes since v1:
>  * Avoid selecting driver on other architectures.
>  * Removed PCC support stub.
>  * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
>  * Clarified a comment.
>  * Stopped using mpam_num_msc as an id, and made it atomic.
>  * Size of -1 returned from cache_of_calculate_id()
>  * Renamed some struct members.
>  * Made a bunch of pr_err() dev_err_once().
>  * Used more cleanup magic.
>  * Inlined a print message.
>  * Fixed error propagation from mpam_dt_parse_resources().
>  * Moved cache accessibility checks earlier.
> 
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/resctrl/Kconfig         |  14 +++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 180 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  65 ++++++++++++
>  7 files changed, 267 insertions(+)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6487c511bdc6..93e563e1cce4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT

To me that wants a comment as it's unusual.

>  	select ACPI_MPAM if ACPI
>  	help
>  	  Memory System Resource Partitioning and Monitoring (MPAM) is an

> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..c30532a3a3a4
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,14 @@
> +menuconfig ARM64_MPAM_DRIVER
> +	bool "MPAM driver"
> +	depends on ARM64 && ARM64_MPAM && EXPERT
> +	help
> +	  MPAM driver for System IP, e,g. caches and memory controllers.
> +
> +if ARM64_MPAM_DRIVER
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver"
> +	depends on ARM64_MPAM_DRIVER

The depends on should make the if unnecessary.

> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> +
> +endif

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..efc4738e3b4d
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>

> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	u32 affinity_id;
> +	int err;
> +
> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +				       &affinity_id);
> +	if (err)
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +	else
> +		acpi_pptt_get_cpus_from_container(affinity_id,
> +						  &msc->accessibility);
> +
> +	return 0;
> +
> +	return err;

Curious. I'd do a build test after each patch before v3. A couple of
places would have failed or given helpful warnings so far.

> +}

> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	do {

I might well have moaned about this before, but I really dislike a do while(0)
if it doesn't fit on my screen (and my eyesight is poor so that's not this
many lines).  To me a non trivial case of this is almost always a place
where a '_do' function would have made it more readable. 

I'm also not a fan of scoped_guard() plus breaks because it feels like
it is dependent on an implementation detail but maybe it's clearer than this.
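Very roughly, and with the helper name invented here, something like:

static int mpam_msc_drv_probe_locked(struct platform_device *pdev)
{
	lockdep_assert_held(&mpam_list_lock);

	/* ... the body of the do { } while (0), returning on error ... */

	return 0;
}

	...
	mutex_lock(&mpam_list_lock);
	err = mpam_msc_drv_probe_locked(pdev);
	mutex_unlock(&mpam_list_lock);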


> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +		if (!msc) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		mutex_init(&msc->probe_lock);
> +		mutex_init(&msc->part_sel_lock);
> +		mutex_init(&msc->outer_mon_sel_lock);
> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		msc->id = pdev->id;
> +		msc->pdev = pdev;
> +		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +		INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +		err = update_msc_accessibility(msc);
> +		if (err)
> +			break;
> +		if (cpumask_empty(&msc->accessibility)) {
> +			dev_err_once(dev, "MSC is not accessible from any CPU!");
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> +					     &msc->pcc_subspace_id))
> +			msc->iface = MPAM_IFACE_MMIO;
> +		else
> +			msc->iface = MPAM_IFACE_PCC;
> +
> +		if (msc->iface == MPAM_IFACE_MMIO) {
> +			void __iomem *io;
> +
> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +								    &msc_res);
> +			if (IS_ERR(io)) {
> +				dev_err_once(dev, "Failed to map MSC base address\n");
> +				err = PTR_ERR(io);
> +				break;
> +			}
> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +			msc->mapped_hwpage = io;
> +		}
> +
> +		list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +		platform_set_drvdata(pdev, msc);
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +	}
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);

Is it worth bothering to remove?  We failed probe anyway if we got here
and it's not expected to happen on real systems so I'd just leave it around
so that you can exit early above.

I'm also not following why the msc check is relevant if you do want to do
this. Can only get here without msc if the allocation failed. Why would
we leave the driver loaded in only that case?

> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}

> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..7c63d590fc98
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>

spinlock.h

> +#include <linux/resctrl.h>

Not spotting anything resctrl yet, so maybe this belongs later.

> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        all_msc_list;
> +
> +	int			id;

I'd follow (approximately) include-what-you-use principles to make later header
shuffling easier.  So a forward declaration for this.

> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			pcc_subspace_id;
> +	struct mbox_client	pcc_cl;
> +	struct pcc_mbox_chan	*pcc_chan;

Forward def or include acpi/pcc.h
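i.e. a couple of forward declarations near the top of the header would cover
the pointer members:

struct platform_device;
struct pcc_mbox_chan;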

> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only taken during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs;
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	/*
> +	 * mon_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_MON_SEL.
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		outer_mon_sel_lock;
> +	raw_spinlock_t		inner_mon_sel_lock;
> +	unsigned long		inner_mon_sel_flags;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity);

Where is this?

> +
> +#endif /* MPAM_INTERNAL_H */



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
  2025-09-11 10:46   ` Jonathan Cameron
@ 2025-09-11 14:08   ` Ben Horgan
  2025-09-19 16:10     ` James Morse
  2025-10-02  3:55   ` Fenghua Yu
  2025-10-03  0:17   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-11 14:08 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
> 
> Gid rid of the levels parameter, which has no remaining purpose.

Nit: s/Gid/Get/

> 
> Fix acpi_get_cache_info() to match.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> ---
Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
@ 2025-09-11 14:22   ` Jonathan Cameron
  2025-09-26 17:52     ` James Morse
  2025-09-11 16:30   ` Markus Elfring
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 14:22 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:42:48 +0000
James Morse <james.morse@arm.com> wrote:

> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Various minor things inline.

The biggest is that I think moving to explicit reference counts, rather than
using the empty list for that, might end up easier to read.  Mostly because
everyone knows what a kref_put() is about.

Obviously a bit pointless in practice, but I prefer not to think too
hard.
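For the sake of discussion, a rough sketch of what that could look like for
the class objects; the refcount field and release helper are invented for
illustration, and the same shape would apply to components and vMSCs:

struct mpam_class {
	...
	struct kref		refcount;	/* kref_init() in mpam_class_alloc() */
};

static void mpam_class_release(struct kref *kref)
{
	struct mpam_class *class = container_of(kref, struct mpam_class,
						refcount);

	mpam_class_destroy(class);
}

	/* then, instead of the list_empty() checks on the error paths: */
	kref_put(&class->refcount, mpam_class_release);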


> ---
> Changes since v1:
>  * Fixed a comp/vmsc typo.
>  * Removed duplicate description from the commit message.
>  * Moved parenthesis in the add_to_garbage() macro.
>  * Check for out of range ris_idx when creating ris.
>  * Removed GFP as probe_lock is no longer a spin lock.
>  * Removed alloc flag as ended up searching the lists itself.
>  * Added a comment about affinity masks not overlapping.
> 
> Changes since RFC:
>  * removed a pr_err() debug message that crept in.
> ---
>  drivers/resctrl/mpam_devices.c  | 406 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  90 +++++++
>  include/linux/arm_mpam.h        |   8 +-
>  3 files changed, 493 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index efc4738e3b4d..c7f4981b3545 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -18,7 +18,6 @@
>  #include <linux/printk.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> -#include <linux/srcu.h>

Why does this include no longer make sense?


>  #include <linux/types.h>
>  
>  #include "mpam_internal.h"
> @@ -31,7 +30,7 @@
>  static DEFINE_MUTEX(mpam_list_lock);
>  static LIST_HEAD(mpam_all_msc);
>  
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;

...

> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)

Whilst this obviously works, I'd rather pass garbage to init_garbage
instead of the containing structure (where type varies)
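i.e. keeping the argument as the embedded garbage struct itself:

#define init_garbage(g)	init_llist_node(&(g)->llist)
...
	init_garbage(&vmsc->garbage);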

> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
> +	if (!vmsc)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}

> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	return mpam_component_alloc(class, id);
> +}

> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
> +{
> +	bool found = false;
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (found)
> +		return class;

Maybe this gets more complex later, but if it doesn't, just return class where you set
found above.  That matches the pattern used in mpam_component_get() above.
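i.e. dropping the flag and returning as soon as the match is found:

	list_for_each_entry(class, &mpam_classes, classes_list) {
		if (class->type == type && class->level == level_idx)
			return class;
	}

	return mpam_class_alloc(level_idx, type);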


> +
> +	return mpam_class_alloc(level_idx, type);
> +}


> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	/*
> +	 * It is assumed affinities don't overlap. If they do the class becomes
> +	 * unusable immediately.
> +	 */
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))

See below. I think it is worth using an explicit reference count even
though that will only reach zero when the list is empty.

> +		mpam_vmsc_destroy(vmsc);
> +}


> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
> +		return -EINVAL;
> +
> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(ris);
> +
> +	class = mpam_class_get(class_id, type);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_get(class, component_id);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);

Maybe just reference count the classes with a kref and do a put here?

> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_get(comp, msc);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_comp_destroy(comp);
Similar to classes I wonder if simple reference counting is cleaner.
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))

and here as well.

> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
> +	return 0;
> +}

>  /*
>   * An MSC can control traffic from a set of CPUs, but may only be accessible
>   * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>  		return;
>  
>  	mutex_lock(&mpam_list_lock);
> -	platform_set_drvdata(pdev, NULL);
> -	list_del_rcu(&msc->all_msc_list);
> -	synchronize_srcu(&mpam_srcu);
> +	mpam_msc_destroy(msc);

I'd suggest introducing mpam_msc_destroy() in the earlier patch. Doesn't make a huge
difference but 2 out of 3 things removed here would be untouched if you do that.

>  	mutex_unlock(&mpam_list_lock);
> +
> +	mpam_free_garbage();
>  }





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
  2025-09-11 13:17   ` Jonathan Cameron
@ 2025-09-11 14:56   ` Lorenzo Pieralisi
  2025-09-19 16:11     ` James Morse
  2025-09-16 13:17   ` [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready Zeng Heng
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 200+ messages in thread
From: Lorenzo Pieralisi @ 2025-09-11 14:56 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, Sep 10, 2025 at 08:42:46PM +0000, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>

I have not noticed anything wrong - just some remarks/questions
below.

> ---
> Changes since v1:
>  * Whitespace.
>  * Gave GLOBAL_AFFINITY a pre-processor'd name.
>  * Fixed assumption that there are zero functional dependencies.
>  * Bounds check walking of the MSC RIS.
>  * More bounds checking in the main table walk.
>  * Check for nonsense numbers of function dependencies.
>  * Smattering of pr_debug() to help folk feeding line-noise to the parser.
>  * Changed the comment flavour on the SPDX string.
>  * Removed additional table check.
>  * More comment wrangling.
> 
> Changes since RFC:
>  * Used DEFINE_RES_IRQ_NAMED() and friends macros.
>  * Additional error handling.
>  * Check for zero sized MSC.
>  * Allow table revisions greater than 1. (no spec for revision 0!)
>  * Use cleanup helpers to retrieve ACPI tables, which allows some functions
>    to be folded together.
> ---
>  arch/arm64/Kconfig          |   1 +
>  drivers/acpi/arm64/Kconfig  |   3 +
>  drivers/acpi/arm64/Makefile |   1 +
>  drivers/acpi/arm64/mpam.c   | 361 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c       |   2 +-
>  include/linux/acpi.h        |  12 ++
>  include/linux/arm_mpam.h    |  48 +++++
>  7 files changed, 427 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/acpi/arm64/mpam.c
>  create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4be8a13505bf..6487c511bdc6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>  	help
>  	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>  	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>  
>  config ACPI_APMT
>  	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>  obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>  obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>  obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>  obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>  obj-$(CONFIG_ARM_AMBA)		+= amba.o
>  obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..fd9cfa143676
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,361 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)

Nit: for consistency - not sure why MODE/TYPE get a _MASK suffix while
AFFINITY_{TYPE/VALID} don't, but that's not a big deal.

To be consistent with the table, BIT(3) should perhaps be (?)

#define ACPI_MPAM_MSC_AFFINITY_TYPE_MASK	BIT(3)

to match the Table 5 (and then add defines for possible values).

> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)

ACPI_MPAM_MSC_AFFINITY_VALID_MASK ? (or remove _MASK from IRQ_{MODE/TYPE}) ?

Just noticed - feel free to ignore this altogether.

> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
> +				   u32 flags, int *irq,
> +				   u32 processor_container_uid)
> +{
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> +
> +	if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
> +		pr_err_once("Partitioned interrupts not supported\n");
> +		return false;
> +	}

Please add a comment to explain what you mean here (i.e. PPI partitioning
isn't supported). I will have to change this code anyway to cater for the
GICv5 interrupt model, given the hardcoded intid values.

Is the condition allowed by the MPAM architecture, so such MPAM tables are
legitimate (but not supported in Linux)?

> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, aff;
> +	int irq;
> +
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->overflow_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->error_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));

Jonathan already commented on this.

> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);

Is !buddy a FW error to be logged ?

> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case 0:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case 0xa:

Worth giving those constants 0x0,0xa a name ?

> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
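For example, something along these lines; the constant names here are invented
and should be checked against whatever the spec calls them:

#define MPAM_MSC_IFACE_TYPE_MMIO	0x00
#define MPAM_MSC_IFACE_TYPE_PCC		0x0a

	switch (tbl_msc->interface_type) {
	case MPAM_MSC_IFACE_TYPE_MMIO:
		*iface = MPAM_IFACE_MMIO;
		return 0;
	case MPAM_MSC_IFACE_TYPE_PCC:
		*iface = MPAM_IFACE_PCC;
		return 0;
	default:
		return -EINVAL;
	}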
> +
> +static int __init acpi_mpam_parse(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct property_entry props[4]; /* needs a sentinel */
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int next_res, next_prop, err = 0;
> +	struct acpi_device *companion;
> +	struct platform_device *pdev;
> +	enum mpam_msc_iface iface;
> +	struct resource res[3];
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		if (table_offset > table_end) {
> +			pr_debug("MSC entry overlaps end of ACPI table\n");
> +			break;
> +		}
> +
> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the MSC structure. This MSC will still be counted,
> +		 * meaning the MPAM driver can't probe against all MSC, and
> +		 * will never be enabled. There is no way to enable it safely,
> +		 * because we cannot determine safe system-wide partid and pmg
> +		 * ranges in this situation.
> +		 */
> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
> +			continue;
> +		}

This is a bit obscure - the comment too requires some explanation
("This MSC will still be counted", not very clear what that means).

> +
> +		if (!tbl_msc->mmio_size) {
> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (decode_interface_type(tbl_msc, &iface)) {
> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		next_res = 0;
> +		next_prop = 0;
> +		memset(res, 0, sizeof(res));
> +		memset(props, 0, sizeof(props));
> +
> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
> +		if (!pdev) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		/* Some power management is described in the namespace: */
> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +		if (err > 0 && err < sizeof(uid)) {
> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> +			if (companion)
> +				ACPI_COMPANION_SET(&pdev->dev, companion);
> +			else
> +				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);

Here you are linking the platform device to a namespace companion.
That's what will make sure that a) the ACPI namespace scan won't add an
additional platform device for ARMHAA5C and b) MSIs works -> through
the related IORT named component.

Correct ?

> +		}
> +
> +		if (iface == MPAM_IFACE_MMIO) {
> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +							       tbl_msc->mmio_size,
> +							       "MPAM:MSC");
> +		} else if (iface == MPAM_IFACE_PCC) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +								tbl_msc->base_address);
> +			next_prop++;
> +		}
> +
> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);

Do we _really_ have to resolve IRQs here, or can we postpone them to driver
probe time like the RIS resources (if I understand correctly how that is
done - by copying table data into platform data)?

I ask with my GICv5 hat on - it is good as it is for GICv3.

> +		err = platform_device_add_resources(pdev, res, next_res);
> +		if (err)
> +			break;
> +
> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +							tbl_msc->max_nrdy_usec);
> +
> +		/*
> +		 * The MSC's CPU affinity is described via its linked power
> +		 * management device, but only if it points at a Processor or
> +		 * Processor Container.
> +		 */
> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
> +								acpi_id);
> +		}
> +
> +		err = device_create_managed_software_node(&pdev->dev, props,
> +							  NULL);
> +		if (err)
> +			break;
> +
> +		/* Come back later if you want the RIS too */

I read this as: copy table data to the device so that RIS resources can
be parsed later.

Right ? I think it is worth updating the comment to clarify it.

> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +		if (err)
> +			break;
> +
> +		err = platform_device_add(pdev);
> +		if (err)
> +			break;
> +	}
> +
> +	if (err)
> +		platform_device_put(pdev);

I won't comment on the clean-up here as Jonathan did it already.

> +
> +	return err;
> +}
> +
> +int acpi_mpam_count_msc(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
> +
> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().

I think we will end up setting in stone init ordering for these
components created out of static tables (I mean sequencing them in a
centralized way) but if that works for the current driver that's fine
for the time being.

> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index fa9bb8c8ce95..835e3795ede3 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
>  	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>  	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>  	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> -	ACPI_SIG_NBFT };
> +	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index c5fd92cda487..af449964426b 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

Jonathan commented on this already - worth getting Rafael's opinion,
I am fine either way.

I have not found anything that should block this code (other than code
that I know I will have to rework when GICv5 ACPI support gets in) so:

Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..3d6c39c667c3
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#define GLOBAL_AFFINITY		~0
> +
> +struct mpam_msc;
> +
> +enum mpam_msc_iface {
> +	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
> +	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
> +};
> +
> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +/* Parse the ACPI description of resources entries for this MSC. */
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else
> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +					    struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> +#endif
> +
> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* __LINUX_ARM_MPAM_H */
> -- 
> 2.39.5
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-09-11 15:00   ` Jonathan Cameron
  2025-10-17 18:53     ` James Morse
  2025-09-12  7:33   ` Markus Elfring
  2025-10-06 23:25   ` Gavin Shan
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 15:00 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:42:49 +0000
James Morse <james.morse@arm.com> wrote:

> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
Hi James,

I'm not sure why some things ended up in this patch and others didn't.
MPAMCFG_EN for example isn't here.

If doing a separate 'register defines' patch I'd do the lot as of
the current spec.

> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/

Maybe link a specific version? I'm not sure whether what I'm looking at is the
same one as you were when you wrote this. That will become worse over time.  I'm
definitely seeing extra bits in a number of registers.

I'm lazy enough not to go and see if the cover letter calls out a version.

Anyhow, various small things on ordering that would have made this easier to review
against the spec.

Jonathan


> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v1:
>  * Whitespace.
>  * Added constants for CASSOC and XCL.
>  * Merged FLT/CTL defines.
>  * Fixed MSMON_CFG_CSU_CTL_TYPE_CSU definition.
> 
> Changes since RFC:
>  * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
>  * Whitespace churn.
>  * Cite a more recent document.
>  * Removed some stale feature, fixed some names etc.
> ---
>  drivers/resctrl/mpam_internal.h | 267 ++++++++++++++++++++++++++++++++
>  1 file changed, 267 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 02e9576ece6b..109f03df46c2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -152,4 +152,271 @@ extern struct list_head mpam_classes;
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
>  
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/

Maybe be friendly and give some section number references.

> + */
> +#define MPAM_ARCHITECTURE_V1    0x10
> +
> +/* Memory mapped control pages: */
> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR		0x0000  /* features id register */
> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */

Any reason this one is out of order with respect to the addresses?

> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
> +#define MPAMF_IIDR		0x0018  /* implementer id register */
> +#define MPAMF_AIDR		0x0020  /* architectural id register */

These 3 as well. I'm not sure what the ordering is conveying, but it's probably
easier to just put them in address order.

There are some other cases of this below.


> +/* MPAMF_IIDR - MPAM implementation ID register */
> +#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
> +#define MPAMF_IIDR_PRODUCTID_SHIFT	20
> +#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
> +#define MPAMF_IIDR_VARIANT_SHIFT	16
> +#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
> +#define MPAMF_IIDR_REVISON_SHIFT	12
> +#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
> +#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
I'd expect to see FIELD_GET()/FIELD_PREP() rather than use of shifts. Can we drop the _SHIFT defines?

Pick an order for reg field definitions. Until here they've been low to high.
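i.e. with just the masks the users become something like this, where 'iidr'
stands for a value read back from MPAMF_IIDR:

	implementer = FIELD_GET(MPAMF_IIDR_IMPLEMENTER, iidr);
	variant = FIELD_GET(MPAMF_IIDR_VARIANT, iidr);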


> +/* Error conditions in accessing memory mapped registers */
> +#define MPAM_ERRCODE_NONE			0
> +#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
> +#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
> +#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
> +#define MPAM_ERRCODE_REQ_PMG_RANGE		4
> +#define MPAM_ERRCODE_MONITOR_RANGE		5
> +#define MPAM_ERRCODE_INTPARTID_RANGE		6
> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7

It seems there are more in the latest spec.
> +
> +/*
> + * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
> + *                    usage monitor control register
> + * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
> + *                     bandwidth usage monitor control register
> + */
> +#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
> +#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
> +#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
> +#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
In the spec I'm looking at, this is reserved in CSU_CTL.

> +#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
> +#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
> +#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
> +#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
> +#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
> +#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
> +#define MSMON_CFG_x_CTL_EN			BIT(31)

I guess this combining of definitions will show some advantage in common code
later, but right now it seems confusing given that not all bits are present in both.

> +
> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU			0x43
> +
> +/*
> + * MSMON_CFG_CSU_FLT -  Memory system performance monitor configure cache storage
> + *                      usage monitor filter register
> + * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
> + *                      bandwidth usage monitor filter register
> + */
> +#define MSMON_CFG_x_FLT_PARTID			GENMASK(15, 0)
> +#define MSMON_CFG_x_FLT_PMG			GENMASK(23, 16)
> +
> +#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
> +#define MSMON_CFG_CSU_FLT_XCL			BIT(31)
> +
> +/*
> + * MSMON_CSU - Memory system performance monitor cache storage usage monitor
> + *            register
> + * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
> + *                     capture register
> + * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
> + *               monitor register
> + * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
> + *                     capture register
> + */
> +#define MSMON___VALUE		GENMASK(30, 0)
> +#define MSMON___NRDY		BIT(31)
> +#define MSMON___NRDY_L		BIT(63)
> +#define MSMON___L_VALUE		GENMASK(43, 0)
> +#define MSMON___LWD_VALUE	GENMASK(62, 0)
> +
> +/*
> + * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
> + *                  generation register
> + */
> +#define MSMON_CAPT_EVNT_NOW	BIT(0)
> +
>  #endif /* MPAM_INTERNAL_H */



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-09-11 15:07   ` Jonathan Cameron
  2025-09-29 17:44     ` James Morse
  2025-09-12 10:42   ` Ben Horgan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 15:07 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen

On Wed, 10 Sep 2025 20:42:50 +0000
James Morse <james.morse@arm.com> wrote:

> Because an MSC can only be accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed as each CPU's online call is made.
> 
> This adds the low-level MSC register accessors.
> 
> Once all MSCs reported by the firmware have been probed from a CPU in
> their respective cpu-affinity set, the probe-time cpuhp callbacks are
> replaced.  The replacement callbacks will ultimately need to handle
> save/restore of the runtime MSC state across power transitions, but for
> now there is nothing to do in them: so do nothing.
> 
> The architecture's context switch code will be enabled by a static-key,
> this can be set by mpam_enable(), but must be done from process context,
> not a cpuhp callback because both take the cpuhp lock.
> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
> to test if all the MSCs have been probed. If probing fails, mpam_disable()
> is scheduled to unregister the cpuhp callbacks and free memory.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Trivial suggestion inline. Either way
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> +
> +/* Before mpam is enabled, try to probe new MSC */
> +static int mpam_discovery_cpu_online(unsigned int cpu)
> +{
> +	int err = 0;
> +	struct mpam_msc *msc;
> +	bool new_device_probed = false;
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		mutex_lock(&msc->probe_lock);
> +		if (!msc->probed)
> +			err = mpam_msc_hw_probe(msc);
> +		mutex_unlock(&msc->probe_lock);
> +
> +		if (!err)
> +			new_device_probed = true;
> +		else
> +			break;
Unless this is going to get more complex, why not:

		if (err)
			break;

		new_device_probed = true;
> +	}
> +
> +	if (new_device_probed && !err)
> +		schedule_work(&mpam_enable_work);
> +	if (err) {
> +		mpam_disable_reason = "error during probing";
> +		schedule_work(&mpam_broken_work);
> +	}
> +
> +	return err;
> +}

> +static void mpam_enable_once(void)
> +{
> +	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
> +
> +	pr_info("MPAM enabled\n");

Feels too noisy given it should be easy enough to tell. pr_debug() perhaps?


> +}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
@ 2025-09-11 15:18   ` Jonathan Cameron
  2025-09-29 17:44     ` James Morse
  2025-09-12 11:11   ` Ben Horgan
  2025-10-03 18:58   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 15:18 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:51 +0000
James Morse <james.morse@arm.com> wrote:

> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
> 
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
> 
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Trivial stuff inline.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
> Changes since v1:
>  * Change to lock ordering now that the list-lock mutex isn't held from
>    the cpuhp call.
>  * Removed irq-unmasked assert in requestor register.
>  * Changed capitalisation in print message.
> ---
>  drivers/resctrl/mpam_devices.c  | 150 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   6 ++
>  include/linux/arm_mpam.h        |  14 +++
>  3 files changed, 169 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index c265376d936b..24dc81c15ec8 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c


> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> +{
> +	int err = 0;
> +
> +	spin_lock(&partid_max_lock);
guard() perhaps, so you can return early in the error path and avoid the need
for the local variable err.
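
Something like this sketch (untested), using the spinlock guard from
<linux/cleanup.h> via <linux/spinlock.h>:

	guard(spinlock)(&partid_max_lock);

	if (!partid_max_init) {
		mpam_partid_max = partid_max;
		mpam_pmg_max = pmg_max;
		partid_max_init = true;
		return 0;
	}

	if (!partid_max_published) {
		mpam_partid_max = min(mpam_partid_max, partid_max);
		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
		return 0;
	}

	/* New requestors can't lower the values */
	if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
		return -EBUSY;

	return 0;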

> +	if (!partid_max_init) {
> +		mpam_partid_max = partid_max;
> +		mpam_pmg_max = pmg_max;
> +		partid_max_init = true;
> +	} else if (!partid_max_published) {
> +		mpam_partid_max = min(mpam_partid_max, partid_max);
> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
> +	} else {
> +		/* New requestors can't lower the values */
> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
> +			err = -EBUSY;
> +	}
> +	spin_unlock(&partid_max_lock);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(mpam_register_requestor);

> @@ -470,9 +547,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  	return err;
>  }
>  
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, &msc->ris_idxs)) {
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0);
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx) {
> +			found = ris;
I'd return ris;

Then you can return ERR_PTR(-ENOENT) below and not bother with 'found'.

Ignore if this gets more complex later.
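
i.e. something like (sketch only):

	list_for_each_entry(ris, &msc->ris, msc_list) {
		if (ris->ris_idx == ris_idx)
			return ris;
	}

	return ERR_PTR(-ENOENT);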

> +			break;
> +		}
> +	}
> +
> +	return found;
> +}

> @@ -675,9 +813,18 @@ static struct platform_driver mpam_msc_driver = {
>  
>  static void mpam_enable_once(void)
>  {
> +	/*
> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> +	 * longer change.
> +	 */
> +	spin_lock(&partid_max_lock);
> +	partid_max_published = true;
> +	spin_unlock(&partid_max_lock);
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>  
> -	pr_info("MPAM enabled\n");
> +	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> +	       mpam_partid_max + 1, mpam_pmg_max + 1);

Not sure why this was pr_info() before but plain printk() now.

>  }




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-09-11 15:24   ` Jonathan Cameron
  2025-09-29 17:44     ` James Morse
  2025-09-11 15:31   ` Ben Horgan
  2025-10-05  0:09   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 15:24 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:52 +0000
James Morse <james.morse@arm.com> wrote:

> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accessible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> mailboxes it must be able to sleep, meaning a mutex must be used. The
> SCMI platforms can't support an overflow interrupt.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> callers to fail without having to explicitly check the interface type of
> each MSC.

The comments talk about an outer lock, but I'm not actually seeing that in the
current code.

> 
> Signed-off-by: James Morse <james.morse@arm.com>

> ---
> Change since v1:
>  * Made accesses to outer_lock_held READ_ONCE() for torn values in the failure
>    case.
Comment on wrong patch?  No READ_ONCE() in here.

> ---
>  drivers/resctrl/mpam_devices.c  |  3 +--
>  drivers/resctrl/mpam_internal.h | 37 +++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 24dc81c15ec8..a26b012452e2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -748,8 +748,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  
>  		mutex_init(&msc->probe_lock);
>  		mutex_init(&msc->part_sel_lock);
> -		mutex_init(&msc->outer_mon_sel_lock);
> -		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		mpam_mon_sel_lock_init(msc);
>  		msc->id = pdev->id;
>  		msc->pdev = pdev;
>  		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 828ce93c95d5..4cc44d4e21c4 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -70,12 +70,17 @@ struct mpam_msc {
>  
>  	/*
>  	 * mon_sel_lock protects access to the MSC hardware registers that are
> -	 * affected by MPAMCFG_MON_SEL.
> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> +	 * Access to mon_sel is needed from both process and interrupt contexts,
> +	 * but is complicated by firmware-backed platforms that can't make any
> +	 * access unless they can sleep.
> +	 * Always use the mpam_mon_sel_lock() helpers.
> +	 * Accessed to mon_sel need to be able to fail if they occur in the wrong
> +	 * context.
>  	 * If needed, take msc->probe_lock first.
>  	 */
> -	struct mutex		outer_mon_sel_lock;
> -	raw_spinlock_t		inner_mon_sel_lock;
> -	unsigned long		inner_mon_sel_flags;
> +	raw_spinlock_t		_mon_sel_lock;
> +	unsigned long		_mon_sel_flags;
>  
>  	void __iomem		*mapped_hwpage;
>  	size_t			mapped_hwpage_sz;
> @@ -83,6 +88,30 @@ struct mpam_msc {
>  	struct mpam_garbage	garbage;
>  };
>  
> +/* Returning false here means accesses to mon_sel must fail and report an error. */
> +static inline bool __must_check mpam_mon_sel_lock(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(msc->iface != MPAM_IFACE_MMIO);
> +
> +	raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
> +	return true;
> +}
> +
> +static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
> +{
> +	raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
> +}
> +
> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> +{
> +	lockdep_assert_held_once(&msc->_mon_sel_lock);
> +}
> +
> +static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
> +{
> +	raw_spin_lock_init(&msc->_mon_sel_lock);
> +}
> +
>  struct mpam_class {
>  	/* mpam_components in this class */
>  	struct list_head	components;



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
  2025-09-11 10:59   ` Jonathan Cameron
@ 2025-09-11 15:27   ` Lorenzo Pieralisi
  2025-09-19 16:10     ` James Morse
  2025-10-02  4:30   ` Fenghua Yu
  2025-10-03  0:23   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Lorenzo Pieralisi @ 2025-09-11 15:27 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, Sep 10, 2025 at 08:42:43PM +0000, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Dropped the cleanup based table freeing, use acpi_get_pptt() instead.
>  * Removed a confusing comment.
>  * Clarified the kernel doc.
> 
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  5 ++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 7af7d62597df..c5f2a51d280b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>  				     entry->length);
>  	}
>  }
> +
> +/*
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,

* The PPTT table must be rev 3 or later.

> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;

Same comment as in another patch - I don't think you want to stop parsing here.
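
i.e. skip that CPU and keep scanning the rest, something like (sketch only):

		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
		if (!cpu_node)
			continue;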

> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..5bdca5546697 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1542,6 +1542,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
>  int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>  void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> +int find_acpi_cache_level_from_id(u32 cache_id);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  }
>  static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>  						     cpumask_t *cpus) { }
> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	return -EINVAL;

return -ENOENT;

Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

> +}
>  #endif
>  
>  void acpi_arch_init(void);
> -- 
> 2.39.5
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-09-11 15:29   ` Jonathan Cameron
  2025-09-29 17:45     ` James Morse
  2025-09-11 15:37   ` Ben Horgan
  2025-10-05  0:53   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-11 15:29 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:53 +0000
James Morse <james.morse@arm.com> wrote:

> Expand the probing support with the control and monitor types
> we can use with resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

A few trivial things inline.
LGTM
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Out of time for today, fingers crossed I can get to the others tomorrow.

J
>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  {
>  	u64 idr;
> @@ -592,6 +736,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	mutex_lock(&msc->part_sel_lock);
>  	idr = mpam_msc_read_idr(msc);
>  	mutex_unlock(&msc->part_sel_lock);
> +
Stray change - push it to an earlier patch.

>  	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>  
>  	/* Use these values so partid/pmg always starts with a valid value */
> @@ -614,6 +759,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  		mutex_unlock(&mpam_list_lock);
>  		if (IS_ERR(ris))
>  			return PTR_ERR(ris);
> +		ris->idr = idr;
> +
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		mpam_ris_hw_probe(ris);
> +		mutex_unlock(&msc->part_sel_lock);
>  	}
>  
>  	spin_lock(&partid_max_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4cc44d4e21c4..5ae5d4eee8ec 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>  }
>  
> +/*
> + * When we compact the supported features, we don't care what they are.
> + * Storing them as a bitmap makes life easy.
> + */
> +typedef u16 mpam_features_t;

Maybe use a bitmap type and avoid the need to be careful on sizing etc?
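
For example, a sketch of the bitmap form (the struct layout and the
mpam_set_feature() name here are my guesses; only mpam_has_feature() appears
later in the series):

	struct mpam_props {
		DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
		/* other members unchanged */
	};

	static inline void mpam_set_feature(enum mpam_device_features feat,
					    struct mpam_props *props)
	{
		set_bit(feat, props->features);
	}

	static inline bool mpam_has_feature(enum mpam_device_features feat,
					    struct mpam_props *props)
	{
		return test_bit(feat, props->features);
	}

That avoids the fixed-width typedef and the static_assert() on its size.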

> +
> +/* Bits for mpam_features_t */
> +enum mpam_device_features {
> +	mpam_feat_ccap_part = 0,
> +	mpam_feat_cpor_part,
> +	mpam_feat_mbw_part,
> +	mpam_feat_mbw_min,
> +	mpam_feat_mbw_max,
> +	mpam_feat_mbw_prop,
> +	mpam_feat_msmon,
> +	mpam_feat_msmon_csu,
> +	mpam_feat_msmon_csu_capture,
> +	mpam_feat_msmon_csu_hw_nrdy,
> +	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_capture,
> +	mpam_feat_msmon_mbwu_rwbw,
> +	mpam_feat_msmon_mbwu_hw_nrdy,
> +	mpam_feat_msmon_capt,
> +	MPAM_FEATURE_LAST,

If it's always meant to be LAST, I'd drop the trailing comma.

> +};
> +static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
  2025-09-11 15:24   ` Jonathan Cameron
@ 2025-09-11 15:31   ` Ben Horgan
  2025-09-29 17:44     ` James Morse
  2025-10-05  0:09   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-11 15:31 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accessible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> mailboxes it must be able to sleep, meaning a mutex must be used. The
> SCMI platforms can't support an overflow interrupt.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> callers to fail without having to explicitly check the interface type of
> each MSC.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Change since v1:
>  * Made accesses to outer_lock_held READ_ONCE() for torn values in the failure
>    case.
> ---
>  drivers/resctrl/mpam_devices.c  |  3 +--
>  drivers/resctrl/mpam_internal.h | 37 +++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 24dc81c15ec8..a26b012452e2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -748,8 +748,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  
>  		mutex_init(&msc->probe_lock);
>  		mutex_init(&msc->part_sel_lock);
> -		mutex_init(&msc->outer_mon_sel_lock);
> -		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		mpam_mon_sel_lock_init(msc);
>  		msc->id = pdev->id;
>  		msc->pdev = pdev;
>  		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 828ce93c95d5..4cc44d4e21c4 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -70,12 +70,17 @@ struct mpam_msc {
>  
>  	/*
>  	 * mon_sel_lock protects access to the MSC hardware registers that are
> -	 * affected by MPAMCFG_MON_SEL.
> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> +	 * Access to mon_sel is needed from both process and interrupt contexts,
> +	 * but is complicated by firmware-backed platforms that can't make any
> +	 * access unless they can sleep.
> +	 * Always use the mpam_mon_sel_lock() helpers.
> +	 * Accessed to mon_sel need to be able to fail if they occur in the wrong
> +	 * context.
>  	 * If needed, take msc->probe_lock first.
>  	 */
> -	struct mutex		outer_mon_sel_lock;
> -	raw_spinlock_t		inner_mon_sel_lock;
> -	unsigned long		inner_mon_sel_flags;
> +	raw_spinlock_t		_mon_sel_lock;
> +	unsigned long		_mon_sel_flags;
>  

These stale variables (outer_mon_sel_lock, inner_mon_sel_lock and
inner_mon_sel_flags) can be removed in the patch that introduced them.
Jonathan has already pointed out the stale comment and the stale paragraph
in the commit message.

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
  2025-09-11 15:29   ` Jonathan Cameron
@ 2025-09-11 15:37   ` Ben Horgan
  2025-09-29 17:45     ` James Morse
  2025-10-05  0:53   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-11 15:37 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> Expand the probing support with the control and monitor types
> we can use with resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * added an underscore to a variable name.
> 
> Changes since RFC:
>  * Made mpam_ris_hw_probe_hw_nrdy() more in C.
>  * Added static assert on features bitmap size.
> ---
>  drivers/resctrl/mpam_devices.c  | 151 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  53 +++++++++++
>  2 files changed, 204 insertions(+)
> 
[snip]
>  
>  	spin_lock(&partid_max_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4cc44d4e21c4..5ae5d4eee8ec 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>  }
>  
> +/*
> + * When we compact the supported features, we don't care what they are.
> + * Storing them as a bitmap makes life easy.
> + */
> +typedef u16 mpam_features_t;
> +
> +/* Bits for mpam_features_t */
> +enum mpam_device_features {
> +	mpam_feat_ccap_part = 0,
> +	mpam_feat_cpor_part,
> +	mpam_feat_mbw_part,
> +	mpam_feat_mbw_min,
> +	mpam_feat_mbw_max,
> +	mpam_feat_mbw_prop,
> +	mpam_feat_msmon,
> +	mpam_feat_msmon_csu,
> +	mpam_feat_msmon_csu_capture,
> +	mpam_feat_msmon_csu_hw_nrdy,
> +	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_capture,
> +	mpam_feat_msmon_mbwu_rwbw,
> +	mpam_feat_msmon_mbwu_hw_nrdy,
> +	mpam_feat_msmon_capt,
> +	MPAM_FEATURE_LAST,
> +};

I added a garbled comment about this for v1. What I was trying to say is
that I don't think this quite matches what resctrl supports. For
instance, I don't think mpam_feat_ccap_part matches a resctrl feature.

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-09-11 15:46   ` Ben Horgan
  2025-09-12 15:08     ` Ben Horgan
  2025-10-06 15:59     ` James Morse
  2025-09-12 13:21   ` Jonathan Cameron
  2025-09-25  2:30   ` Fenghua Yu
  2 siblings, 2 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-11 15:46 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:43, James Morse wrote:
> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Added XCL support.
>  * Merged FLT/CTL constants.
>  * a spelling mistake in a comment.
>  * moved structures around.
> ---
>  drivers/resctrl/mpam_devices.c  | 226 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  19 +++
>  2 files changed, 245 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index cf190f896de1..1543c33c5d6a 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -898,6 +898,232 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	return 0;
>  }
>  
> +struct mon_read {
> +	struct mpam_msc_ris		*ris;
> +	struct mon_cfg			*ctx;
> +	enum mpam_device_features	type;
> +	u64				*val;
> +	int				err;
> +};
> +
> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				   u32 *flt_val)
> +{
> +	struct mon_cfg *ctx = m->ctx;
> +
> +	/*
> +	 * For CSU counters its implementation-defined what happens when not
> +	 * filtering by partid.
> +	 */
> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
> +
> +	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
> +	if (m->ctx->match_pmg) {
> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> +		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
> +	}
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
> +					       ctx->csu_exclude_clean);
> +
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
> +
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				    u32 *flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Remove values set by the hardware to prevent apparent mismatches. */
> +static void clean_msmon_ctl_val(u32 *cur_ctl)
> +{
> +	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +}
> +
> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> +				     u32 flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	/*
> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
> +	 * then enable counter.
> +	 */
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CSU, 0);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Call with MSC lock held */
> +static void __ris_msmon_read(void *arg)
> +{
> +	u64 now;
> +	bool nrdy = false;
> +	struct mon_read *m = arg;
> +	struct mon_cfg *ctx = m->ctx;
> +	struct mpam_msc_ris *ris = m->ris;
> +	struct mpam_props *rprops = &ris->props;
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> +
> +	if (!mpam_mon_sel_lock(msc)) {
> +		m->err = -EIO;
> +		return;
> +	}
> +	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
> +		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +	/*
> +	 * Read the existing configuration to avoid re-writing the same values.
> +	 * This saves waiting for 'nrdy' on subsequent reads.
> +	 */
> +	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> +	clean_msmon_ctl_val(&cur_ctl);
> +	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> +	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		now = mpam_read_monsel_reg(msc, CSU);
> +		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		now = mpam_read_monsel_reg(msc, MBWU);
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	default:
> +		m->err = -EINVAL;
> +		break;
> +	}
> +	mpam_mon_sel_unlock(msc);
> +
> +	if (nrdy) {
> +		m->err = -EBUSY;
> +		return;
> +	}
> +
> +	now = FIELD_GET(MSMON___VALUE, now);
> +	*m->val += now;
> +}
> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err, idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {

This can be list_for_each_entry_srcu(). (I thought I'd already commented,
but it turns out that was on another patch.)
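
i.e. (sketch, matching the pattern used elsewhere in the series):

	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		msc = vmsc->msc;

		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			/* per-RIS read as before */
		}
	}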

> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {

Also here.

[...]

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
  2025-09-11 14:22   ` Jonathan Cameron
@ 2025-09-11 16:30   ` Markus Elfring
  2025-09-26 17:52     ` James Morse
  2025-10-03 16:54   ` Fenghua Yu
  2025-10-06 23:13   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Markus Elfring @ 2025-09-11 16:30 UTC (permalink / raw)
  To: James Morse, linux-acpi, linux-arm-kernel
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang, Ben Horgan,
	Carl Worth, Catalin Marinas, D Scott Phillips, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, Fenghua Yu,
	Greg Kroah-Hartman, Hanjun Guo, Jamie Iles, Jonathan Cameron,
	Koba Ko, Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rob Herring, Rohit Mathew, Shanker Donthineni,
	Sudeep Holla, Shaopeng Tan, Wang ShaoBo, Will Deacon, Xin Hao

…
> +++ b/drivers/resctrl/mpam_devices.c
> > +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id);
> +	mutex_unlock(&mpam_list_lock);
…

Under which circumstances would you consider applying a statement like
“guard(mutex)(&mpam_list_lock);”?
https://elixir.bootlin.com/linux/v6.17-rc5/source/include/linux/mutex.h#L228
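
For reference, a sketch of what that would look like here (untested):

	int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
			    enum mpam_class_types type, u8 class_id,
			    int component_id)
	{
		guard(mutex)(&mpam_list_lock);

		return mpam_ris_create_locked(msc, ris_idx, type, class_id,
					      component_id);
	}

The mutex is then released automatically on every return path.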

Regards,
Markus


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
  2025-09-11 15:00   ` Jonathan Cameron
@ 2025-09-12  7:33   ` Markus Elfring
  2025-10-06 23:25   ` Gavin Shan
  2 siblings, 0 replies; 200+ messages in thread
From: Markus Elfring @ 2025-09-12  7:33 UTC (permalink / raw)
  To: James Morse, linux-acpi, linux-arm-kernel
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang, Ben Horgan,
	Carl Worth, Catalin Marinas, D Scott Phillips, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, Fenghua Yu,
	Greg Kroah-Hartman, Hanjun Guo, Jamie Iles, Jonathan Cameron,
	Koba Ko, Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rob Herring, Rohit Mathew, Shanker Donthineni,
	Shaopeng Tan, Sudeep Holla, Wang ShaoBo, Will Deacon, Xin Hao

…
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -152,4 +152,271 @@ extern struct list_head mpam_classes;
> +/* Error conditions in accessing memory mapped registers */
> +#define MPAM_ERRCODE_NONE			0
> +#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
…

What do you think about using an enumeration for such a collection of values?
(Is there a need to extend implementation details in similar ways
elsewhere in the source code?)
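
For example (sketch only, values taken from the defines quoted above):

	enum mpam_errcode {
		MPAM_ERRCODE_NONE			= 0,
		MPAM_ERRCODE_PARTID_SEL_RANGE		= 1,
		MPAM_ERRCODE_REQ_PARTID_RANGE		= 2,
		MPAM_ERRCODE_MSMONCFG_ID_RANGE		= 3,
		MPAM_ERRCODE_REQ_PMG_RANGE		= 4,
		MPAM_ERRCODE_MONITOR_RANGE		= 5,
		MPAM_ERRCODE_INTPARTID_RANGE		= 6,
		MPAM_ERRCODE_UNEXPECTED_INTERNAL	= 7,
	};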

Regards,
Markus


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-09-12 10:14   ` Ben Horgan
  2025-10-02  5:06   ` Fenghua Yu
  2025-10-03  0:32   ` Gavin Shan
  2 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 10:14 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it. As MPAM is only found on arm64
> platforms, the arm64 tree is the most natural home for the Kconfig
> option.
> 
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and to register properties
> of CPUs with the MPAM driver.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> CC: Dave Martin <dave.martin@arm.com>
> ---
> Changes since v1:
>  * Help text rewritten by Dave.
> ---
>  arch/arm64/Kconfig | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..4be8a13505bf 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2060,6 +2060,29 @@ config ARM64_TLB_RANGE
>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>  	  range of input addresses.
>  
> +config ARM64_MPAM
> +	bool "Enable support for MPAM"
> +	help
> +	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> +	  optional extension to the Arm architecture that allows each
> +	  transaction issued to the memory system to be labelled with a
> +	  Partition identifier (PARTID) and Performance Monitoring Group
> +	  identifier (PMG).
> +
> +	  Memory system components, such as the caches, can be configured with
> +	  policies to control how much of various physical resources (such as
> +	  memory bandwidth or cache memory) the transactions labelled with each
> +	  PARTID can consume.  Depending on the capabilities of the hardware,
> +	  the PARTID and PMG can also be used as filtering criteria to measure
> +	  the memory system resource consumption of different parts of a
> +	  workload.
> +
> +	  Use of this extension requires CPU support, support in the
> +	  Memory System Components (MSC), and a description from firmware
> +	  of where the MSCs are in the address space.
> +
> +	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
> +
>  endmenu # "ARMv8.4 architectural features"
>  
>  menu "ARMv8.5 architectural features"

Seems good to me. I guess we can consider separately whether we want
this to be on by default or not.

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
  2025-09-11 15:07   ` Jonathan Cameron
@ 2025-09-12 10:42   ` Ben Horgan
  2025-09-29 17:44     ` James Morse
  2025-10-03 17:56   ` Fenghua Yu
  2025-10-06 23:42   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 10:42 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen

Hi James,

On 9/10/25 21:42, James Morse wrote:
> Because an MSC can only be accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed as each CPU's online call is made.
> 
> This adds the low-level MSC register accessors.
> 
> Once all MSCs reported by the firmware have been probed from a CPU in
> their respective cpu-affinity set, the probe-time cpuhp callbacks are
> replaced.  The replacement callbacks will ultimately need to handle
> save/restore of the runtime MSC state across power transitions, but for
> now there is nothing to do in them: so do nothing.
> 
> The architecture's context switch code will be enabled by a static-key,
> this can be set by mpam_enable(), but must be done from process context,
> not a cpuhp callback because both take the cpuhp lock.
> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
> to test if all the MSCs have been probed. If probing fails, mpam_disable()
> is scheduled to unregister the cpuhp callbacks and free memory.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Removed register bounds check. If the firmware tables are wrong the
>    resulting translation fault should be enough to debug this.
>  * Removed '&' in front of a function pointer.
>  * Pulled mpam_disable() into this patch.
>  * Disable mpam when probing fails to avoid extra work on broken platforms.
>  * Added mpam_disable_reason as there are now two non-debug reasons for this
>    to happen.

Looks good to me.

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
  2025-09-11 15:18   ` Jonathan Cameron
@ 2025-09-12 11:11   ` Ben Horgan
  2025-09-29 17:44     ` James Morse
  2025-10-03 18:58   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 11:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
> 
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
> 
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Change to lock ordering now that the list-lock mutex isn't held from
>    the cpuhp call.
>  * Removed irq-unmasked assert in requestor register.
>  * Changed capitalisation in print message.
> ---
>  drivers/resctrl/mpam_devices.c  | 150 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   6 ++
>  include/linux/arm_mpam.h        |  14 +++
>  3 files changed, 169 insertions(+), 1 deletion(-)

Looks good to me. I think Jonathan's comment on getting rid of the local
variable, 'found', is worthwhile.

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-09-12 11:25   ` Ben Horgan
  2025-09-12 14:52     ` Ben Horgan
  2025-09-30 17:06     ` James Morse
  2025-09-12 11:55   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 11:25 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtied. e.g. Kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Last bitmap write will always be non-zero.
>  * Dropped READ_ONCE() - the value can no longer change.
>  * Write 0 to proportional stride, remove the bwa_fract variable.
>  * Removed nested srcu lock, the assert should cover it.
> ---
>  drivers/resctrl/mpam_devices.c  | 117 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |   8 +++
>  2 files changed, 125 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index cd8e95fa5fd6..0353313cf284 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -7,6 +7,7 @@
>  #include <linux/atomic.h>
>  #include <linux/arm_mpam.h>
>  #include <linux/bitfield.h>
> +#include <linux/bitmap.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -777,8 +778,110 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	return 0;
>  }
>  
> +static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> +{
> +	u32 num_words, msb;
> +	u32 bm = ~0;
> +	int i;
> +
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	if (wd == 0)
> +		return;
> +
> +	/*
> +	 * Write all ~0 to all but the last 32bit-word, which may
> +	 * have fewer bits...
> +	 */
> +	num_words = DIV_ROUND_UP(wd, 32);
> +	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
> +		__mpam_write_reg(msc, reg, bm);
> +
> +	/*
> +	 * ....and then the last (maybe) partial 32bit word. When wd is a
> +	 * multiple of 32, msb should be 31 to write a full 32bit word.
> +	 */
> +	msb = (wd - 1) % 32;
> +	bm = GENMASK(msb, 0);
> +	__mpam_write_reg(msc, reg, bm);
> +}
> +
> +static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +{
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +	struct mpam_props *rprops = &ris->props;
> +
> +	mpam_assert_srcu_read_lock_held();
> +
> +	mutex_lock(&msc->part_sel_lock);
> +	__mpam_part_sel(ris->ris_idx, partid, msc);
> +
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> +		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> +		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> +		mpam_write_partsel_reg(msc, MBW_MIN, 0);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> +		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> +		mpam_write_partsel_reg(msc, MBW_PROP, 0);

If mpam_feat_ccap_part is already in enum mpam_device_features then the
reset would belong here, but I expect it is better to just introduce
mpam_feat_ccap_part later (patch 21). I also commented on this feature
introduction split on patch 13.

> +	mutex_unlock(&msc->part_sel_lock);
> +}
> +
> +static void mpam_reset_ris(struct mpam_msc_ris *ris)
> +{
> +	u16 partid, partid_max;
> +
> +	mpam_assert_srcu_read_lock_held();
> +
> +	if (ris->in_reset_state)
> +		return;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid < partid_max; partid++)
> +		mpam_reset_ris_partid(ris, partid);
> +}
> +
> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> +{
> +	struct mpam_msc_ris *ris;
> +
> +	mpam_assert_srcu_read_lock_held();

Unneeded? Checked in list_for_each_entry_srcu().

> +
> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> +		mpam_reset_ris(ris);
> +
> +		/*
> +		 * Set in_reset_state when coming online. The reset state
> +		 * for non-zero partid may be lost while the CPUs are offline.
> +		 */
> +		ris->in_reset_state = online;
> +	}
> +}
> +
>  static int mpam_cpu_online(unsigned int cpu)
>  {
> +	int idx;
> +	struct mpam_msc *msc;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		if (atomic_fetch_inc(&msc->online_refs) == 0)
> +			mpam_reset_msc(msc, true);
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>  	return 0;
>  }
>  
> @@ -818,6 +921,20 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
>  
>  static int mpam_cpu_offline(unsigned int cpu)
>  {
> +	int idx;
> +	struct mpam_msc *msc;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		if (atomic_dec_and_test(&msc->online_refs))
> +			mpam_reset_msc(msc, false);
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index eace5ba871f3..6e047fbd3512 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -5,6 +5,7 @@
>  #define MPAM_INTERNAL_H
>  
>  #include <linux/arm_mpam.h>
> +#include <linux/atomic.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
>  #include <linux/llist.h>
> @@ -45,6 +46,7 @@ struct mpam_msc {
>  	struct pcc_mbox_chan	*pcc_chan;
>  	u32			nrdy_usec;
>  	cpumask_t		accessibility;
> +	atomic_t		online_refs;
>  
>  	/*
>  	 * probe_lock is only taken during discovery. After discovery these
> @@ -223,6 +225,7 @@ struct mpam_msc_ris {
>  	u8			ris_idx;
>  	u64			idr;
>  	struct mpam_props	props;
> +	bool			in_reset_state;
>  
>  	cpumask_t		affinity;
>  
> @@ -242,6 +245,11 @@ struct mpam_msc_ris {
>  extern struct srcu_struct mpam_srcu;
>  extern struct list_head mpam_classes;
>  
> +static inline void mpam_assert_srcu_read_lock_held(void)
> +{
> +	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> +}
> +
>  /* System wide partid/pmg values */
>  extern u16 mpam_partid_max;
>  extern u8 mpam_pmg_max;
Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-09-12 11:42   ` Ben Horgan
  2025-10-02 18:02     ` James Morse
  2025-09-12 12:02   ` Jonathan Cameron
  2025-09-25  7:16   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 11:42 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * more complete use of _srcu helpers.
>  * Use guard macro for srcu.
>  * Dropped a might_sleep() - something else will bark.
> ---
>  drivers/resctrl/mpam_devices.c | 56 ++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index e7faf453b5d7..a9d3c4b09976 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -842,8 +842,6 @@ static int mpam_reset_ris(void *arg)
>  	u16 partid, partid_max;
>  	struct mpam_msc_ris *ris = arg;
>  
> -	mpam_assert_srcu_read_lock_held();
> -

Remove where it is introduced. There is already one in
mpam_reset_ris_partid() at that time.

>  	if (ris->in_reset_state)
>  		return 0;

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-09-10 20:42 ` [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-09-12 11:49   ` Jonathan Cameron
  2025-09-29 17:45     ` James Morse
  2025-10-05  1:28   ` Fenghua Yu
  1 sibling, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 11:49 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:42:54 +0000
James Morse <james.morse@arm.com> wrote:

> To make a decision about whether to expose an mpam class as
> a resctrl resource we need to know its overall supported
> features and properties.
> 
> Once we've probed all the resources, we can walk the tree
> and produce overall values by merging the bitmaps. This
> eliminates features that are only supported by some MSC
> that make up a component or class.
> 
> If bitmap properties are mismatched within a component we
> cannot support the mismatched feature.
> 
> Care has to be taken as vMSC may hold mismatched RIS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

A trivial things inline.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> +/*
> + * Combine two props fields.
> + * If this is for controls that alias the same resource, it is safe to just
> + * copy the values over. If two aliasing controls implement the same scheme
> + * a safe value must be picked.
> + * For non-aliasing controls, these control different resources, and the
> + * resulting safe value must be compatible with both. When merging values in
> + * the tree, all the aliasing resources must be handled first.
> + * On mismatch, parent is modified.
> + */
> +static void __props_mismatch(struct mpam_props *parent,
> +			     struct mpam_props *child, bool alias)
> +{
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
> +		parent->cpbm_wd = child->cpbm_wd;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
> +				   cpbm_wd, alias)) {
> +		pr_debug("%s cleared cpor_part\n", __func__);
> +		mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
> +		parent->cpbm_wd = 0;
> +	}
> +
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
> +		parent->mbw_pbm_bits = child->mbw_pbm_bits;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
> +				   mbw_pbm_bits, alias)) {
> +		pr_debug("%s cleared mbw_part\n", __func__);
> +		mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
> +		parent->mbw_pbm_bits = 0;
> +	}
> +
> +	/* bwa_wd is a count of bits, fewer bits means less precision */
> +	if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {

Seems like an overly long line given other local wrapping.

> +		parent->bwa_wd = child->bwa_wd;
> +	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
> +				     bwa_wd, alias)) {
> +		pr_debug("%s took the min bwa_wd\n", __func__);
> +		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
> +	}
> +
> +	/* For num properties, take the minimum */
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
> +		parent->num_csu_mon = child->num_csu_mon;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
> +				   num_csu_mon, alias)) {
> +		pr_debug("%s took the min num_csu_mon\n", __func__);
> +		parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
> +	}
> +
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
> +		parent->num_mbwu_mon = child->num_mbwu_mon;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
> +				   num_mbwu_mon, alias)) {
> +		pr_debug("%s took the min num_mbwu_mon\n", __func__);
> +		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
> +	}

> +
> +/*
> + * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
> + * For 'num' properties we can just take the minimum.
> + * For properties where the mismatched unused bits would make a difference, we
> + * nobble the class feature, as we can't configure all the resources.
> + * e.g. The L3 cache is composed of two resources with 13 and 17 portion
> + * bitmaps respectively.
> + */
> +static void
> +__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)

I'm not really sure what the __ prefix denotes here.

> +{
> +	struct mpam_props *cprops = &class->props;
> +	struct mpam_props *vprops = &vmsc->props;
> +
> +	lockdep_assert_held(&mpam_list_lock); /* we modify class */
> +
> +	pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
> +		 dev_name(&vmsc->msc->pdev->dev),
> +		 (long)cprops->features, (long)vprops->features);

According to https://docs.kernel.org/core-api/printk-formats.html
should be fine using %x for u16 values. So why dance through a cast to long?
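
i.e. (assuming the features field stays an integer type rather than a bitmap):

	pr_debug("%s: Merging features for class:0x%x &= vmsc:0x%x\n",
		 dev_name(&vmsc->msc->pdev->dev),
		 cprops->features, vprops->features);
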
> +
> +	/* Take the safe value for any common features */
> +	__props_mismatch(cprops, vprops, false);
> +}
> +
> +static void
> +__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
> +{
> +	struct mpam_props *rprops = &ris->props;
> +	struct mpam_props *vprops = &vmsc->props;
> +
> +	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
> +
> +	pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
> +		 dev_name(&vmsc->msc->pdev->dev),
> +		 (long)vprops->features, (long)rprops->features);

Same as above comment on casts being unnecessary.

> +
> +	/*
> +	 * Merge mismatched features - Copy any features that aren't common,
> +	 * but take the safe value for any common features.
> +	 */
> +	__props_mismatch(vprops, rprops, true);
> +}
> +
> +/*
> + * Copy the first component's first vMSC's properties and features to the
> + * class. __class_props_mismatch() will remove conflicts.
> + * It is not possible to have a class with no components, or a component with
> + * no resources. The vMSC properties have already been built.

If it's not possible, do we need the defensive _or_null and error checks?
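
If the invariant really holds, something like this (untested) would be enough:

	comp = list_first_entry(&class->components,
				struct mpam_component, class_list);
	vmsc = list_first_entry(&comp->vmsc, struct mpam_vmsc, comp_list);
	class->props = vmsc->props;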

> + */
> +static void mpam_enable_init_class_features(struct mpam_class *class)
> +{
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_component *comp;
> +
> +	comp = list_first_entry_or_null(&class->components,
> +					struct mpam_component, class_list);
> +	if (WARN_ON(!comp))
> +		return;
> +
> +	vmsc = list_first_entry_or_null(&comp->vmsc,
> +					struct mpam_vmsc, comp_list);
> +	if (WARN_ON(!vmsc))
> +		return;
> +
> +	class->props = vmsc->props;
> +}

> +/*
> + * Merge all the common resource features into class.
> + * vmsc features are bitwise-or'd together, this must be done first.

I'm not sure what 'this' is here - I think it's a missing plural that has
me confused.  Perhaps 'these must be done first.'

> + * Next the class features are the bitwise-and of all the vmsc features.
> + * Other features are the min/max as appropriate.
> + *
> + * To avoid walking the whole tree twice, the class->nrdy_usec property is
> + * updated when working with the vmsc as it is a max(), and doesn't need
> + * initialising first.

Perhaps state that this comment is about what happens in each call of
mpam_enable_merge_vmsc_features(). Or move the comment to that function.

> + */
> +static void mpam_enable_merge_features(struct list_head *all_classes_list)
> +{
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, all_classes_list, classes_list) {
> +		list_for_each_entry(comp, &class->components, class_list)
> +			mpam_enable_merge_vmsc_features(comp);
> +
> +		mpam_enable_init_class_features(class);
> +
> +		list_for_each_entry(comp, &class->components, class_list)
> +			mpam_enable_merge_class_features(comp);
> +	}
> +}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
  2025-09-12 11:25   ` Ben Horgan
@ 2025-09-12 11:55   ` Jonathan Cameron
  2025-09-30 17:06     ` James Morse
  2025-09-30  2:51   ` Shaopeng Tan (Fujitsu)
       [not found]   ` <1f084a23-7211-4291-99b6-7f5192fb9096@nvidia.com>
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 11:55 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:55 +0000
James Morse <james.morse@arm.com> wrote:

> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtyied. e.g. Kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Just one trivial passing comment from me.

Jonathan

> ---
> Changes since RFC:
>  * Last bitmap write will always be non-zero.
>  * Dropped READ_ONCE() - the value can no longer change.
>  * Write 0 to proportional stride, remove the bwa_fract variable.
>  * Removed nested srcu lock, the assert should cover it.
> ---
>  drivers/resctrl/mpam_devices.c  | 117 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |   8 +++
>  2 files changed, 125 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index cd8e95fa5fd6..0353313cf284 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c

>  
> @@ -818,6 +921,20 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
>  
>  static int mpam_cpu_offline(unsigned int cpu)
>  {
> +	int idx;
> +	struct mpam_msc *msc;
> +
> +	idx = srcu_read_lock(&mpam_srcu);

Might be worth using
guard(srcu)(&mpam_srcu);
here, but the only real advantage it brings is hiding the local idx variable
away.
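
Untested, but with guard() the offline path would collapse to roughly:

static int mpam_cpu_offline(unsigned int cpu)
{
	struct mpam_msc *msc;

	guard(srcu)(&mpam_srcu);
	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		if (!cpumask_test_cpu(cpu, &msc->accessibility))
			continue;

		if (atomic_dec_and_test(&msc->online_refs))
			mpam_reset_msc(msc, false);
	}

	return 0;
}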

> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		if (atomic_dec_and_test(&msc->online_refs))
> +			mpam_reset_msc(msc, false);
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>  	return 0;
>  }




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-09-10 20:42 ` [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-09-12 11:57   ` Jonathan Cameron
  2025-10-01  9:50     ` James Morse
  2025-10-05 21:08   ` Fenghua Yu
  1 sibling, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 11:57 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:42:56 +0000
James Morse <james.morse@arm.com> wrote:

> Resetting RIS entries from the cpuhp callback is easy as the
> callback occurs on the correct CPU. This won't be true for any other
> caller that wants to reset or configure an MSC.
> 
> Add a helper that schedules the provided function if necessary.
> 
> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
> changing the MSC state.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
  2025-09-12 11:42   ` Ben Horgan
@ 2025-09-12 12:02   ` Jonathan Cameron
  2025-09-30 17:06     ` James Morse
  2025-09-25  7:16   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 12:02 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:57 +0000
James Morse <james.morse@arm.com> wrote:

> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * more complete use of _srcu helpers.
>  * Use guard macro for srcu.

I'm not seeing a strong reason for doing this for the case here and not
for cases in earlier patches like mpam_cpu_online(). I'm a fan of using
these broadly in a given code base, so I would use guard(srcu) in those earlier patches
as well.

Anyhow, one other trivial thing inline that you can ignore or not as you wish.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


>  * Dropped a might_sleep() - something else will bark.
> ---
>  drivers/resctrl/mpam_devices.c | 56 ++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index e7faf453b5d7..a9d3c4b09976 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -842,8 +842,6 @@ static int mpam_reset_ris(void *arg)
>  	u16 partid, partid_max;
>  	struct mpam_msc_ris *ris = arg;
>  
> -	mpam_assert_srcu_read_lock_held();
> -
>  	if (ris->in_reset_state)
>  		return 0;
>  
> @@ -1340,8 +1338,56 @@ static void mpam_enable_once(void)
>  	       mpam_partid_max + 1, mpam_pmg_max + 1);
>  }
>  
> +static void mpam_reset_component_locked(struct mpam_component *comp)
> +{
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		msc = vmsc->msc;

Might be worth reducing scope of msc and ris
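
e.g. (sketch only):

	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		struct mpam_msc *msc = vmsc->msc;
		struct mpam_msc_ris *ris;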

> +
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			if (!ris->in_reset_state)
> +				mpam_touch_msc(msc, mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +		}
> +	}
> +}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
@ 2025-09-12 12:12   ` Jonathan Cameron
  2025-10-02 18:02     ` James Morse
  2025-09-12 14:40   ` Ben Horgan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 12:12 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:58 +0000
James Morse <james.morse@arm.com> wrote:

> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen at process
> context as it takes mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
A few comments inline.


> @@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
>  	}
>  }
>  
> +static char *mpam_errcode_names[16] = {
> +	[0] = "No error",

I think you had a bunch of defines for these in an earlier patch.  Can we use
those to index here instead of [0] etc.?
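
e.g. (I've invented the define names here - use whatever the earlier patch
actually calls them):

static char *mpam_errcode_names[16] = {
	[MPAM_ERRCODE_NONE]		= "No error",
	[MPAM_ERRCODE_PARTID_SEL_RANGE]	= "PARTID_SEL_Range",
	[MPAM_ERRCODE_REQ_PARTID_RANGE]	= "Req_PARTID_Range",
	...
};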

> +	[1] = "PARTID_SEL_Range",
> +	[2] = "Req_PARTID_Range",
> +	[3] = "MSMONCFG_ID_RANGE",
> +	[4] = "Req_PMG_Range",
> +	[5] = "Monitor_Range",
> +	[6] = "intPARTID_Range",
> +	[7] = "Unexpected_INTERNAL",
> +	[8] = "Undefined_RIS_PART_SEL",
> +	[9] = "RIS_No_Control",
> +	[10] = "Undefined_RIS_MON_SEL",
> +	[11] = "RIS_No_Monitor",
> +	[12 ... 15] = "Reserved"
> +};


> +static void mpam_unregister_irqs(void)
> +{
> +	int irq, idx;
> +	struct mpam_msc *msc;
> +
> +	cpus_read_lock();

	guard(cpus_read_lock)();
	guard(srcu)(&mpam_srcu);

> +	/* take the lock as free_irq() can sleep */
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
> +			if (irq_is_percpu(irq)) {
> +				msc->reenable_error_ppi = 0;
> +				free_percpu_irq(irq, msc->error_dev_id);
> +			} else {
> +				devm_free_irq(&msc->pdev->dev, irq, msc);
> +			}
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	cpus_read_unlock();
> +}
> +
>  static void mpam_enable_once(void)
>  {
> -	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> -	mutex_unlock(&mpam_list_lock);
> +	int err;
>  
>  	/*
>  	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> @@ -1332,6 +1580,27 @@ static void mpam_enable_once(void)
>  	partid_max_published = true;
>  	spin_unlock(&partid_max_lock);
>  
> +	/*
> +	 * If all the MSC have been probed, enabling the IRQs happens next.
> +	 * That involves cross-calling to a CPU that can reach the MSC, and
> +	 * the locks must be taken in this order:
> +	 */
> +	cpus_read_lock();
> +	mutex_lock(&mpam_list_lock);
> +	mpam_enable_merge_features(&mpam_classes);
> +
> +	err = mpam_register_irqs();
> +	if (err)
> +		pr_warn("Failed to register irqs: %d\n", err);

Perhaps move the print into the if (err) below?
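
i.e. something like:

	mutex_unlock(&mpam_list_lock);
	cpus_read_unlock();

	if (err) {
		pr_warn("Failed to register irqs: %d\n", err);
		schedule_work(&mpam_broken_work);
		return;
	}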

> +
> +	mutex_unlock(&mpam_list_lock);
> +	cpus_read_unlock();
> +
> +	if (err) {
> +		schedule_work(&mpam_broken_work);
> +		return;
> +	}

> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 6e047fbd3512..f04a9ef189cf 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -32,6 +32,10 @@ struct mpam_garbage {
>  	struct platform_device	*pdev;
>  };
>  
> +/* Bit positions for error_irq_flags */
> +#define	MPAM_ERROR_IRQ_REQUESTED  0
> +#define	MPAM_ERROR_IRQ_HW_ENABLED 1

If there aren't going to be a load more of these (I've not really thought
about whether there might be) then using a bitmap for these seems to add complexity
that we wouldn't see with 
bool error_irq_req;
bool error_irq_hw_enabled;


> +
>  struct mpam_msc {
>  	/* member of mpam_all_msc */
>  	struct list_head        all_msc_list;
> @@ -46,6 +50,11 @@ struct mpam_msc {
>  	struct pcc_mbox_chan	*pcc_chan;
>  	u32			nrdy_usec;
>  	cpumask_t		accessibility;
> +	bool			has_extd_esr;
> +
> +	int				reenable_error_ppi;
> +	struct mpam_msc * __percpu	*error_dev_id;
> +
>  	atomic_t		online_refs;
>  
>  	/*
> @@ -54,6 +63,7 @@ struct mpam_msc {
>  	 */
>  	struct mutex		probe_lock;
>  	bool			probed;
> +	unsigned long		error_irq_flags;
>  	u16			partid_max;
>  	u8			pmg_max;
>  	unsigned long		ris_idxs;



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-09-12 12:13   ` Jonathan Cameron
  2025-10-03 18:03     ` James Morse
  2025-09-12 14:42   ` Ben Horgan
  2025-09-26  2:31   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 12:13 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:42:59 +0000
James Morse <james.morse@arm.com> wrote:

> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
> 
> After this point, checking all the MSC have been probed is pointless,
> and the cpuhp callbacks should restore the configuration, instead of
> just resetting the MSC.
> 
> Add a static key to enable this behaviour. This will also allow MPAM
> to be disabled in response to an error, and the architecture code to
> enable/disable the context switch of the MPAM system registers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Seems fine to me (other than the TODO move to arch code
that should probably be resolved).

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-09-12 12:22   ` Jonathan Cameron
  2025-10-07 11:11     ` James Morse
  2025-09-12 15:00   ` Ben Horgan
  2025-09-25  6:53   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 12:22 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:00 +0000
James Morse <james.morse@arm.com> wrote:

> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate
> a configuration array for each component, and reprogram each RIS
> configuration from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Trivial comments
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> +
> +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
> +{
> +	memset(reset_cfg, 0, sizeof(*reset_cfg));

Might as well do the following and skip the memset.

	*reset_cfg = (struct mpam_config) {
		.features = ~0,
		.cpbm = ~0,
		.mbw_pbm = ~0,
		.mbw_max = MPAM...
		.reset_cpbm = true,
		.reset_mbw_pbm = true,
	};
> +
> +	reset_cfg->features = ~0;
> +	reset_cfg->cpbm = ~0;
> +	reset_cfg->mbw_pbm = ~0;
> +	reset_cfg->mbw_max = MPAMCFG_MBW_MAX_MAX;
> +
> +	reset_cfg->reset_cpbm = true;
> +	reset_cfg->reset_mbw_pbm = true;
> +}

> +static int mpam_allocate_config(void)
> +{
> +	int err = 0;

Always set before use. Maybe push down so it is in tighter scope and
can declare and initialize to final value in one line.
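
e.g.

	list_for_each_entry(class, &mpam_classes, classes_list) {
		list_for_each_entry(comp, &class->components, class_list) {
			int err = __allocate_component_cfg(comp);

			if (err)
				return err;
		}
	}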

> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		list_for_each_entry(comp, &class->components, class_list) {
> +			err = __allocate_component_cfg(comp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}


> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index b69fa9199cb4..17570d9aae9b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -169,11 +169,7 @@ struct mpam_props {
>  	u16			num_mbwu_mon;
>  };
>  
> -static inline bool mpam_has_feature(enum mpam_device_features feat,
> -				    struct mpam_props *props)
> -{
> -	return (1 << feat) & props->features;
> -}
> +#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)

If this is worth doing, push it back to the original introduction.
I'm not sure it is necessary.

Jonathan





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features
  2025-09-10 20:43 ` [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-09-12 13:07   ` Jonathan Cameron
  2025-10-03 18:05     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:07 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Zeng Heng

On Wed, 10 Sep 2025 20:43:01 +0000
James Morse <james.morse@arm.com> wrote:

> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls aren't
> known.
> 
> Discover the rest of the features so they can be reset to avoid any
> side effects when resctrl is in use.
> 
> PARTID narrowing allows MSC/RIS to support less configuration space than
> is usable. If this feature is found on a class of device we are likely
> to use, then reduce the partid_max to make it usable. This allows us
> to map a PARTID to itself.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

A few trivial things inline.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

>  int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>  {
>  	int err = 0;
> @@ -667,10 +676,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  	struct mpam_msc *msc = ris->vmsc->msc;
>  	struct device *dev = &msc->pdev->dev;
>  	struct mpam_props *props = &ris->props;
> +	struct mpam_class *class = ris->vmsc->comp->class;
>  
>  	lockdep_assert_held(&msc->probe_lock);
>  	lockdep_assert_held(&msc->part_sel_lock);
>  
> +	/* Cache Capacity Partitioning */
> +	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
> +		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
> +
> +		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
> +		if (props->cmax_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_softlim, props);
> +
> +		if (props->cmax_wd &&
> +		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cmax, props);
> +
> +		if (props->cmax_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cmin, props);
> +
> +		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
> +
Trivial but blank line here feels inconsistent with local style. I'd drop it.
> +		if (props->cassoc_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cassoc, props);
> +	}
> +

> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 17570d9aae9b..326ba9114d70 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -136,25 +136,34 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>   * When we compact the supported features, we don't care what they are.
>   * Storing them as a bitmap makes life easy.
>   */
> -typedef u16 mpam_features_t;
> +typedef u32 mpam_features_t;

This is strengthening my view that this should just be a DECLARE_BITMAP(MPAM_FEATURE_LAST)
in the appropriate places.
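
i.e. something along the lines of (untested):

struct mpam_props {
	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
	...
};

static inline bool mpam_has_feature(enum mpam_device_features feat,
				    struct mpam_props *props)
{
	return test_bit(feat, props->features);
}

static inline void mpam_set_feature(enum mpam_device_features feat,
				    struct mpam_props *props)
{
	__set_bit(feat, props->features);
}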

>  
>  /* Bits for mpam_features_t */
>  enum mpam_device_features {
> -	mpam_feat_ccap_part = 0,
> +	mpam_feat_cmax_softlim,
> +	mpam_feat_cmax_cmax,
> +	mpam_feat_cmax_cmin,
> +	mpam_feat_cmax_cassoc,
>  	mpam_feat_cpor_part,
>  	mpam_feat_mbw_part,
>  	mpam_feat_mbw_min,
>  	mpam_feat_mbw_max,
>  	mpam_feat_mbw_prop,
> +	mpam_feat_intpri_part,
> +	mpam_feat_intpri_part_0_low,
> +	mpam_feat_dspri_part,
> +	mpam_feat_dspri_part_0_low,
>  	mpam_feat_msmon,
>  	mpam_feat_msmon_csu,
>  	mpam_feat_msmon_csu_capture,
> +	mpam_feat_msmon_csu_xcl,
>  	mpam_feat_msmon_csu_hw_nrdy,
>  	mpam_feat_msmon_mbwu,
>  	mpam_feat_msmon_mbwu_capture,
>  	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,
>  	mpam_feat_msmon_capt,
> +	mpam_feat_partid_nrw,
>  	MPAM_FEATURE_LAST,
>  };
>  static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
> @@ -165,6 +174,10 @@ struct mpam_props {
>  	u16			cpbm_wd;
>  	u16			mbw_pbm_bits;
>  	u16			bwa_wd;
> +	u16			cmax_wd;
> +	u16			cassoc_wd;
> +	u16			intpri_wd;
> +	u16			dspri_wd;
>  	u16			num_csu_mon;
>  	u16			num_mbwu_mon;
>  };



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors
  2025-09-10 20:43 ` [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-09-12 13:11   ` Jonathan Cameron
  2025-10-06 14:57     ` James Morse
  2025-10-06 15:56     ` James Morse
  0 siblings, 2 replies; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:11 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:43:02 +0000
James Morse <james.morse@arm.com> wrote:

> MPAM's MSC support a number of monitors, each of which supports
> bandwidth counters, or cache-storage-utilisation counters. To use
> a counter, a monitor needs to be configured. Add helpers to allocate
> and free CSU or MBWU monitors.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

One minor requested change inline that will probably otherwise get picked
up by someone's cleanup script.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/resctrl/mpam_devices.c  |  2 ++
>  drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index f536ebbcf94e..cf190f896de1 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -340,6 +340,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
>  	class->level = level_idx;
>  	class->type = type;
>  	INIT_LIST_HEAD_RCU(&class->classes_list);
> +	ida_init(&class->ida_csu_mon);
> +	ida_init(&class->ida_mbwu_mon);
>  
>  	list_add_rcu(&class->classes_list, &mpam_classes);
>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 326ba9114d70..81c4c2bfea3d 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -210,6 +210,9 @@ struct mpam_class {
>  	/* member of mpam_classes */
>  	struct list_head	classes_list;
>  
> +	struct ida		ida_csu_mon;
> +	struct ida		ida_mbwu_mon;
> +
>  	struct mpam_garbage	garbage;
>  };
>  
> @@ -288,6 +291,38 @@ struct mpam_msc_ris {
>  	struct mpam_garbage	garbage;
>  };
>  
> +static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> +{
> +	struct mpam_props *cprops = &class->props;
> +
> +	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
> +		return -EOPNOTSUPP;
> +
> +	return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
> +			       GFP_KERNEL);
> +}
> +
> +static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
> +{
> +	ida_free(&class->ida_csu_mon, csu_mon);
> +}
> +
> +static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
> +{
> +	struct mpam_props *cprops = &class->props;
> +
> +	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
> +		return -EOPNOTSUPP;
> +
> +	return ida_alloc_range(&class->ida_mbwu_mon, 0,
> +			       cprops->num_mbwu_mon - 1, GFP_KERNEL);

ida_alloc_max() - which is just a wrapper that sets the minimum to 0
but none the less perhaps conveys things slightly better.
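
i.e.

	return ida_alloc_max(&class->ida_mbwu_mon, cprops->num_mbwu_mon - 1,
			     GFP_KERNEL);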

> +}
> +
> +static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
> +{
> +	ida_free(&class->ida_mbwu_mon, mbwu_mon);
> +}
> +
>  /* List of all classes - protected by srcu*/
>  extern struct srcu_struct mpam_srcu;
>  extern struct list_head mpam_classes;



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
  2025-09-11 15:46   ` Ben Horgan
@ 2025-09-12 13:21   ` Jonathan Cameron
  2025-10-09 17:48     ` James Morse
  2025-09-25  2:30   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:21 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:03 +0000
James Morse <james.morse@arm.com> wrote:

> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Hi James,

A couple of minor comments inline,

Jonathan

> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> +				     u32 flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	/*
> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
> +	 * then enable counter.
> +	 */
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CSU, 0);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;

Given nothing to do later, I'd just return at end of each case.
Entirely up to you though as this is just a personal style preference.
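
i.e.

	case mpam_feat_msmon_mbwu:
		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
		mpam_write_monsel_reg(msc, MBWU, 0);
		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
		return;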

> +	default:
> +		return;
> +	}
> +}

> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err, idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	idx = srcu_read_lock(&mpam_srcu);

	guard(srcu)(&mpam_srcu);

Then you can do direct returns on errors which looks like it will simplify
things somewhat by letting you just return on err.
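
Something like this (untested):

	guard(srcu)(&mpam_srcu);
	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
		msc = vmsc->msc;

		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
			arg->ris = ris;

			err = smp_call_function_any(&msc->accessibility,
						    __ris_msmon_read, arg, true);
			if (!err && arg->err)
				err = arg->err;
			if (err)
				return err;
		}
	}

	return 0;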


> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
I'd bring the declaration down here as well.
		struct mpam_msc *msc = vmsc->msc;
Could bring ris down here as well.

> +
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			arg->ris = ris;
> +
> +			err = smp_call_function_any(&msc->accessibility,
> +						    __ris_msmon_read, arg,
> +						    true);
> +			if (!err && arg->err)
> +				err = arg->err;
> +			if (err)
> +				break;
> +		}
> +		if (err)
> +			break;

This won't be needed if you returned on error above.

> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	return err;
And with the above changes you only reach here if err == 0, so 'return 0;' is appropriate.
> +}
> +
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features type, u64 *val)
> +{
> +	int err;
> +	struct mon_read arg;
> +	u64 wait_jiffies = 0;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	might_sleep();
> +
> +	if (!mpam_is_enabled())
> +		return -EIO;
> +
> +	if (!mpam_has_feature(type, cprops))
> +		return -EOPNOTSUPP;
> +
> +	memset(&arg, 0, sizeof(arg));
Either use = { }; at declaration or maybe
	arg = (struct mon_read) {
		.ctx = ctx,
		.type = type,
		.val = val,
	};

rather than bothering with separate memset.

> +	arg.ctx = ctx;
> +	arg.type = type;
> +	arg.val = val;
> +	*val = 0;
> +
> +	err = _msmon_read(comp, &arg);
> +	if (err == -EBUSY && comp->class->nrdy_usec)
> +		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
> +
> +	while (wait_jiffies)
> +		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
> +
> +	if (err == -EBUSY) {
> +		memset(&arg, 0, sizeof(arg));
Same as above. 
> +		arg.ctx = ctx;
> +		arg.type = type;
> +		arg.val = val;
> +		*val = 0;
> +
> +		err = _msmon_read(comp, &arg);
> +	}
> +
> +	return err;
> +}



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-09-10 20:43 ` [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-09-12 13:24   ` Jonathan Cameron
  2025-10-09 17:48     ` James Morse
  2025-09-12 15:55   ` Ben Horgan
  1 sibling, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:24 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:04 +0000
James Morse <james.morse@arm.com> wrote:

> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
> 
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Trivial comment inline.  I haven't spent enough time thinking about this
to give a proper review so no tags yet.

Jonathan
 
> ---
> Changes since v1:
>  * Fixed lock/unlock typo.
> ---
>  drivers/resctrl/mpam_devices.c  | 154 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  23 +++++
>  2 files changed, 175 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 1543c33c5d6a..eeb62ed94520 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -918,6 +918,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>  	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>  
>  	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
> +
Unrelated change.  If it makes sense figure out where to push it back to.

>  	if (m->ctx->match_pmg) {
>  		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>  		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-09-10 20:43 ` [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-09-12 13:27   ` Jonathan Cameron
  2025-10-09 17:48     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:27 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:43:05 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> mpam v0.1 and versions above v1.0 support an optional long counter for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
> indicating support for long counters. As of now, a 44 bit counter
> represented by HAS_LONG field (bit 30) and a 63 bit counter represented
> by LWD (bit 29) can be optionally integrated. Probe for these counters
> and set corresponding feature bits if any of these counters are present.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Hi Rohit, James.

I'd like a little more justification of the 'front facing' use for the first
feature bit.  To me that seems confusing but I may well be missing why
we can't have 3 exclusive features.

Jonathan

> ---
>  drivers/resctrl/mpam_devices.c  | 23 ++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  9 +++++++++
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index eeb62ed94520..bae9fa9441dc 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -795,7 +795,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
>  		}
>  		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> -			bool hw_managed;
> +			bool has_long, hw_managed;
>  			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
>  
>  			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
> @@ -805,6 +805,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
>  				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
>  
> +			/*
> +			 * Treat long counter and its extension, lwd as mutually
> +			 * exclusive feature bits. Though these are dependent
> +			 * fields at the implementation level, there would never
> +			 * be a need for mpam_feat_msmon_mbwu_44counter (long
> +			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
> +			 * bits to be set together.
> +			 *
> +			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
> +			 * bit as this feature bit would be used as the "front
> +			 * facing feature bit" for any checks related to mbwu
> +			 * monitors.

Why do we need such a 'front facing' bit?  Why isn't it sufficient just to
add a little helper or macro to find out if mbwu is turned on?

> +			 */
> +			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumon_idr);
> +			if (props->num_mbwu_mon && has_long) {
> +				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumon_idr))
> +					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
> +				else
> +					mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
> +			}
> +
>  			/* Is NRDY hardware managed? */
>  			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
>  			if (hw_managed)
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 725c2aefa8a2..c190826dfbda 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -158,7 +158,16 @@ enum mpam_device_features {
>  	mpam_feat_msmon_csu_capture,
>  	mpam_feat_msmon_csu_xcl,
>  	mpam_feat_msmon_csu_hw_nrdy,
> +
> +	/*
> +	 * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
> +	 * counter would be used. The exact counter used is decided based on the
> +	 * status of mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter
> +	 * as well.
> +	 */
>  	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_44counter,
> +	mpam_feat_msmon_mbwu_63counter,
>  	mpam_feat_msmon_mbwu_capture,
>  	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported
  2025-09-10 20:43 ` [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-09-12 13:29   ` Jonathan Cameron
  2025-10-10 16:53     ` James Morse
  2025-09-26  4:51   ` Fenghua Yu
  1 sibling, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:29 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

On Wed, 10 Sep 2025 20:43:06 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
> counter.
> 
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-09-12 13:33   ` Jonathan Cameron
  2025-10-10 16:53     ` James Morse
  2025-09-18  2:35   ` Shaopeng Tan (Fujitsu)
  2025-09-26  4:11   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:33 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:07 +0000
James Morse <james.morse@arm.com> wrote:

> resctrl expects to reset the bandwidth counters when the filesystem
> is mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead
> of cross calling to each CPU that can access the component MSC to
> write to the counter, set a flag that causes it to be zero'd on the
> the next read. This is easily done by forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Minor comments inline.

Jonathan

> @@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>  	return err;
>  }
>  
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
> +{
> +	int idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	if (!mpam_is_enabled())
> +		return;
> +
> +	idx = srcu_read_lock(&mpam_srcu);

Maybe guard() though it doesn't add that much here.

> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {

Reason not to use _srcu variants?
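
i.e. (sketch):

	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {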

> +		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
> +			continue;
> +
> +		msc = vmsc->msc;
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +				continue;
> +
> +			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +				continue;
> +
> +			ris->mbwu_state[ctx->mon].correction = 0;
> +			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
> +			mpam_mon_sel_unlock(msc);
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
>  static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  {
>  	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c190826dfbda..7cbcafe8294a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -223,10 +223,12 @@ struct mon_cfg {
>  
>  /*
>   * Changes to enabled and cfg are protected by the msc->lock.
> - * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + * Changes to reset_on_next_read, prev_val and correction are protected by the
> + * msc's mon_sel_lock.
Getting close to the point where a list of one per line would reduce churn.
If you anticipate adding more to this in future I'd definitely consider it.
e.g.
 * msc's mon_sel_lock protects:
 * - reset_on_next_read
 * - prev_val
 * - correction
 */
>   */
>  struct msmon_mbwu_state {
>  	bool		enabled;
> +	bool		reset_on_next_read;
>  	struct mon_cfg	cfg;
>  
>  	/* The value last read from the hardware. Used to detect overflow. */
> @@ -393,6 +395,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
>  
>  int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>  		    enum mpam_device_features, u64 *val);
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
>  
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-09-12 13:37   ` Jonathan Cameron
  2025-10-10 16:53     ` James Morse
  2025-09-12 16:06   ` Ben Horgan
  2025-09-26  2:35   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:37 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:08 +0000
James Morse <james.morse@arm.com> wrote:

> The bitmap reset code has been a source of bugs. Add a unit test.
> 
> This currently has to be built in, as the rest of the driver is
> builtin.
> 
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Few trivial comments inline.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/resctrl/Kconfig             | 10 +++++
>  drivers/resctrl/mpam_devices.c      |  4 ++
>  drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100644 drivers/resctrl/test_mpam_devices.c
> 
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> index c30532a3a3a4..ef59b3057d5d 100644
> --- a/drivers/resctrl/Kconfig
> +++ b/drivers/resctrl/Kconfig
> @@ -5,10 +5,20 @@ menuconfig ARM64_MPAM_DRIVER
>  	  MPAM driver for System IP, e,g. caches and memory controllers.
>  
>  if ARM64_MPAM_DRIVER
> +
>  config ARM64_MPAM_DRIVER_DEBUG
>  	bool "Enable debug messages from the MPAM driver"
>  	depends on ARM64_MPAM_DRIVER

Doing this under an 'if' for the same symbol isn't useful. So if you want to
use this style, I'd do it before adding this earlier config option.

>  	help
>  	  Say yes here to enable debug messages from the MPAM driver.
>  
> +config MPAM_KUNIT_TEST
> +	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
> +	depends on KUNIT=y
> +	default KUNIT_ALL_TESTS
> +	help
> +	  Enable this option to run tests in the MPAM driver.
> +
> +	  If unsure, say N.
> +
>  endif

> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> new file mode 100644
> index 000000000000..3e7058f7601c
> --- /dev/null
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -0,0 +1,68 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2024 Arm Ltd.
> +/* This file is intended to be included into mpam_devices.c */
> +
> +#include <kunit/test.h>
> +
> +static void test_mpam_reset_msc_bitmap(struct kunit *test)
> +{
> +	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> +	struct mpam_msc fake_msc = {0};

= { }; is sufficient, and is what newer C specs have adopted to mean
fill everything, including holes in structures, with 0.  There are some
tests that ensure that behavior applies with older compilers + the options
we use for building the kernel.

> +	u32 *test_result;
> +
> +	if (!buf)
> +		return;
> +
> +	fake_msc.mapped_hwpage = buf;
> +	fake_msc.mapped_hwpage_sz = SZ_16K;
> +	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
> +
> +	mutex_init(&fake_msc.part_sel_lock);
> +	mutex_lock(&fake_msc.part_sel_lock);

Perhaps add a comment to say this is to satisfy lock markings?
Otherwise someone might wonder why mutex_init() immediately followed
by taking the lock makes sense.
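
Something like this, perhaps (the wording is only a sketch of the comment
being asked for):

	mutex_init(&fake_msc.part_sel_lock);
	/*
	 * Not protecting anything here: taken only to satisfy the lockdep
	 * annotations in the code under test.
	 */
	mutex_lock(&fake_msc.part_sel_lock);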

> +
> +	test_result = (u32 *)(buf + MPAMCFG_CPBM);
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 1);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 1);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mutex_unlock(&fake_msc.part_sel_lock);
> +}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-09-12 13:41   ` Jonathan Cameron
  2025-10-10 16:54     ` James Morse
  2025-09-12 16:01   ` Ben Horgan
  2025-09-26  2:36   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-12 13:41 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On Wed, 10 Sep 2025 20:43:09 +0000
James Morse <james.morse@arm.com> wrote:

> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
> 
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
Nice in general though I didn't go through the test expected results etc.
A few comments inline.

Thanks and looking forward to seeing this go in.

Jonathan

> ---
> Changes since v1:
>  * Waggled some words in comments.
>  * Moved a bunch of variables to be global - shuts up a compiler warning.
> ---
>  drivers/resctrl/mpam_internal.h     |   8 +-
>  drivers/resctrl/test_mpam_devices.c | 321 ++++++++++++++++++++++++++++
>  2 files changed, 328 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 7cbcafe8294a..6119e4573187 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -20,6 +20,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(mpam_enabled);
>  
> +#ifdef CONFIG_MPAM_KUNIT_TEST
> +#define PACKED_FOR_KUNIT __packed
> +#else
> +#define PACKED_FOR_KUNIT
> +#endif
> +
>  static inline bool mpam_is_enabled(void)
>  {
>  	return static_branch_likely(&mpam_enabled);
> @@ -189,7 +195,7 @@ struct mpam_props {
>  	u16			dspri_wd;
>  	u16			num_csu_mon;
>  	u16			num_mbwu_mon;
> -};
> +} PACKED_FOR_KUNIT;

Add a comment on 'why'.
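For example (the rationale given here is an assumption, based on the
memcmp()-based comparisons in the tests):

	/*
	 * The kunit tests compare whole structures with memcmp(); packing
	 * removes padding bytes that __props_mismatch() never writes and
	 * that would otherwise make those comparisons unreliable.
	 */
	#ifdef CONFIG_MPAM_KUNIT_TEST
	#define PACKED_FOR_KUNIT __packed
	#else
	#define PACKED_FOR_KUNIT
	#endif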

>  
>  #define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
>  
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 3e7058f7601c..4eca8590c691 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -4,6 +4,325 @@
>  
>  #include <kunit/test.h>
>  
> +/*
> + * This test catches fields that aren't being sanitised - but can't tell you
> + * which one...
> + */
> +static void test__props_mismatch(struct kunit *test)
> +{
> +	struct mpam_props parent = { 0 };
> +	struct mpam_props child;
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, false);
> +
> +	memset(&child, 0, sizeof(child));
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, true);
> +
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +}
> +
> +static struct list_head fake_classes_list;
> +static struct mpam_class fake_class = { 0 };
> +static struct mpam_component fake_comp1 = { 0 };
> +static struct mpam_component fake_comp2 = { 0 };
> +static struct mpam_vmsc fake_vmsc1 = { 0 };
> +static struct mpam_vmsc fake_vmsc2 = { 0 };
> +static struct mpam_msc fake_msc1 = { 0 };
> +static struct mpam_msc fake_msc2 = { 0 };
> +static struct mpam_msc_ris fake_ris1 = { 0 };
> +static struct mpam_msc_ris fake_ris2 = { 0 };
> +static struct platform_device fake_pdev = { 0 };
> +
> +static void test_mpam_enable_merge_features(struct kunit *test)
> +{
> +#define RESET_FAKE_HIEARCHY()	do {				\
> +	INIT_LIST_HEAD(&fake_classes_list);			\
> +								\
> +	memset(&fake_class, 0, sizeof(fake_class));		\

Maybe just use a function?  It mostly changes global state anyway, so it
shouldn't need large numbers of parameters or anything like that.
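
e.g. something along the lines of (body trimmed to the first few
assignments):

	static void reset_fake_hierarchy(void)
	{
		INIT_LIST_HEAD(&fake_classes_list);

		memset(&fake_class, 0, sizeof(fake_class));
		fake_class.level = 3;
		fake_class.type = MPAM_CLASS_CACHE;
		INIT_LIST_HEAD_RCU(&fake_class.components);
		INIT_LIST_HEAD(&fake_class.classes_list);

		/* ... the rest of the RESET_FAKE_HIEARCHY() body ... */

		list_add(&fake_class.classes_list, &fake_classes_list);
	}

called from each point where the macro is used today.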

> +	fake_class.level = 3;					\
> +	fake_class.type = MPAM_CLASS_CACHE;			\
> +	INIT_LIST_HEAD_RCU(&fake_class.components);		\
> +	INIT_LIST_HEAD(&fake_class.classes_list);		\
> +								\
> +	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
> +	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
> +	fake_comp1.comp_id = 1;					\
> +	fake_comp2.comp_id = 2;					\
> +	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp1.class_list);			\
> +	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp2.class_list);			\
> +								\
> +	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
> +	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
> +	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
> +	fake_vmsc1.msc = &fake_msc1;				\
> +	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
> +	fake_vmsc2.msc = &fake_msc2;				\
> +								\
> +	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
> +	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
> +	fake_ris1.ris_idx = 1;					\
> +	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
> +	fake_ris2.ris_idx = 2;					\
> +	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
> +								\
> +	fake_msc1.pdev = &fake_pdev;				\
> +	fake_msc2.pdev = &fake_pdev;				\
> +								\
> +	list_add(&fake_class.classes_list, &fake_classes_list);	\
> +} while (0)
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	mutex_lock(&mpam_list_lock);
> +
> +	/* One Class+Comp, two RIS in one vMSC with common features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = NULL;
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
  2025-09-12 12:12   ` Jonathan Cameron
@ 2025-09-12 14:40   ` Ben Horgan
  2025-10-02 18:03     ` James Morse
  2025-09-12 15:22   ` Dave Martin
  2025-09-25  6:33   ` Fenghua Yu
  3 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 14:40 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen in process
> context as it takes a mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Made mpam_unregister_irqs() safe to race with itself.
>  * Removed threaded interrupts.
>  * Schedule mpam_disable() from cpuhp callback in the case of an error.
>  * Added mpam_disable_reason.
>  * Use alloc_percpu()
> 
> Changes since RFC:
>  * Use guard marco when walking srcu list.
>  * Use INTEN macro for enabling interrupts.
>  * Move partid_max_published up earlier in mpam_enable_once().
> ---
>  drivers/resctrl/mpam_devices.c  | 277 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  10 ++
>  2 files changed, 284 insertions(+), 3 deletions(-)
> 

>  
> +static int __setup_ppi(struct mpam_msc *msc)
> +{
> +	int cpu;
> +	struct device *dev = &msc->pdev->dev;
> +
> +	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
> +	if (!msc->error_dev_id)
> +		return -ENOMEM;
> +
> +	for_each_cpu(cpu, &msc->accessibility) {
> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
> +
> +		if (empty) {

I'm confused about how this if condition can be satisfied. Isn't the
alloc clearing msc->error_dev_id for each cpu, with it then only getting
set for each cpu later in the iteration?

> +			dev_err_once(dev, "MSC shares PPI with %s!\n",
> +				     dev_name(&empty->pdev->dev));
> +			return -EBUSY;
> +		}
> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
> +{
> +	int irq;
> +
> +	irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +	if (irq <= 0)
> +		return 0;
> +
> +	/* Allocate and initialise the percpu device pointer for PPI */
> +	if (irq_is_percpu(irq))
> +		return __setup_ppi(msc);
> +
> +	/* sanity check: shared interrupts can be routed anywhere? */
> +	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
> +		pr_err_once("msc:%u is a private resource with a shared error interrupt",
> +			    msc->id);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * An MSC can control traffic from a set of CPUs, but may only be accessible
>   * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -1060,6 +1143,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  			break;
>  		}
>  
> +		err = mpam_msc_setup_error_irq(msc);
> +		if (err)
> +			break;
> +
>  		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>  					     &msc->pcc_subspace_id))
>  			msc->iface = MPAM_IFACE_MMIO;
> @@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
>  	}
>  }
>  
> +static char *mpam_errcode_names[16] = {
> +	[0] = "No error",
> +	[1] = "PARTID_SEL_Range",
> +	[2] = "Req_PARTID_Range",
> +	[3] = "MSMONCFG_ID_RANGE",
> +	[4] = "Req_PMG_Range",
> +	[5] = "Monitor_Range",
> +	[6] = "intPARTID_Range",
> +	[7] = "Unexpected_INTERNAL",
> +	[8] = "Undefined_RIS_PART_SEL",
> +	[9] = "RIS_No_Control",
> +	[10] = "Undefined_RIS_MON_SEL",
> +	[11] = "RIS_No_Monitor",
> +	[12 ... 15] = "Reserved"
> +};
> +
> +static int mpam_enable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
> +
> +	return 0;
> +}
> +
> +/* This can run in mpam_disable(), and the interrupt handler on the same CPU */
> +static int mpam_disable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, 0);
> +
> +	return 0;
> +}
> +
> +static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
> +{
> +	u64 reg;
> +	u16 partid;
> +	u8 errcode, pmg, ris;
> +
> +	if (WARN_ON_ONCE(!msc) ||
> +	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &msc->accessibility)))
> +		return IRQ_NONE;
> +
> +	reg = mpam_msc_read_esr(msc);
> +
> +	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
> +	if (!errcode)
> +		return IRQ_NONE;
> +
> +	/* Clear level triggered irq */
> +	mpam_msc_zero_esr(msc);
> +
> +	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
> +	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
> +	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
> +
> +	pr_err_ratelimited("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
> +			   msc->id, mpam_errcode_names[errcode], partid, pmg,
> +			   ris);
> +
> +	/* Disable this interrupt. */
> +	mpam_disable_msc_ecr(msc);
> +
> +	/*
> +	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
> +	 * unregister the interrupt from the threaded part of the handler.
> +	 */
> +	mpam_disable_reason = "hardware error interrupt";
> +	schedule_work(&mpam_broken_work);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static int mpam_register_irqs(void)
> +{
> +	int err, irq;
> +	struct mpam_msc *msc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
> +		/* We anticipate sharing the interrupt with other MSCs */
> +		if (irq_is_percpu(irq)) {
> +			err = request_percpu_irq(irq, &mpam_ppi_handler,
> +						 "mpam:msc:error",
> +						 msc->error_dev_id);
> +			if (err)
> +				return err;
> +
> +			msc->reenable_error_ppi = irq;
> +			smp_call_function_many(&msc->accessibility,
> +					       &_enable_percpu_irq, &irq,
> +					       true);
> +		} else {
> +			err = devm_request_irq(&msc->pdev->dev,irq,
> +					       &mpam_spi_handler, IRQF_SHARED,
> +					       "mpam:msc:error", msc);
> +			if (err)
> +				return err;
> +		}
> +
> +		set_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags);
> +		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
> +		set_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags);
> +	}
> +
> +	return 0;
> +}
> +
> +static void mpam_unregister_irqs(void)
> +{
> +	int irq, idx;
> +	struct mpam_msc *msc;
> +
> +	cpus_read_lock();
> +	/* take the lock as free_irq() can sleep */
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
> +			if (irq_is_percpu(irq)) {
> +				msc->reenable_error_ppi = 0;
> +				free_percpu_irq(irq, msc->error_dev_id);
> +			} else {
> +				devm_free_irq(&msc->pdev->dev, irq, msc);
> +			}
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	cpus_read_unlock();
> +}
> +
>  static void mpam_enable_once(void)
>  {
> -	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> -	mutex_unlock(&mpam_list_lock);
> +	int err;
>  
>  	/*
>  	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> @@ -1332,6 +1580,27 @@ static void mpam_enable_once(void)
>  	partid_max_published = true;
>  	spin_unlock(&partid_max_lock);
>  
> +	/*
> +	 * If all the MSC have been probed, enabling the IRQs happens next.
> +	 * That involves cross-calling to a CPU that can reach the MSC, and
> +	 * the locks must be taken in this order:
> +	 */
> +	cpus_read_lock();
> +	mutex_lock(&mpam_list_lock);
> +	mpam_enable_merge_features(&mpam_classes);
> +
> +	err = mpam_register_irqs();
> +	if (err)
> +		pr_warn("Failed to register irqs: %d\n", err);
> +
> +	mutex_unlock(&mpam_list_lock);
> +	cpus_read_unlock();
> +
> +	if (err) {
> +		schedule_work(&mpam_broken_work);
> +		return;
> +	}
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>  
>  	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> @@ -1397,6 +1666,8 @@ void mpam_disable(struct work_struct *ignored)
>  	}
>  	mutex_unlock(&mpam_cpuhp_state_lock);
>  
> +	mpam_unregister_irqs();
> +
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>  				 srcu_read_lock_held(&mpam_srcu))
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 6e047fbd3512..f04a9ef189cf 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -32,6 +32,10 @@ struct mpam_garbage {
>  	struct platform_device	*pdev;
>  };
>  
> +/* Bit positions for error_irq_flags */
> +#define	MPAM_ERROR_IRQ_REQUESTED  0
> +#define	MPAM_ERROR_IRQ_HW_ENABLED 1
> +
>  struct mpam_msc {
>  	/* member of mpam_all_msc */
>  	struct list_head        all_msc_list;
> @@ -46,6 +50,11 @@ struct mpam_msc {
>  	struct pcc_mbox_chan	*pcc_chan;
>  	u32			nrdy_usec;
>  	cpumask_t		accessibility;
> +	bool			has_extd_esr;
> +
> +	int				reenable_error_ppi;
> +	struct mpam_msc * __percpu	*error_dev_id;
> +
>  	atomic_t		online_refs;
>  
>  	/*
> @@ -54,6 +63,7 @@ struct mpam_msc {
>  	 */
>  	struct mutex		probe_lock;
>  	bool			probed;
> +	unsigned long		error_irq_flags;
>  	u16			partid_max;
>  	u8			pmg_max;
>  	unsigned long		ris_idxs;


Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
  2025-09-12 12:13   ` Jonathan Cameron
@ 2025-09-12 14:42   ` Ben Horgan
  2025-10-03 18:03     ` James Morse
  2025-09-26  2:31   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 14:42 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
> 
> After this point, checking all the MSC have been probed is pointless,
> and the cpuhp callbacks should restore the configuration, instead of
> just resetting the MSC.
> 
> Add a static key to enable this behaviour. This will also allow MPAM
> to be disabled in response to an error, and the architecture code to
> enable/disable the context switch of the MPAM system registers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 12 ++++++++++++
>  drivers/resctrl/mpam_internal.h |  8 ++++++++
>  2 files changed, 20 insertions(+)
> 

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-12 11:25   ` Ben Horgan
@ 2025-09-12 14:52     ` Ben Horgan
  2025-09-30 17:06       ` James Morse
  2025-09-30 17:06     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 14:52 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich



On 9/12/25 12:25, Ben Horgan wrote:
> Hi James,
> 
> On 9/10/25 21:42, James Morse wrote:
>> When a CPU comes online, it may bring a newly accessible MSC with
>> it. Only the default partid has its value reset by hardware, and
>> even then the MSC might not have been reset since its config was
>> previously dirtyied. e.g. Kexec.
>>
>> Any in-use partid must have its configuration restored, or reset.
>> In-use partids may be held in caches and evicted later.
>>
>> MSC are also reset when CPUs are taken offline to cover cases where
>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>> where there is no firmware involvement.
>>
>> If the configuration for a RIS has not been touched since it was
>> brought online, it does not need resetting again.
>>
>> To reset, write the maximum values for all discovered controls.
>>
>> CC: Rohit Mathew <Rohit.Mathew@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> Changes since RFC:
>>  * Last bitmap write will always be non-zero.
>>  * Dropped READ_ONCE() - the value can no longer change.
>>  * Write 0 to proportional stride, remove the bwa_fract variable.
>>  * Removed nested srcu lock, the assert should cover it.
>> ---
>>  drivers/resctrl/mpam_devices.c  | 117 ++++++++++++++++++++++++++++++++
>>  drivers/resctrl/mpam_internal.h |   8 +++
>>  2 files changed, 125 insertions(+)
>>
>> +
>> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>> +{
>> +	struct mpam_msc_ris *ris;
>> +
>> +	mpam_assert_srcu_read_lock_held();
> 
> Unneeded? Checked in list_for_each_entry_srcu().> +

If you do get rid of this then that leaves one use of the helper,
mpam_assert_srcu_read_lock_held(), and so the helper could go.

> Thanks,
> 
> Ben
> 
> 

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
  2025-09-12 12:22   ` Jonathan Cameron
@ 2025-09-12 15:00   ` Ben Horgan
  2025-09-25  6:53   ` Fenghua Yu
  2 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 15:00 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:43, James Morse wrote:
> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate
> a configuration array for each component, and reprogram each RIS
> configuration from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Switched entry_rcu to srcu versions.
> 
> Changes since RFC:
>  * Added a comment about the ordering around max_partid.
>  * Allocate configurations after interrupts are registered to reduce churn.
>  * Added mpam_assert_partid_sizes_fixed();
>  * Make reset use an all-ones instead of zero config.
> ---
>  drivers/resctrl/mpam_devices.c  | 269 +++++++++++++++++++++++++++++---
>  drivers/resctrl/mpam_internal.h |  29 +++-
>  2 files changed, 271 insertions(+), 27 deletions(-)
> 

Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-11 15:46   ` Ben Horgan
@ 2025-09-12 15:08     ` Ben Horgan
  2025-10-06 16:00       ` James Morse
  2025-10-06 15:59     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 15:08 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/11/25 16:46, Ben Horgan wrote:
> Hi James,
> 
> On 9/10/25 21:43, James Morse wrote:
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSC may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>> not ready, then wait the full timeout value before trying again.
>>
>> CC: Shanker Donthineni <sdonthineni@nvidia.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> Changes since v1:
>>  * Added XCL support.
>>  * Merged FLT/CTL constants.
>>  * a spelling mistake in a comment.
>>  * moved structures around.
>> ---
>>  drivers/resctrl/mpam_devices.c  | 226 ++++++++++++++++++++++++++++++++
>>  drivers/resctrl/mpam_internal.h |  19 +++
>>  2 files changed, 245 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index cf190f896de1..1543c33c5d6a 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> +
>> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>> +				   u32 *flt_val)
>> +{
>> +	struct mon_cfg *ctx = m->ctx;
>> +
>> +	/*
>> +	 * For CSU counters its implementation-defined what happens when not
>> +	 * filtering by partid.
>> +	 */
>> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>> +
>> +	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
>> +	if (m->ctx->match_pmg) {
>> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>> +		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
>> +	}
>> +
>> +	switch (m->type) {
>> +	case mpam_feat_msmon_csu:
>> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
>> +
>> +		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
>> +			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
>> +					       ctx->csu_exclude_clean);
>> +
>> +		break;
>> +	case mpam_feat_msmon_mbwu:
>> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;

As you mentioned offline, this zeroes the other bits in *ctl_val.


Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
  2025-09-12 12:12   ` Jonathan Cameron
  2025-09-12 14:40   ` Ben Horgan
@ 2025-09-12 15:22   ` Dave Martin
  2025-10-03 18:02     ` James Morse
  2025-09-25  6:33   ` Fenghua Yu
  3 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-12 15:22 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi James,

On Wed, Sep 10, 2025 at 08:42:58PM +0000, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the ESR register, so no locking is needed.

Nit: MPAMF_ESR?  (Casual readers may confuse it with ESR_ELx.
Formally, there is no MPAM "ESR" register, though people familiar with
the spec will of course know what you're referring to.)

> The work to disable MPAM after an error needs to happen in process
> context as it takes a mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Made mpam_unregister_irqs() safe to race with itself.
>  * Removed threaded interrupts.
>  * Schedule mpam_disable() from cpuhp callback in the case of an error.
>  * Added mpam_disable_reason.
>  * Use alloc_percpu()
> 
> Changes since RFC:
>  * Use guard marco when walking srcu list.
>  * Use INTEN macro for enabling interrupts.
>  * Move partid_max_published up earlier in mpam_enable_once().
> ---
>  drivers/resctrl/mpam_devices.c  | 277 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  10 ++
>  2 files changed, 284 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index a9d3c4b09976..e7e4afc1ea95 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -14,6 +14,9 @@
>  #include <linux/device.h>
>  #include <linux/errno.h>
>  #include <linux/gfp.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
>  #include <linux/list.h>
>  #include <linux/lockdep.h>
>  #include <linux/mutex.h>
> @@ -166,6 +169,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>  	return (idr_high << 32) | idr_low;
>  }
>  
> +static void mpam_msc_zero_esr(struct mpam_msc *msc)

Nit: Maybe clear_esr?  (The fact that setting the ERRCODE and OVRWR
fields to zero clears the interrupt and prepares for unambiguous
reporting of the next error is more of an implementation detail.
It doesn't matter what the rest of the register is set to.)

> +{
> +	__mpam_write_reg(msc, MPAMF_ESR, 0);
> +	if (msc->has_extd_esr)

This deasserts the interrupt (if level-sensitive) and enables the MSC
to report further errors.  If we are unlucky and an error occurs now,
won't we splat the newly HW-generated RIS field by:

> +		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);

...?  If so, we will diagnose the wrong RIS when we pump the new error
from MPAMF_ESR.  I think the correct interpretation of the spec may be
that:

 a) software should treat fields in MPAMF_ESR[63:32] as valid only if
    ERRCODE is nonzero, and

 b) software should never write to MPAMF_ESR[63:32] while ERRCODE is
    zero.

Does this look right?  Should the fields be cleared in the opposite
order?

Or alternatively, is it actually necessary to clear MPAMF_ESR[63:32]
at all?

(The spec seems a bit vague on what software is supposed to do with
this register to ensure correctness...)
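
If that reading is right, one shape for this (purely a sketch of the
ordering being discussed, not a tested fix) would be:

	static void mpam_msc_clear_esr(struct mpam_msc *msc)
	{
		/*
		 * Clear the RIS half first, while ERRCODE is still nonzero,
		 * then clear ERRCODE - assuming a new error recorded after
		 * that point writes both halves itself, nothing of the new
		 * record gets overwritten here.
		 */
		if (msc->has_extd_esr)
			__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
		__mpam_write_reg(msc, MPAMF_ESR, 0);
	}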

> +}
> +
> +static u64 mpam_msc_read_esr(struct mpam_msc *msc)
> +{
> +	u64 esr_high = 0, esr_low;
> +
> +	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
> +	if (msc->has_extd_esr)
> +		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
> +
> +	return (esr_high << 32) | esr_low;
> +}

[...]

> @@ -895,6 +920,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  	}
>  }
>  
> +static void _enable_percpu_irq(void *_irq)
> +{
> +	int *irq = _irq;
> +
> +	enable_percpu_irq(*irq, IRQ_TYPE_NONE);

Can the type vary?  (Maybe this makes no sense on GIC-based systems --
IRQ_TYPE_NONE (or "0") seems overwhelmingly common.)

(Just my lack of familiarity talking, here.)

[...]

> +static int __setup_ppi(struct mpam_msc *msc)
> +{
> +	int cpu;
> +	struct device *dev = &msc->pdev->dev;
> +
> +	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
> +	if (!msc->error_dev_id)
> +		return -ENOMEM;
> +
> +	for_each_cpu(cpu, &msc->accessibility) {
> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
> +
> +		if (empty) {
> +			dev_err_once(dev, "MSC shares PPI with %s!\n",
> +				     dev_name(&empty->pdev->dev));
> +			return -EBUSY;
> +		}
> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
> +	}

How are PPIs supposed to work?

An individual MSC that is affine to multiple CPUs has no way to
distinguish which CPU an error relates to, and no CPU-specific (or even
RIS-specific) ESR.

So, won't such an interrupt pointlessly be taken on all CPUs, which
would all fight over the reported event?

Have you encountered any platforms wired up this way?  The spec
recommends not to do this, but does not provide any rationale...

The spec only mentions PPIs in the context of being affine to a single
CPU (PE).  It's not clear to me that any other use of PPIs makes
sense (?)

If we really have to cope with this, maybe it would make sense to pick
a single CPU in the affinity set (though we might have to move it
around if the unlucky CPU is offlined).

[...]

> +static char *mpam_errcode_names[16] = {
> +	[0] = "No error",
> +	[1] = "PARTID_SEL_Range",
> +	[2] = "Req_PARTID_Range",
> +	[3] = "MSMONCFG_ID_RANGE",
> +	[4] = "Req_PMG_Range",
> +	[5] = "Monitor_Range",
> +	[6] = "intPARTID_Range",
> +	[7] = "Unexpected_INTERNAL",
> +	[8] = "Undefined_RIS_PART_SEL",
> +	[9] = "RIS_No_Control",
> +	[10] = "Undefined_RIS_MON_SEL",
> +	[11] = "RIS_No_Monitor",
> +	[12 ... 15] = "Reserved"
> +};
> +
> +static int mpam_enable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
> +
> +	return 0;
> +}

This could also be a switch () { case 0: return "foo";
case 1: return "bar"; ... }, without the explicit table.  This would
avoid having to think about the ERRCODE field growing.  (There are some
RES0 bits looming over it.)

(This also tends to avoid the extra pointer table in .rodata, which
might be of interest if this were a hot path.)
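
For illustration, the switch-based form would be something like this
(function name made up):

	static const char *mpam_errcode_name(u8 errcode)
	{
		switch (errcode) {
		case 0:  return "No error";
		case 1:  return "PARTID_SEL_Range";
		case 2:  return "Req_PARTID_Range";
		case 3:  return "MSMONCFG_ID_RANGE";
		case 4:  return "Req_PMG_Range";
		case 5:  return "Monitor_Range";
		case 6:  return "intPARTID_Range";
		case 7:  return "Unexpected_INTERNAL";
		case 8:  return "Undefined_RIS_PART_SEL";
		case 9:  return "RIS_No_Control";
		case 10: return "Undefined_RIS_MON_SEL";
		case 11: return "RIS_No_Monitor";
		default: return "Reserved";
		}
	}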

[...]

(Review truncated -- that's the comments I had so far on the previous
series.)

Cheers
---Dave


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-09-10 20:43 ` [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
  2025-09-12 13:24   ` Jonathan Cameron
@ 2025-09-12 15:55   ` Ben Horgan
  2025-10-13 16:29     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 15:55 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:43, James Morse wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
> 
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Fixed lock/unlock typo.
> ---
>  drivers/resctrl/mpam_devices.c  | 154 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  23 +++++
>  2 files changed, 175 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 1543c33c5d6a..eeb62ed94520 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -918,6 +918,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>  	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>  
>  	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
> +
>  	if (m->ctx->match_pmg) {
>  		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>  		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
> @@ -972,6 +973,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
>  static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  				     u32 flt_val)
>  {
> +	struct msmon_mbwu_state *mbwu_state;
>  	struct mpam_msc *msc = m->ris->vmsc->msc;
>  
>  	/*
> @@ -990,20 +992,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>  		mpam_write_monsel_reg(msc, MBWU, 0);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +
> +		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> +		if (mbwu_state)
> +			mbwu_state->prev_val = 0;

What's the if condition doing here?

The below could make more sense but I don't think you can get here if
the allocation fails.

if (m->ris->mbwu_state)

> +
>  		break;
>  	default:
>  		return;
>  	}
>  }
>  
> +static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
> +{
> +	/* TODO: scaling, and long counters */
> +	return GENMASK_ULL(30, 0);
> +}
> +
>  /* Call with MSC lock held */
>  static void __ris_msmon_read(void *arg)
>  {
> -	u64 now;
>  	bool nrdy = false;
>  	struct mon_read *m = arg;
> +	u64 now, overflow_val = 0;
>  	struct mon_cfg *ctx = m->ctx;
>  	struct mpam_msc_ris *ris = m->ris;
> +	struct msmon_mbwu_state *mbwu_state;
>  	struct mpam_props *rprops = &ris->props;
>  	struct mpam_msc *msc = m->ris->vmsc->msc;
>  	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> @@ -1031,11 +1045,30 @@ static void __ris_msmon_read(void *arg)
>  		now = mpam_read_monsel_reg(msc, CSU);
>  		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
>  			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
>  		break;
>  	case mpam_feat_msmon_mbwu:
>  		now = mpam_read_monsel_reg(msc, MBWU);
>  		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
>  			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
> +
> +		if (nrdy)
> +			break;
> +
> +		mbwu_state = &ris->mbwu_state[ctx->mon];
> +		if (!mbwu_state)
> +			break;
> +
> +		/* Add any pre-overflow value to the mbwu_state->val */
> +		if (mbwu_state->prev_val > now)
> +			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
> +
> +		mbwu_state->prev_val = now;
> +		mbwu_state->correction += overflow_val;
> +
> +		/* Include bandwidth consumed before the last hardware reset */
> +		now += mbwu_state->correction;
>  		break;
>  	default:
>  		m->err = -EINVAL;
> @@ -1048,7 +1081,6 @@ static void __ris_msmon_read(void *arg)
>  		return;
>  	}
>  
> -	now = FIELD_GET(MSMON___VALUE, now);
>  	*m->val += now;
>  }
>  
> @@ -1261,6 +1293,67 @@ static int mpam_reprogram_ris(void *_arg)
>  	return 0;
>  }
>  
> +/* Call with MSC lock held */
> +static int mpam_restore_mbwu_state(void *_ris)
> +{
> +	int i;
> +	struct mon_read mwbu_arg;
> +	struct mpam_msc_ris *ris = _ris;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		if (ris->mbwu_state[i].enabled) {
> +			mwbu_arg.ris = ris;
> +			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
> +			mwbu_arg.type = mpam_feat_msmon_mbwu;
> +
> +			__ris_msmon_read(&mwbu_arg);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/* Call with MSC lock and held */
> +static int mpam_save_mbwu_state(void *arg)
> +{
> +	int i;
> +	u64 val;
> +	struct mon_cfg *cfg;
> +	u32 cur_flt, cur_ctl, mon_sel;
> +	struct mpam_msc_ris *ris = arg;
> +	struct msmon_mbwu_state *mbwu_state;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		mbwu_state = &ris->mbwu_state[i];
> +		cfg = &mbwu_state->cfg;
> +
> +		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +			return -EIO;
> +
> +		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
> +			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
> +
> +		val = mpam_read_monsel_reg(msc, MBWU);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +
> +		cfg->mon = i;
> +		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
> +		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
> +		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
> +		mbwu_state->correction += val;
> +		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
> +		mpam_mon_sel_unlock(msc);
> +	}
> +
> +	return 0;
> +}
> +
>  static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
>  {
>  	memset(reset_cfg, 0, sizeof(*reset_cfg));
> @@ -1335,6 +1428,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  		 * for non-zero partid may be lost while the CPUs are offline.
>  		 */
>  		ris->in_reset_state = online;
> +
> +		if (mpam_is_enabled() && !online)
> +			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
>  	}
>  }
>  
> @@ -1369,6 +1465,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
>  			mpam_reprogram_ris_partid(ris, partid, cfg);
>  		}
>  		ris->in_reset_state = reset;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
>  	}
>  }
>  
> @@ -2091,11 +2190,33 @@ static void mpam_unregister_irqs(void)
>  
>  static void __destroy_component_cfg(struct mpam_component *comp)
>  {
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
>  	add_to_garbage(comp->cfg);
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		if (mpam_mon_sel_lock(msc)) {
> +			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> +				add_to_garbage(ris->mbwu_state);
> +			mpam_mon_sel_unlock(msc);
> +		}
> +	}
>  }
>  
>  static int __allocate_component_cfg(struct mpam_component *comp)
>  {
> +	int err = 0;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct msmon_mbwu_state *mbwu_state;
> +
> +	lockdep_assert_held(&mpam_list_lock);
>  	mpam_assert_partid_sizes_fixed();
>  
>  	if (comp->cfg)
> @@ -2106,6 +2227,35 @@ static int __allocate_component_cfg(struct mpam_component *comp)
>  		return -ENOMEM;
>  	init_garbage(comp->cfg);
>  
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (!vmsc->props.num_mbwu_mon)
> +			continue;
> +
> +		msc = vmsc->msc;
> +		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
> +			if (!ris->props.num_mbwu_mon)
> +				continue;
> +
> +			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
> +					     sizeof(*ris->mbwu_state),
> +					     GFP_KERNEL);
> +			if (!mbwu_state) {
> +				__destroy_component_cfg(comp);
> +				err = -ENOMEM;
> +				break;
> +			}
> +
> +			if (mpam_mon_sel_lock(msc)) {
> +				init_garbage(mbwu_state);
> +				ris->mbwu_state = mbwu_state;
> +				mpam_mon_sel_unlock(msc);
> +			}

The if statement is confusing now that mpam_mon_sel_lock()
unconditionally returns true.
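
i.e. assuming the lock really can no longer fail, this collapses to
something like:

	mpam_mon_sel_lock(msc);
	init_garbage(mbwu_state);
	ris->mbwu_state = mbwu_state;
	mpam_mon_sel_unlock(msc);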

> +		}
> +
> +		if (err)
> +			break;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index bb01e7dbde40..725c2aefa8a2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -212,6 +212,26 @@ struct mon_cfg {
>  	enum mon_filter_options opts;
>  };
>  
> +/*
> + * Changes to enabled and cfg are protected by the msc->lock.
> + * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + */
> +struct msmon_mbwu_state {
> +	bool		enabled;
> +	struct mon_cfg	cfg;
> +
> +	/* The value last read from the hardware. Used to detect overflow. */
> +	u64		prev_val;
> +
> +	/*
> +	 * The value to add to the new reading to account for power management,
> +	 * and shifts to trigger the overflow interrupt.
> +	 */
> +	u64		correction;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
>  struct mpam_class {
>  	/* mpam_components in this class */
>  	struct list_head	components;
> @@ -304,6 +324,9 @@ struct mpam_msc_ris {
>  	/* parent: */
>  	struct mpam_vmsc	*vmsc;
>  
> +	/* msmon mbwu configuration is preserved over reset */
> +	struct msmon_mbwu_state	*mbwu_state;
> +
>  	struct mpam_garbage	garbage;
>  };
>  

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
  2025-09-12 13:41   ` Jonathan Cameron
@ 2025-09-12 16:01   ` Ben Horgan
  2025-10-10 16:54     ` James Morse
  2025-09-26  2:36   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 16:01 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:43, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
> 
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Waggled some words in comments.
>  * Moved a bunch of variables to be global - shuts up a compiler warning.
> ---
>  drivers/resctrl/mpam_internal.h     |   8 +-
>  drivers/resctrl/test_mpam_devices.c | 321 ++++++++++++++++++++++++++++
>  2 files changed, 328 insertions(+), 1 deletion(-)
> 

Looks good to me, I checked the tests for v1. I agree with Jonathan that
you could make RESET_FAKE_HIEARCHY() a function now that you've changed
to use globals.

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
  2025-09-12 13:37   ` Jonathan Cameron
@ 2025-09-12 16:06   ` Ben Horgan
  2025-10-10 16:53     ` James Morse
  2025-09-26  2:35   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-12 16:06 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:43, James Morse wrote:
> The bitmap reset code has been a source of bugs. Add a unit test.
> 
> This currently has to be built in, as the rest of the driver is
> builtin.
> 
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/Kconfig             | 10 +++++
>  drivers/resctrl/mpam_devices.c      |  4 ++
>  drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100644 drivers/resctrl/test_mpam_devices.c
> 
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> index c30532a3a3a4..ef59b3057d5d 100644
> --- a/drivers/resctrl/Kconfig
> +++ b/drivers/resctrl/Kconfig
> @@ -5,10 +5,20 @@ menuconfig ARM64_MPAM_DRIVER
>  	  MPAM driver for System IP, e,g. caches and memory controllers.
>  
>  if ARM64_MPAM_DRIVER
> +

Nit: add the empty line in an earlier patch

Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
  2025-09-11 13:17   ` Jonathan Cameron
  2025-09-11 14:56   ` Lorenzo Pieralisi
@ 2025-09-16 13:17   ` Zeng Heng
  2025-09-19 16:11     ` James Morse
  2025-10-02  3:21   ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table Fenghua Yu
  2025-10-03  0:58   ` Gavin Shan
  4 siblings, 1 reply; 200+ messages in thread
From: Zeng Heng @ 2025-09-16 13:17 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao, zengheng4

After updating the monitor configuration, the first read of the monitoring
result has to wait for the "not ready" duration before a valid value can be
obtained.

Because a component consists of multiple MPAM instances, the driver should
wait for the "not ready" period after updating each instance's configuration
before a valid monitoring value can be read, not just wait once per
component.

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
It's fine to merge this patch directly into patch 7 of the corresponding
patchset.
---
 drivers/resctrl/mpam_devices.c | 36 +++++++++++++++-------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2962cd018207..e79a46646863 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1183,11 +1183,14 @@ static void __ris_msmon_read(void *arg)
 	}

 	*m->val += now;
+	m->err = 0;
 }

 static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
 {
 	int err, idx;
+	bool read_again;
+	u64 wait_jiffies;
 	struct mpam_msc *msc;
 	struct mpam_vmsc *vmsc;
 	struct mpam_msc_ris *ris;
@@ -1198,10 +1201,22 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)

 		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
 			arg->ris = ris;
+			read_again = false;
+again:

 			err = smp_call_function_any(&msc->accessibility,
 						    __ris_msmon_read, arg,
 						    true);
+			if (arg->err == -EBUSY && !read_again) {
+				read_again = true;
+
+				wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+				while (wait_jiffies)
+					wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+				goto again;
+			}
+
 			if (!err && arg->err)
 				err = arg->err;
 			if (err)
@@ -1218,9 +1233,7 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features type, u64 *val)
 {
-	int err;
 	struct mon_read arg;
-	u64 wait_jiffies = 0;
 	struct mpam_props *cprops = &comp->class->props;

 	might_sleep();
@@ -1237,24 +1250,7 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	arg.val = val;
 	*val = 0;

-	err = _msmon_read(comp, &arg);
-	if (err == -EBUSY && comp->class->nrdy_usec)
-		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
-
-	while (wait_jiffies)
-		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
-
-	if (err == -EBUSY) {
-		memset(&arg, 0, sizeof(arg));
-		arg.ctx = ctx;
-		arg.type = type;
-		arg.val = val;
-		*val = 0;
-
-		err = _msmon_read(comp, &arg);
-	}
-
-	return err;
+	return _msmon_read(comp, &arg);
 }

 void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
--
2.25.1



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-09-11 13:35   ` Jonathan Cameron
@ 2025-09-17 11:03   ` Ben Horgan
  2025-09-29 17:44     ` James Morse
  2025-10-03  3:53   ` Gavin Shan
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-17 11:03 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 21:42, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG

s/cflags/ccflags/

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
  2025-09-12 13:33   ` Jonathan Cameron
@ 2025-09-18  2:35   ` Shaopeng Tan (Fujitsu)
  2025-10-10 16:53     ` James Morse
  2025-09-26  4:11   ` Fenghua Yu
  2 siblings, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-18  2:35 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hello James

> resctrl expects to reset the bandwidth counters when the filesystem is
> mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead of cross
> calling to each CPU that can access the component MSC to write to the counter,
> set a flag that causes it to be zero'd on the next read. This is easily done by
> forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 47 +++++++++++++++++++++++++++++++--
>  drivers/resctrl/mpam_internal.h |  5 +++-
>  2 files changed, 49 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3080a81f0845..8254d6190ca2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1088,9 +1088,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
>  static void __ris_msmon_read(void *arg)
>  {
>  	bool nrdy = false;
> +	bool config_mismatch;
>  	struct mon_read *m = arg;
>  	u64 now, overflow_val = 0;
>  	struct mon_cfg *ctx = m->ctx;
> +	bool reset_on_next_read = false;
>  	struct mpam_msc_ris *ris = m->ris;
>  	struct msmon_mbwu_state *mbwu_state;
>  	struct mpam_props *rprops = &ris->props;
> @@ -1105,6 +1107,14 @@ static void __ris_msmon_read(void *arg)
>  		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
>  	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> 
> +	if (m->type == mpam_feat_msmon_mbwu) {
> +		mbwu_state = &ris->mbwu_state[ctx->mon];
> +		if (mbwu_state) {
> +			reset_on_next_read = mbwu_state->reset_on_next_read;
> +			mbwu_state->reset_on_next_read = false;
> +		}
> +	}
> +
>  	/*
>  	 * Read the existing configuration to avoid re-writing the same values.
>  	 * This saves waiting for 'nrdy' on subsequent reads.
> @@ -1112,7 +1122,10 @@ static void __ris_msmon_read(void *arg)
>  	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
>  	clean_msmon_ctl_val(&cur_ctl);
>  	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> -	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +	config_mismatch = cur_flt != flt_val ||
> +			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
> +
> +	if (config_mismatch || reset_on_next_read)
>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);

mbm_handle_overflow() calls __ris_msmon_read() every second.
If there are multiple monitor groups, config_mismatch will be true every second.
Then "mbwu_state->prev_val = 0;" in write_msmon_ctl_flt_vals() will always be run.
This means that with multiple monitoring groups, the memory bandwidth monitoring value is cleared every second.
https://lore.kernel.org/lkml/20250910204309.20751-25-james.morse@arm.com/
+		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+		if (mbwu_state)
+			mbwu_state->prev_val = 0;

Best regards,
Shaopeng TAN

>  	switch (m->type) {
> @@ -1145,7 +1158,6 @@ static void __ris_msmon_read(void *arg)
>  		if (nrdy)
>  			break;
> 
> -		mbwu_state = &ris->mbwu_state[ctx->mon];
>  		if (!mbwu_state)
>  			break;
> 
> @@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>  	return err;
>  }
> 
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
> +{
> +	int idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	if (!mpam_is_enabled())
> +		return;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
> +			continue;
> +
> +		msc = vmsc->msc;
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +				continue;
> +
> +			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +				continue;
> +
> +			ris->mbwu_state[ctx->mon].correction = 0;
> +			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
> +			mpam_mon_sel_unlock(msc);
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
>  static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  {
>  	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c190826dfbda..7cbcafe8294a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -223,10 +223,12 @@ struct mon_cfg {
> 
>  /*
>   * Changes to enabled and cfg are protected by the msc->lock.
> - * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + * Changes to reset_on_next_read, prev_val and correction are protected by the
> + * msc's mon_sel_lock.
>   */
>  struct msmon_mbwu_state {
>  	bool		enabled;
> +	bool		reset_on_next_read;
>  	struct mon_cfg	cfg;
> 
>  	/* The value last read from the hardware. Used to detect overflow. */
> @@ -393,6 +395,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
> 
>  int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>  		    enum mpam_device_features, u64 *val);
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
> 
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
> --
> 2.39.5



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-11 10:43   ` Jonathan Cameron
  2025-09-11 10:48     ` Jonathan Cameron
@ 2025-09-19 16:10     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 11:43, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:41 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> The ACPI MPAM table uses the UID of a processor container specified in
>> the PPTT to indicate the subset of CPUs and cache topology that can
>> access each MPAM System Component (MSC).
>>
>> This information is not directly useful to the kernel. The equivalent
>> cpumask is needed instead.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
>>
>> CC: Dave Martin <dave.martin@arm.com>
>> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Hi James,
> 
> Sorry I missed v1.  Busy few weeks.
> 
> I think one resource leak plus a few suggested changes that
> I'm not that bothered about.



>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..1728545d90b2 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -817,3 +817,86 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>>  					  ACPI_PPTT_ACPI_IDENTICAL);
>>  }
> 
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + *                                       processor container
>> + * @acpi_cpu_id:	The UID of the processor container.
>> + * @cpus:		The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container, they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table_hdr;
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	u32 proc_sz;
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	table_hdr = acpi_get_pptt();
> 
> This calls acpi_get_table() so you need to put it again or every call
> to this leaks a reference count.  I messed around with DEFINE_FREE() for this
> but it doesn't fit that well as the underlying call doesn't return the table.
> This one does though so you could do a pptt specific one.  
> 
> Or just acpi_put_table(table_hdr); at exit path from this function.

This is a strange one - you spotted it in your follow-up email.


>> +	if (!table_hdr)
>> +		return;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +			     sizeof(struct acpi_table_pptt));

> Hmm. Not related to this patch but I have no idea why acpi_get_pptt()
> doesn't return a struct acpi_table_pptt as if it did this would be a simple
> + 1 and not require those who only sometimes deal with ACPI code to go
> check what that macro actually does!

Looks like that would have a knock on effect for the types in:
| topology_get_acpi_cpu_tag()
| acpi_find_processor_node()
| cache_setup_acpi_cpu()
| fetch_pptt_node()

... basically everywhere this file uses struct acpi_table_header...

It's simpler to cast it here if you're keen on the '+1' approach, but this pattern is how
the rest of this file does it. This is the same as acpi_pptt_leaf_node().
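
(i.e. the cast-here version would be roughly:
| 	struct acpi_table_pptt *pptt = (struct acpi_table_pptt *)table_hdr;
| 	entry = (struct acpi_subtable_header *)(pptt + 1);
 - just to show what I mean, not something I'm planning to post.)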


>> +	proc_sz = sizeof(struct acpi_pptt_processor);


> Maybe sizeof (*cpu_node) is more helpful to reader.

(again, same as acpi_pptt_leaf_node())


> Also shorter so you could do
> 	while ((unsigned long)entry + sizeof(*cpu_node) <= table_end)
> 
>> +	while ((unsigned long)entry + proc_sz <= table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
> 
> For me, assigning this before checking the type is inelegant.

I agree, but when in pptt.c ...


> but the nesting does get deep without it so I guess this is ok maybe, though
> I wonder if better reorganized to combine a different bunch of conditions.
> I think this is functionally identical.
> 
> 		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
> 			struct acpi_pptt_processor *cpu_node = 
> 				(struct acpi_pptt_processor *)entry;
> 			if ((cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) &&
> 			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
> 			    (cpu_node->acpi_processor_id == acpi_cpu_id)) {
> 				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> 				break;
> 		
> 			}
> 		}
> 		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> 				     entry->length);

It is the same - I think this is better as it reduces the scope of cpu_node.


> More generally I wonder if it is worth adding a for_each_acpi_pptt_entry() macro.
> There is some precedence in drivers acpi such as for_each_nhlt_endpoint()

When this series had support for ACPI PPI-Partitions there were more PPTT helpers and I
had a function pointer based call_on_every_entry() kind of thing. Jeremy L thought it
obscured the flow. A custom for_each_ is probably better.

I may float it as an RFC after these are done. I don't want to make this series any
bigger. It certainly makes the users easier on the eye.
| drivers/acpi/pptt.c | 43 +++++++++++++++----------------------------
| 1 file changed, 15 insertions(+), 28 deletions(-)

Stashed here for posterity:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=pptt/for_each_pptt_entry/v0&id=353ceeba3d39c6b6a10eeb1a59c49649cdf719d8
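
For the curious, a simplified sketch of the sort of shape it takes (the stashed
branch above is the real thing):
| #define for_each_acpi_pptt_entry(entry, table_hdr, table_end)			\
| 	for (entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,	\
| 				  sizeof(struct acpi_table_pptt));		\
| 	     (unsigned long)(entry) + sizeof(*(entry)) <= (table_end);		\
| 	     entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,		\
| 				  (entry)->length))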


> That's probably material for another day though unless you think it brings
> enough benefits to do it here.
> 
> 
>> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>> +			if (!acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +				if (cpu_node->acpi_processor_id == acpi_cpu_id) {
>> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
>> +					break;
>> +				}
>> +			}
>> +		}
>> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>> +				     entry->length);
>> +	}
>> +}
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index 1c5bb1e887cd..f97a9ff678cc 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>>  int find_acpi_cpu_topology_cluster(unsigned int cpu);
>>  int find_acpi_cpu_topology_package(unsigned int cpu);
>>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>>  #else
>>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>>  {
>> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  {
>>  	return -EINVAL;
>>  }
>> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>> +						     cpumask_t *cpus) { }
>>  #endif
>>  
>>  void acpi_arch_init(void);
> 



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-11 10:46   ` Jonathan Cameron
@ 2025-09-19 16:10     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 11:46, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:42 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> In acpi_count_levels(), the initial value of *levels passed by the
>> caller is really an implementation detail of acpi_count_levels(), so it
>> is unreasonable to expect the callers of this function to know what to
>> pass in for this parameter.  The only sensible initial value is 0,
>> which is what the only upstream caller (acpi_get_cache_info()) passes.
>>
>> Use a local variable for the starting cache level in acpi_count_levels(),
>> and pass the result back to the caller via the function return value.
>>
>> Gid rid of the levels parameter, which has no remaining purpose.
>>
>> Fix acpi_get_cache_info() to match.
>>
>> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-11 14:08   ` Ben Horgan
@ 2025-09-19 16:10     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 11/09/2025 15:08, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> In acpi_count_levels(), the initial value of *levels passed by the
>> caller is really an implementation detail of acpi_count_levels(), so it
>> is unreasonable to expect the callers of this function to know what to
>> pass in for this parameter.  The only sensible initial value is 0,
>> which is what the only upstream caller (acpi_get_cache_info()) passes.
>>
>> Use a local variable for the starting cache level in acpi_count_levels(),
>> and pass the result back to the caller via the function return value.
>>
>> Gid rid of the levels parameter, which has no remaining purpose.
> 
> Nit: s/Gid/Get/

Oops,


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-11 10:59   ` Jonathan Cameron
@ 2025-09-19 16:10     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 11:59, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:43 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.

>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 7af7d62597df..c5f2a51d280b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>>  				     entry->length);
>>  	}
>>  }
>> +
>> +/*
> /**
> 
> It's an exposed interface so nice to have formal kernel-doc and automatic
> checks that brings.
> 
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.

> Silly question but why do we care if this a unified cache?

/me returns from the time-machine trip....

This is legacy, but results in parity with the DT approach.
Really early versions of this generated an ID based on the associated CPUs - like DT
does/would-do today. This value isn't unique for non-unified caches as they have the same
set of CPUs below them, so every use of cache-id used to have to check it was a unified cache.
This isn't a problem for MPAM as there is never likely to be an L1 MSC, but you're right -
it hinders re-use of this.

Since ACPI then went and added the ID to the PPTT, we don't need to check this here.


> It's a bit odd to have a general sounding function fail for split caches.
> The handling would have to be more complex but if we really don't want
> to do it maybe rename the function to find_acpi_unifiedcache_level_from_id()
> and if the general version gets added later we can switch to that.

I'll add the extra work - this avoids the call to acpi_count_levels() which was annoying
Dave as it's another walk of the whole table. (I didn't twig in that conversation that the
unified check may no longer be necessary)


>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
>> + * the cache cannot be found.
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_table_header *table;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table = acpi_get_pptt();
>> +	if (!table)
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return -ENOENT;
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +		/* Start at 1 for L1 */
>> +		for (level = 1; level <= num_levels; level++) {
>> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +						     level, &cpu_node);
>> +			if (!cache)
>> +				continue;
>> +
>> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +						cache,
>> +						sizeof(struct acpi_pptt_cache));
> 
> sizeof(*cache) to me makes this more obvious.

Would be the only instance of this in the file - but I agree it's more readable, and
results in fewer line breaks, which also helps.
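i.e. that line just becomes:
| 			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1, cache, sizeof(*cache));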


>> +
>> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +			    cache_v1->cache_id == cache_id)
>> +				return level;
>> +		}
>> +	}
>> +
>> +	return -ENOENT;
>> +}


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-11 15:27   ` Lorenzo Pieralisi
@ 2025-09-19 16:10     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Lorenzo,

On 11/09/2025 16:27, Lorenzo Pieralisi wrote:
> On Wed, Sep 10, 2025 at 08:42:43PM +0000, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.

>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 7af7d62597df..c5f2a51d280b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>>  				     entry->length);
>>  	}
>>  }
>> +
>> +/*
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later,
> 
> * The PPTT table must be rev 3 or later.

Fixed,


>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
>> + * the cache cannot be found.
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_table_header *table;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table = acpi_get_pptt();
>> +	if (!table)
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return -ENOENT;
> 
> Same comment as in another patch - don't think you want to stop parsing
> here.

Yes - this is me throwing my hands up in the air at a nonsense firmware table.
'continue' would be a try harder approach.


>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +		/* Start at 1 for L1 */
>> +		for (level = 1; level <= num_levels; level++) {
>> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +						     level, &cpu_node);
>> +			if (!cache)
>> +				continue;
>> +
>> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +						cache,
>> +						sizeof(struct acpi_pptt_cache));
>> +
>> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +			    cache_v1->cache_id == cache_id)
>> +				return level;
>> +		}
>> +	}
>> +
>> +	return -ENOENT;
>> +}
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..5bdca5546697 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  }
>>  static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>>  						     cpumask_t *cpus) { }
>> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	return -EINVAL;

> return -ENOENT;

Yes, same bug as before. Fixed.


> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

Thanks - I think the change to ignoring unified caches and searching all types is
substantive enough that I won't pick this up because of the changes for v3...



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-11 11:06   ` Jonathan Cameron
@ 2025-09-19 16:10     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 12:06, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:44 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>
>> The driver needs to know which CPUs are associated with the cache.
>> The CPUs may not all be online, so cacheinfo does not have the
>> information.
>>
>> Add a helper to pull this information out of the PPTT.

> Why for this case does it makes sense to not just use acpi_get_pptt()?
> 
> Also you don't introduce the acpi_get_table_reg() helper until patch 6.

I missed fixing this one up. That's done now.


>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index c5f2a51d280b..c379a9952b00 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -966,3 +966,62 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>>  
>>  	return -ENOENT;
>>  }
>> +
>> +/**
>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>> + *					   specified cache
>> + * @cache_id: The id field of the unified cache
> 
> Similar comment to previous patch. If we are going to make this unified only
> can we reflect that in the function name.  I worry this will get reused
> and that restriction will surprise.

I agree - the unified restriction turns out only to be of interest to archaeologists.
I've ripped it out.



>> + * @cpus: Where to build the cpumask
>> + *
>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>> + * to be found even if the CPUs are offline.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>> + */
>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	if (IS_ERR(table))
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (WARN_ON_ONCE(!cpu_node))
>> +			continue;

I'm not sure why this one is a WARN_ON_ONCE() and the other isn't - both mean the PPTT
table is missing CPUs, but this looks like leftover debug. I'll drop it.


>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +		/* Start at 1 for L1 */
>> +		for (level = 1; level <= num_levels; level++) {
>> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +						     level, &cpu_node);
>> +			if (!cache)
>> +				continue;
>> +
>> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +						cache,
>> +						sizeof(struct acpi_pptt_cache));
> 
> sizeof(*cache) makes more sense to me.

Yup, I've done that in the previous one. It's not otherwise done in this file - let's see
if someone cares strongly the other way.


>> +
>> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +			    cache_v1->cache_id == cache_id)
>> +				cpumask_set_cpu(cpu, cpus);
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}

Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-11 13:17   ` Jonathan Cameron
@ 2025-09-19 16:11     ` James Morse
  2025-09-26 14:48       ` Jonathan Cameron
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-19 16:11 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 14:17, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:46 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data.

> A few comments inline.  Note I've more or less completely forgotten
> what was discussed in RFC 1 so I might well be repeating stuff that
> you replied to then.  Always a problem for me with big complex patch sets!

No worries - I'll have forgotten too. If we can't work it out, it should have had a comment.


>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..fd9cfa143676
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c
> 
> 
>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +				    struct acpi_mpam_resource_node *res)
>> +{
>> +	int level, nid;
>> +	u32 cache_id;
>> +
>> +	switch (res->locator_type) {
>> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +		cache_id = res->locator.cache_locator.cache_reference;
>> +		level = find_acpi_cache_level_from_id(cache_id);
>> +		if (level <= 0) {
>> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
>> +			return -EINVAL;
>> +		}
>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +				       level, cache_id);
>> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +		if (nid == NUMA_NO_NODE)
>> +			nid = 0;
>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +				       255, nid);
>> +	default:
>> +		/* These get discovered later and treated as unknown */
> 
> are treated?

Sure,


>> +		return 0;
>> +	}
>> +}
> 
>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +				     struct platform_device *pdev,
>> +				     u32 *acpi_id)
>> +{
>> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
>> +	bool acpi_id_valid = false;
>> +	struct acpi_device *buddy;
>> +	char uid[11];
>> +	int err;
>> +
>> +	memset(&hid, 0, sizeof(hid));
> 
>  = {}; above and skip the memset.

Sure,
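i.e.:
| 	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { };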


>> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +	       sizeof(tbl_msc->hardware_id_linked_device));
>> +
>> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +		*acpi_id = tbl_msc->instance_id_linked_device;
>> +		acpi_id_valid = true;
>> +	}
>> +
>> +	err = snprintf(uid, sizeof(uid), "%u",
>> +		       tbl_msc->instance_id_linked_device);
>> +	if (err >= sizeof(uid)) {
>> +		pr_debug("Failed to convert uid of device for power management.");
>> +		return acpi_id_valid;
>> +	}
>> +
>> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
>> +	if (buddy)
>> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
>> +
>> +	return acpi_id_valid;
>> +}
> 
>> +
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct property_entry props[4]; /* needs a sentinel */

> Perhaps move this and res into the loop and use = {};

>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int next_res, next_prop, err = 0;
>> +	struct acpi_device *companion;
>> +	struct platform_device *pdev;
>> +	enum mpam_msc_iface iface;
>> +	struct resource res[3];
> 
> Add a comment here or a check later on why this is large enough.

These two are now in the loop and look like this:
| 		/* pcc, nrdy, affinity and a sentinel */
| 		struct property_entry props[4] = { 0 };
| 		/* mmio, 2xirq, no sentinel. */
|		struct resource res[3] = { 0 };


>> +	char uid[16];
>> +	u32 acpi_id;
>> +
>> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
>> +		return 0;
>> +
>> +	if (table->revision < 1)
>> +		return 0;
>> +
>> +	table_end = (char *)table + table->length;
>> +
>> +	while (table_offset < table_end) {
>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +		table_offset += tbl_msc->length;
>> +
>> +		if (table_offset > table_end) {
>> +			pr_debug("MSC entry overlaps end of ACPI table\n");
>> +			break;

> That this isn't considered an error is a bit subtle and made me wonder
> if there was a use of uninitialized pdev (there isn't because err == 0)

It's somewhat a philosophical argument. I don't expect the kernel to have to validate
these tables; they're not provided by the user, and there quickly comes a point where
you have to trust them, and they have to be correct.
At the other extreme is the assumption that the table is line-noise and we should check
everything to avoid out of bounds accesses. Dave wanted the diagnostic messages on these.

As this is called from an initcall, the best you get is an inexplicable print message.
(what should we say - update your firmware?)


Silently failing in this code is always safe as the driver has a count of the number of
MSC, and doesn't start accessing the hardware until it's found them all.
(this is because it needs to find the system-wide minimum value - and it's not worth
 starting if it's not possible to finish).


> Why not return here?

Just because there was no other return in the loop, and I hate surprise returns.

I'll change it if it avoids thinking about how that platform_device_put() gets skipped!


> 
>> +		}
>> +
>> +		/*
>> +		 * If any of the reserved fields are set, make no attempt to
>> +		 * parse the MSC structure. This MSC will still be counted,
>> +		 * meaning the MPAM driver can't probe against all MSC, and
>> +		 * will never be enabled. There is no way to enable it safely,
>> +		 * because we cannot determine safe system-wide partid and pmg
>> +		 * ranges in this situation.
>> +		 */

> This is decidedly paranoid. I'd normally expect the architecture to be based
> on assumption that is fine for old software to ignore new fields.  ACPI itself
> has fairly firm rules on this (though it goes wrong sometimes :)

Yeah - the MPAM table isn't properly structured as subtables. I don't see how they are
going to extend it if they need to.

The paranoia is that anything set in these reserved fields probably indicates something
the driver needs to know about: a case in point is the way PCC was added.

I'd much prefer we skip creation of MSC devices that have properties we don't understand.
acpi_mpam_count_msc() still counts them - which means the driver never finds all the MSC,
and never touches the hardware.

MPAM isn't a critical feature, it's better that it be disabled than make things worse.
(the same attitude holds with the response to the MPAM error interrupt - reset everything
 and pack up shop. This is better than accidentally combining important/unimportant
 tasks)


> I'm guessing there is something out there that made this necessary though so
> keep it if you actually need it.

It's a paranoid/violent reaction to the way PCC was added - without something like this,
that would have led to the OS trying to map the 0 page and poking around in it - never
likely to go well.

Doing this does let them pull another PCC without stable kernels going wrong.
Ultimately I think they'll need to replace the table with one that is properly structured.
For now - this is working with what we have.


>> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
>> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
>> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
>> +			continue;
>> +		}
>> +
>> +		if (!tbl_msc->mmio_size) {
>> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
>> +			continue;
>> +		}
>> +
>> +		if (decode_interface_type(tbl_msc, &iface)) {
>> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
>> +			continue;
>> +		}
>> +
>> +		next_res = 0;
>> +		next_prop = 0;
>> +		memset(res, 0, sizeof(res));
>> +		memset(props, 0, sizeof(props));
>> +
>> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
> 
> https://lore.kernel.org/all/20241009124120.1124-13-shiju.jose@huawei.com/
> was a proposal to add a DEFINE_FREE() to clean these up.  Might be worth a revisit.
> Then Greg was against the use it was put to and asking for an example of where
> it helped.  Maybe this is that example.
> 
> If you do want to do that, I'd factor out a bunch of the stuff here as a helper
> so we can have the clean ownership pass of a return_ptr().  
> Similar to what Shiju did here (this is the usecase for platform device that
> Greg didn't like).
> https://lore.kernel.org/all/20241009124120.1124-14-shiju.jose@huawei.com/
> 
> Even without that I think factoring some of this out and hence being able to
> do returns on errors and put the if (err) into the loop would be a nice
> improvement to readability.

If you think it's more readable I'll structure it like that.


>> +		if (!pdev) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
>> +			err = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		/* Some power management is described in the namespace: */
>> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>> +		if (err > 0 && err < sizeof(uid)) {
>> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
>> +			if (companion)
>> +				ACPI_COMPANION_SET(&pdev->dev, companion);
>> +			else
>> +				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);
>> +		}
>> +
>> +		if (iface == MPAM_IFACE_MMIO) {
>> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
>> +							       tbl_msc->mmio_size,
>> +							       "MPAM:MSC");
>> +		} else if (iface == MPAM_IFACE_PCC) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
>> +								tbl_msc->base_address);
>> +			next_prop++;
> 
> Why the double increment? Needs a comment if that is right thing to do.

That's a bug.
I'll add some WARN_ON() when these are consumed as per your earlier suggestion.
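Probably just something like:
| 	WARN_ON(next_res > ARRAY_SIZE(res));
| 	WARN_ON(next_prop >= ARRAY_SIZE(props));
(the strict check on next_prop keeps the sentinel slot zeroed)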


>> +		}
>> +
>> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
>> +		err = platform_device_add_resources(pdev, res, next_res);
>> +		if (err)
>> +			break;
>> +
>> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
>> +							tbl_msc->max_nrdy_usec);
>> +
>> +		/*
>> +		 * The MSC's CPU affinity is described via its linked power
>> +		 * management device, but only if it points at a Processor or
>> +		 * Processor Container.
>> +		 */
>> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
>> +								acpi_id);
>> +		}
>> +
>> +		err = device_create_managed_software_node(&pdev->dev, props,
>> +							  NULL);
>> +		if (err)
>> +			break;
>> +
>> +		/* Come back later if you want the RIS too */
>> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
>> +		if (err)
>> +			break;
>> +
>> +		err = platform_device_add(pdev);
>> +		if (err)
>> +			break;
>> +	}
>> +
>> +	if (err)
>> +		platform_device_put(pdev);
>> +
>> +	return err;
>> +}
>> +
>> +int acpi_mpam_count_msc(void)
>> +{
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int count = 0;
>> +
>> +	if (IS_ERR(table))
>> +		return 0;
>> +
>> +	if (table->revision < 1)
>> +		return 0;
>> +
>> +	table_end = (char *)table + table->length;

> Trivial so feel free to ignore.
> Perhaps should aim for consistency.  Whilst I prefer pointers for this stuff
> PPTT did use unsigned longs.

I prefer the pointers, and as it's a separate file, I don't think it needs to be
consistent with PPTT.


>> +
>> +	while (table_offset < table_end) {
>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +		if (!tbl_msc->mmio_size)
>> +			continue;
>> +
>> +		if (tbl_msc->length < sizeof(*tbl_msc))
>> +			return -EINVAL;
>> +		if (tbl_msc->length > table_end - table_offset)
>> +			return -EINVAL;
>> +		table_offset += tbl_msc->length;
>> +
>> +		count++;
>> +	}
>> +
>> +	return count;
>> +}
>> +

> Could reorder to put acpi_mpam_parse and this use of it together?

mpam_msc_driver_init() calls this from subsys_initcall() to know whether it's worth
registering the driver at all. Even with that fixed, it's still potentially racy: once the
first MSC has been platform_device_add()ed, I figure the driver can probe against that,
and needs to know if this first MSC was the last one.
acpi_mpam_count_msc() needs to be safe to race with acpi_mpam_parse().

This could be forced far enough away in time by only registering the driver after
subsys_initcall_sync() has completed - but the list of dependencies on those is ugly
enough as it is.

I'll add a comment:
/**
 * acpi_mpam_count_msc() - Count the number of MSC described by firmware.
 *
 * Returns the number of MSC, or zero for an error.
 *
 * This can be called before or in parallel with acpi_mpam_parse().
 */


>> +/*
>> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
>> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
>> + * initialised from postcore_initcall().
>> + */
>> +subsys_initcall_sync(acpi_mpam_parse);

>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index c5fd92cda487..af449964426b 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

> I'd use if (!IS_ERR_OR_NULL(_T)) not because it is functionally necessary but
> because it will let the compiler optimize this out if it can tell that in a given
> path _T is NULL (I think it was Peter Z who pointed this out in a similar interface
> a while back).

Makes sense,

> I'd like an opinion from Rafael on this in general.



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready
  2025-09-16 13:17   ` [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready Zeng Heng
@ 2025-09-19 16:11     ` James Morse
  2025-09-20 10:14       ` Zeng Heng
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-19 16:11 UTC (permalink / raw)
  To: Zeng Heng
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Zeng,

On 16/09/2025 14:17, Zeng Heng wrote:
> After updating the monitor configuration, the first read of the monitoring
> result requires waiting for the "not ready" duration before an effective
> value can be obtained.

May need to wait - some platforms need to do this, some don't.
Yours is the first I've heard of that does this!


> Because a component consists of multiple MPAM instances, after updating the
> configuration of each instance, should wait for the "not ready" period of
> per single instance before the valid monitoring value can be obtained, not
> just wait for once interval per component.

I'm really trying to avoid that ... if you have ~200 MSC pretending to be one thing, you'd
wait 200x the maximum period. On systems with CMN, the number of MSC scales with the
number of CPUs, so 200x isn't totally crazy.

I think the real problem here is the driver doesn't go on to reconfigure MSC-2 if MSC-1
returned not-ready, meaning the "I'll only wait once" logic kicks in and returns not-ready
to the user. (which is presumably what you're seeing?)

Does this solve your problem?:
-----------------%<-----------------
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 404bd4c1fd5e..2f39d0339349 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1395,7 +1395,7 @@ static void __ris_msmon_read(void *arg)

 static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
 {
-       int err, idx;
+       int err, any_err = 0, idx;
        struct mpam_msc *msc;
        struct mpam_vmsc *vmsc;
        struct mpam_msc_ris *ris;
@@ -1412,15 +1412,19 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
                                                    true);
                        if (!err && arg->err)
                                err = arg->err;
+
+                       /*
+                        * Save one error to be returned to the caller, but
+                        * keep reading counters so that they get reprogrammed.
+                        * On platforms with NRDY this lets us wait once.
+                        */
                        if (err)
-                               break;
+                               any_err = err;
                }
-               if (err)
-                       break;
        }
        srcu_read_unlock(&mpam_srcu, idx);

-       return err;
+       return any_err;
 }

 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
-----------------%<-----------------


Thanks,

James


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-11 14:56   ` Lorenzo Pieralisi
@ 2025-09-19 16:11     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-19 16:11 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Lorenzo,

On 11/09/2025 15:56, Lorenzo Pieralisi wrote:
> On Wed, Sep 10, 2025 at 08:42:46PM +0000, James Morse wrote:
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data.

>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..fd9cfa143676
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c
>> @@ -0,0 +1,361 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
>> +
>> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/arm_mpam.h>
>> +#include <linux/bits.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/platform_device.h>
>> +
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
>> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
>> + */
>> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)

> Nit: For consistency, not sure why MODE/TYPE_MASK and
> AFFINITY_{TYPE/VALID} aren't but that's not a big deal.

The missing word is giving me trouble working out what this should be...

There used to be definitions for MODE_LEVEL and MODE_EDGE, hence the MODE_MASK to extract
the bit - but Jonathan pointed out the polarity values were pretty standard, and could be
fed to acpi_register_gsi() directly without mapping 0->0 and 1->1.


> BIT(3) to be consistent with the table should be (?)
> 
> #define ACPI_MPAM_MSC_AFFINITY_TYPE_MASK	BIT(3)
> 
> to match the Table 5 (and then add defines for possible values).

Fixed as:
| #define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
| #define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
| #define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1


>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
> 
> ACPI_MPAM_MSC_AFFINITY_VALID_MASK ? (or remove _MASK from IRQ_{MODE/TYPE) ?
> 
> Just noticed - feel free to ignore this altogether.

I think the _MASK stuff encourages use of FIELD_GET(), which in turn encourages
local variables with meaningful names.

But I don't think I need a mask and values defined for the valid bit - which is readable
enough already - or for the mode, as those bits are passed through directly.
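
As a minimal sketch of the FIELD_GET() pattern being described (FIELD_GET() is from
<linux/bitfield.h>; the mask names are the defines above, the local variable names are
purely illustrative):

| 	u32 sense, aff_type;
|
| 	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
| 	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);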


> 
>> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
>> +				   u32 flags, int *irq,
>> +				   u32 processor_container_uid)
>> +{
>> +	int sense;
>> +
>> +	if (!intid)
>> +		return false;
>> +
>> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
>> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
>> +		return false;
>> +
>> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
>> +
>> +	if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
>> +		pr_err_once("Partitioned interrupts not supported\n");
>> +		return false;
>> +	}
> 
> Please add a comment to explain what you mean here (ie PPIs partitioning
> isn't supported).

The error message isn't enough?


> I will have to change this code anyway to cater for the GICv5 interrupt model given the
> hardcoded intid values.

Ah. I can probably detect PPIs from the table description instead. It's done like this
just because this was where the handling logic used to be, but if that's a nuisance for
the GICv5 handling, I'll do it purely from the table.

I have the code to support PPI partitions, but it's not worth it unless someone builds this.


> Is the condition allowed by the MPAM architecture, so the MPAM tables are
> legitimate (but not supported in Linux) ?

Yeah, the architecture says the interrupt can be PPI. Global PPI are supported, but not
partitions on ACPI platforms. Because it's valid/easy for DT, it's hard to say no-one will
ever do this with ACPI. Describing the affinity in the table lets the OS decide whether to
support it.

This ends up as a helper:
| static bool _is_ppi_partition(u32 flags)
| {
| 	u32 aff_type, is_ppi;
| 	bool ret;
|
| 	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
| 	if (!is_ppi)
| 		return false;
|
| 	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
| 	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
| 	if (ret)
| 		pr_err_once("Partitioned interrupts not supported\n");
|
| 	return ret;
| }

and a call in acpi_mpam_register_irq() to return false for ppi-partitions.
This also allows some simplification in acpi_mpam_parse_irqs().
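
For illustration, the call site would presumably be something like this (a sketch only;
the exact position within acpi_mpam_register_irq() isn't shown here):

| 	if (_is_ppi_partition(flags))
| 		return false;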


>> +
>> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
>> +	if (*irq <= 0) {
>> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
>> +			    intid);
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}

>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +				     struct platform_device *pdev,
>> +				     u32 *acpi_id)
>> +{
>> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
>> +	bool acpi_id_valid = false;
>> +	struct acpi_device *buddy;
>> +	char uid[11];
>> +	int err;
>> +
>> +	memset(&hid, 0, sizeof(hid));
> 
> Jonathan already commented on this.

Yup, its gone.


>> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +	       sizeof(tbl_msc->hardware_id_linked_device));
>> +
>> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +		*acpi_id = tbl_msc->instance_id_linked_device;
>> +		acpi_id_valid = true;
>> +	}
>> +
>> +	err = snprintf(uid, sizeof(uid), "%u",
>> +		       tbl_msc->instance_id_linked_device);
>> +	if (err >= sizeof(uid)) {
>> +		pr_debug("Failed to convert uid of device for power management.");
>> +		return acpi_id_valid;
>> +	}
>> +
>> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
>> +	if (buddy)
>> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);

> Is !buddy a FW error to be logged ?

I'm pretty sure that field is optional; it's for platform-specific power management.
The spec has "This field must be set to zero if there is no linked device for this MSC".
This is where I expect a search for "\0\0\0\0" to fail, hence it's silent.

I think this thing is for the power management of the parent device, whereas the MSC
namespace object is for the power management of the MSC itself. I've no idea why they need
to be separate...


>> +
>> +	return acpi_id_valid;
>> +}
>> +
>> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
>> +				 enum mpam_msc_iface *iface)
>> +{
>> +	switch (tbl_msc->interface_type) {
>> +	case 0:
>> +		*iface = MPAM_IFACE_MMIO;
>> +		return 0;
>> +	case 0xa:
> 
> Worth giving those constants 0x0,0xa a name ?

Sure, added earlier in the file:
| /*
|  * Encodings for the MSC node body interface type field.
|  * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
|  */
| #define ACPI_MPAM_MSC_IFACE_MMIO   0x00
| #define ACPI_MPAM_MSC_IFACE_PCC    0x0a


> 
>> +		*iface = MPAM_IFACE_PCC;
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct property_entry props[4]; /* needs a sentinel */
>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int next_res, next_prop, err = 0;
>> +	struct acpi_device *companion;
>> +	struct platform_device *pdev;
>> +	enum mpam_msc_iface iface;
>> +	struct resource res[3];
>> +	char uid[16];
>> +	u32 acpi_id;
>> +
>> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
>> +		return 0;
>> +
>> +	if (table->revision < 1)
>> +		return 0;
>> +
>> +	table_end = (char *)table + table->length;
>> +
>> +	while (table_offset < table_end) {
>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +		table_offset += tbl_msc->length;
>> +
>> +		if (table_offset > table_end) {
>> +			pr_debug("MSC entry overlaps end of ACPI table\n");
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * If any of the reserved fields are set, make no attempt to
>> +		 * parse the MSC structure. This MSC will still be counted,
>> +		 * meaning the MPAM driver can't probe against all MSC, and
>> +		 * will never be enabled. There is no way to enable it safely,
>> +		 * because we cannot determine safe system-wide partid and pmg
>> +		 * ranges in this situation.
>> +		 */
>> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
>> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
>> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
>> +			continue;
>> +		}


> This is a bit obscure - the comment too requires some explanation
> ("This MSC will still be counted", not very clear what that means).

I'll expand it - counted by acpi_mpam_count_msc(). It's trying to explain why it's
perfectly safe to just skip the MSC here - that prevents the driver from ever touching the
hardware.


>> +
>> +		if (!tbl_msc->mmio_size) {
>> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
>> +			continue;
>> +		}
>> +
>> +		if (decode_interface_type(tbl_msc, &iface)) {
>> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
>> +			continue;
>> +		}
>> +
>> +		next_res = 0;
>> +		next_prop = 0;
>> +		memset(res, 0, sizeof(res));
>> +		memset(props, 0, sizeof(props));
>> +
>> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
>> +		if (!pdev) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
>> +			err = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		/* Some power management is described in the namespace: */
>> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>> +		if (err > 0 && err < sizeof(uid)) {
>> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
>> +			if (companion)
>> +				ACPI_COMPANION_SET(&pdev->dev, companion);
>> +			else
>> +				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);
> 
> Here you are linking the platform device to a namespace companion.
> That's what will make sure that a) the ACPI namespace scan won't add an
> additional platform device for ARMHAA5C and b) MSIs works -> through
> the related IORT named component.
> 
> Correct ?

I don't think anyone would put an MSC behind an IOMMU, so hopefully this is not needed for
MSI. I've not touched the MSI support yet.

2.6 of the spec says this is for power management ... I've hooked it up here because it's
there. I assumed the power-management core needed to know about it. It's the same device,
hence calling it the companion (which is how the spec people referred to it verbally).

I do see the namespace platform device get created - is that not supposed to happen?
I'm testing this with the FVP, so don't have a way of actually triggering a power
management flow.


>> +		}
>> +
>> +		if (iface == MPAM_IFACE_MMIO) {
>> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
>> +							       tbl_msc->mmio_size,
>> +							       "MPAM:MSC");
>> +		} else if (iface == MPAM_IFACE_PCC) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
>> +								tbl_msc->base_address);
>> +			next_prop++;
>> +		}
>> +
>> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> 
> Do we _really_ have to resolve IRQs here or we can postpone them at
> driver probe time like RIS resources (if I understand correctly how
> it is done - by copying table data into platform data) ?

Simply because that is the order that DT does it too. We'd probably get away with it, but
the interrupts are a property of the MSC, not the description of the resource it controls...
I'm not sure what doing this would buy us.


> GICv5 hat in mind - good as it is for GICv3.

What changes here for GICv5? I certainly want it to work with whatever irqchip is in use.


>> +		err = platform_device_add_resources(pdev, res, next_res);
>> +		if (err)
>> +			break;
>> +
>> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
>> +							tbl_msc->max_nrdy_usec);
>> +
>> +		/*
>> +		 * The MSC's CPU affinity is described via its linked power
>> +		 * management device, but only if it points at a Processor or
>> +		 * Processor Container.
>> +		 */
>> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
>> +								acpi_id);
>> +		}
>> +
>> +		err = device_create_managed_software_node(&pdev->dev, props,
>> +							  NULL);
>> +		if (err)
>> +			break;
>> +
>> +		/* Come back later if you want the RIS too */
> 
> I read this as: copy table data to the device so that RIS resources can
> be parsed later.
> 
> Right ? I think it is worth updating the comment to clarify it.

Sure,
	/*
	 * Stash the table entry for acpi_mpam_parse_resources() to discover
	 * what this MSC controls.
	 */


>> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
>> +		if (err)
>> +			break;
>> +
>> +		err = platform_device_add(pdev);
>> +		if (err)
>> +			break;
>> +	}
>> +
>> +	if (err)
>> +		platform_device_put(pdev);
> 
> I won't comment on the clean-up here as Jonathan did it already.
> 
>> +
>> +	return err;
>> +}

>> +
>> +/*
>> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
>> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
>> + * initialised from postcore_initcall().

> I think we will end up setting in stone init ordering for these
> components created out of static tables (I mean sequencing them in a
> centralized way) but if that works for the current driver that's fine
> for the time being.

That'd be better - I've noted the dependencies on each one of these.



>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index c5fd92cda487..af449964426b 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> 
> Jonathan commented on this already - worth getting Rafael's opinion,
> I am fine either way.
> 
> I have not found anything that should block this code (other than code
> that I know I will have to rework when GICv5 ACPI support gets in) so:

I've factored that out.
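
For reference, a rough sketch of how the pair above ends up being used - this just mirrors
the acpi_mpam_parse() hunk quoted earlier, so the table reference is put automatically when
it goes out of scope:

| 	struct acpi_table_header *table __free(acpi_table) =
| 			acpi_get_table_ret(ACPI_SIG_MPAM, 0);
|
| 	if (IS_ERR(table))
| 		return 0;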


> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

From your comments I assume you're happy with Jonathan's suggested changes too - which
I've picked up.


Thanks!

James



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready
  2025-09-19 16:11     ` James Morse
@ 2025-09-20 10:14       ` Zeng Heng
  0 siblings, 0 replies; 200+ messages in thread
From: Zeng Heng @ 2025-09-20 10:14 UTC (permalink / raw)
  To: James Morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 2025/9/20 0:11, James Morse wrote:
> Hi Zeng,
> 
> On 16/09/2025 14:17, Zeng Heng wrote:
>> After updating the monitor configuration, the first read of the monitoring
>> result requires waiting for the "not ready" duration before an effective
>> value can be obtained.
> 
> May need to wait - some platforms need to do this, some don't.
> Yours is the first I've heard of that does this!
> 

I'm afraid similar platforms do exist. As long as one component has more
than one MSC, then after first updating the component's monitor, every MSC
instance needs to wait for MAX_NRDY_USEC us before reading the monitor
result.

In fact, most platforms don’t have nearly as many performance monitors
as PARTIDs, so the monitors often have to be time-shared, which makes
the problem even more pronounced.

> 
>> Because a component consists of multiple MPAM instances, after updating the
>> configuration of each instance, should wait for the "not ready" period of
>> per single instance before the valid monitoring value can be obtained, not
>> just wait for once interval per component.
> 
> I'm really trying to avoid that ... if you have ~200 MSC pretending to be one thing, you'd
> wait 200x the maximum period. On systems with CMN, the number of MSC scales with the
> number of CPUs, so 200x isn't totally crazy.
>
> I think the real problem here is the driver doesn't go on to reconfigure MSC-2 if MSC-1
> returned not-ready, meaning the "I'll only wait once" logic kicks in and returns not-ready
> to the user. (which is presumably what you're seeing?)

Yes, exactly.

> 
> Does this solve your problem?:
> -----------------%<-----------------
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 404bd4c1fd5e..2f39d0339349 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1395,7 +1395,7 @@ static void __ris_msmon_read(void *arg)
> 
>   static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>   {
> -       int err, idx;
> +       int err, any_err = 0, idx;
>          struct mpam_msc *msc;
>          struct mpam_vmsc *vmsc;
>          struct mpam_msc_ris *ris;
> @@ -1412,15 +1412,19 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>                                                      true);
>                          if (!err && arg->err)
>                                  err = arg->err;
> +
> +                       /*
> +                        * Save one error to be returned to the caller, but
> +                        * keep reading counters so that they get reprogrammed.
> +                        * On platforms with NRDY this lets us wait once.
> +                        */
>                          if (err)
> -                               break;
> +                               any_err = err;
>                  }
> -               if (err)
> -                       break;
>          }
>          srcu_read_unlock(&mpam_srcu, idx);
> 
> -       return err;
> +       return any_err;
>   }
> 
>   int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> -----------------%<-----------------
> 

I agree with this modification: reconfigure all MSCs first, then, if any
of them returns EBUSY, wait just once for MAX_NRDY_USEC and re-read the
monitor result; this guarantees that the monitor result is valid.



Thanks,
Zeng Heng





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-11 13:35   ` Jonathan Cameron
@ 2025-09-23 16:41     ` James Morse
  2025-09-26 14:55       ` Jonathan Cameron
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-23 16:41 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 14:35, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:47 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.

> Hi James,
> 
> Various comments inline.  You can ignore the do/while(0)
> one but I'll probably forget and send more grumpy comments about it ;)

I got exposed to this quite a lot in some other project - it's better than a goto.
Due to the heavy exposure, I don't think it's odd - I'm just used to seeing it.
(not sure it's quite Stockholm syndrome yet)


>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 6487c511bdc6..93e563e1cce4 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>>  
>>  config ARM64_MPAM
>>  	bool "Enable support for MPAM"
>> +	select ARM64_MPAM_DRIVER if EXPERT
> 
> To me that wants a comment as it's unusual.

Sure - "# does nothing yet".
This is to stop it being enabled in distros as it can't be used until the resctrl part is
added.


>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> new file mode 100644
>> index 000000000000..c30532a3a3a4
>> --- /dev/null
>> +++ b/drivers/resctrl/Kconfig
>> @@ -0,0 +1,14 @@
>> +menuconfig ARM64_MPAM_DRIVER
>> +	bool "MPAM driver"
>> +	depends on ARM64 && ARM64_MPAM && EXPERT
>> +	help
>> +	  MPAM driver for System IP, e,g. caches and memory controllers.
>> +
>> +if ARM64_MPAM_DRIVER
>> +config ARM64_MPAM_DRIVER_DEBUG
>> +	bool "Enable debug messages from the MPAM driver"
>> +	depends on ARM64_MPAM_DRIVER
> 
> The depends on should make the if unnecessary.

Yup, that was a recent change and I didn't spot the duplication.


>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..efc4738e3b4d
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,180 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cacheinfo.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/device.h>
>> +#include <linux/errno.h>
>> +#include <linux/gfp.h>
>> +#include <linux/list.h>
>> +#include <linux/lockdep.h>
>> +#include <linux/mutex.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/printk.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/srcu.h>
>> +#include <linux/types.h>
> 
>> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * corresponding cache may also be powered off. By making accesses from
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
>> +static int update_msc_accessibility(struct mpam_msc *msc)
>> +{
>> +	u32 affinity_id;
>> +	int err;
>> +
>> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
>> +				       &affinity_id);
>> +	if (err)
>> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +	else
>> +		acpi_pptt_get_cpus_from_container(affinity_id,
>> +						  &msc->accessibility);
>> +
>> +	return 0;
>> +
>> +	return err;
> 
> Curious. I'd do a build test after each patch before v3. A couple of
> places would have failed or given helpful warnings so far.

Oddly - the compiler is quite happy with this. Not sure why.
I've fixed it to return err.

This was due to the last second rip-out of the DT support.


>> +}
> 
>> +
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	struct mpam_msc *msc;
>> +	struct resource *msc_res;
>> +	struct device *dev = &pdev->dev;
>> +	void *plat_data = pdev->dev.platform_data;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	do {
> 
> I might well have moaned about this before, but I really dislike a do while(0)
> if it doesn't fit on my screen (and my eyesight is poor so that's not this
> many lines).  To me a non trivial case of this is almost always a place
> where a '_do' function would have made it more readable. 

I guess I'm very used to seeing this pattern, so I don't notice it.

I've changed this to a do_ thing, with all of this shoved into another function.
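
Roughly this shape, for illustration (the helper name here is made up, and the real split
may differ):

| static int do_mpam_msc_drv_probe(struct platform_device *pdev)
| {
| 	lockdep_assert_held(&mpam_list_lock);
|
| 	/* allocate the mpam_msc, map the MMIO page, add it to mpam_all_msc */
|
| 	return 0;
| }
|
| static int mpam_msc_drv_probe(struct platform_device *pdev)
| {
| 	int err;
|
| 	mutex_lock(&mpam_list_lock);
| 	err = do_mpam_msc_drv_probe(pdev);
| 	mutex_unlock(&mpam_list_lock);
|
| 	return err;
| }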


> I'm also not a fan of scoped_guard() plus breaks because it feels like
> it is dependent on an implementation detail but maybe it's clearer than this.

(i.e. the cleanup stuff? I'd always prefer to see the free/unlock call)


>> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +		if (!msc) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		mutex_init(&msc->probe_lock);
>> +		mutex_init(&msc->part_sel_lock);
>> +		mutex_init(&msc->outer_mon_sel_lock);
>> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
>> +		msc->id = pdev->id;
>> +		msc->pdev = pdev;
>> +		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +		INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +		err = update_msc_accessibility(msc);
>> +		if (err)
>> +			break;
>> +		if (cpumask_empty(&msc->accessibility)) {
>> +			dev_err_once(dev, "MSC is not accessible from any CPU!");
>> +			err = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>> +					     &msc->pcc_subspace_id))
>> +			msc->iface = MPAM_IFACE_MMIO;
>> +		else
>> +			msc->iface = MPAM_IFACE_PCC;
>> +
>> +		if (msc->iface == MPAM_IFACE_MMIO) {
>> +			void __iomem *io;
>> +
>> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +								    &msc_res);
>> +			if (IS_ERR(io)) {
>> +				dev_err_once(dev, "Failed to map MSC base address\n");
>> +				err = PTR_ERR(io);
>> +				break;
>> +			}
>> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +			msc->mapped_hwpage = io;
>> +		}
>> +
>> +		list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
>> +		platform_set_drvdata(pdev, msc);
>> +	} while (0);
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	if (!err) {
>> +		/* Create RIS entries described by firmware */
>> +		err = acpi_mpam_parse_resources(msc, plat_data);
>> +	}
>> +
>> +	if (err && msc)
>> +		mpam_msc_drv_remove(pdev);
> 
> Is it worth bothering to remove?  We failed probe anyway if we got here
> and it's not expected to happen on real systems so I'd just leave it around
> so that you can exit early above.

Symmetry and the principle of least surprise.
Ideally the probing code wouldn't need to unwind what it did; it can rely on the remove
code to do what's necessary. It was a patch from Carl Worth that did this, as the
unwind/remove code was duplicated.

I agree there is no error that requires it here, but later patches add things like debugfs
entries that need removing before the struct mpam_msc memory can be freed - potentially
by the devm stuff if the probe function returns an error due to the firmware tables 'RIS'
description being wonky.


> I'm also not following why the msc check is relevant if you do want to do
> this. Can only get here without msc if the allocation failed. Why would
> we leave the driver loaded in only that case?

Simply because the remove/destroy code dereferences msc, so that is the one case
that mustn't get in there. With your do_ version of the above, this got simplified.


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..7c63d590fc98
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,65 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +#ifndef MPAM_INTERNAL_H
>> +#define MPAM_INTERNAL_H
>> +
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/io.h>
>> +#include <linux/mailbox_client.h>
>> +#include <linux/mutex.h>
> 
> spinlock.h

Fixed,


>> +#include <linux/resctrl.h>
> 
> Not spotting anything rsctl yet.  So maybe this belongs later.

There shouldn't be anything that depends on resctrl in this series - looks like
this is a 2018 era bug in the way I carved this up!


>> +#include <linux/sizes.h>
>> +
>> +struct mpam_msc {
>> +	/* member of mpam_all_msc */
>> +	struct list_head        all_msc_list;
>> +
>> +	int			id;
> 
> I'd follow (approx) include what you use principles to make later header
> shuffling easier. So a forward def for this.

-ENOPARSE

I'm sure I'll work this out from your later comments.


>> +	struct platform_device *pdev;
>> +
>> +	/* Not modified after mpam_is_enabled() becomes true */
>> +	enum mpam_msc_iface	iface;
>> +	u32			pcc_subspace_id;
>> +	struct mbox_client	pcc_cl;
>> +	struct pcc_mbox_chan	*pcc_chan;
> 
> Forward def or include acpi/pcc.h

The PCC code got pulled out of this series, to come back later. I missed these bits.


>> +	u32			nrdy_usec;
>> +	cpumask_t		accessibility;
>> +
>> +	/*
>> +	 * probe_lock is only taken during discovery. After discovery these
>> +	 * properties become read-only and the lists are protected by SRCU.
>> +	 */
>> +	struct mutex		probe_lock;
>> +	unsigned long		ris_idxs;
>> +	u32			ris_max;
>> +
>> +	/* mpam_msc_ris of this component */
>> +	struct list_head	ris;
>> +
>> +	/*
>> +	 * part_sel_lock protects access to the MSC hardware registers that are
>> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +	 * by RIS).
>> +	 * If needed, take msc->probe_lock first.
>> +	 */
>> +	struct mutex		part_sel_lock;
>> +
>> +	/*
>> +	 * mon_sel_lock protects access to the MSC hardware registers that are
>> +	 * affected by MPAMCFG_MON_SEL.
>> +	 * If needed, take msc->probe_lock first.
>> +	 */
>> +	struct mutex		outer_mon_sel_lock;
>> +	raw_spinlock_t		inner_mon_sel_lock;
>> +	unsigned long		inner_mon_sel_flags;
>> +
>> +	void __iomem		*mapped_hwpage;
>> +	size_t			mapped_hwpage_sz;
>> +};
>> +
>> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>> +				   cpumask_t *affinity);
> 
> Where is this?

Ugh, more bits of DT - this one is non-obvious because of the name.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
  2025-09-11 15:46   ` Ben Horgan
  2025-09-12 13:21   ` Jonathan Cameron
@ 2025-09-25  2:30   ` Fenghua Yu
  2025-10-09 17:48     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-25  2:30 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:43, James Morse wrote:
> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
>
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
>
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Added XCL support.
>   * Merged FLT/CTL constants.
>   * a spelling mistake in a comment.
>   * moved structrues around.
> ---
>   drivers/resctrl/mpam_devices.c  | 226 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  19 +++
>   2 files changed, 245 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index cf190f896de1..1543c33c5d6a 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -898,6 +898,232 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   	return 0;
>   }
>   
> +struct mon_read {
> +	struct mpam_msc_ris		*ris;
> +	struct mon_cfg			*ctx;
> +	enum mpam_device_features	type;
> +	u64				*val;
> +	int				err;
> +};
> +
> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				   u32 *flt_val)
> +{
> +	struct mon_cfg *ctx = m->ctx;
> +
> +	/*
> +	 * For CSU counters its implementation-defined what happens when not
> +	 * filtering by partid.
> +	 */
> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
> +
> +	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
> +	if (m->ctx->match_pmg) {
> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> +		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
> +	}
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
> +					       ctx->csu_exclude_clean);
> +
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
> +
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				    u32 *flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Remove values set by the hardware to prevent apparent mismatches. */
> +static void clean_msmon_ctl_val(u32 *cur_ctl)
> +{
> +	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +}
> +
> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> +				     u32 flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	/*
> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
> +	 * then enable counter.
> +	 */
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CSU, 0);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Call with MSC lock held */
> +static void __ris_msmon_read(void *arg)
> +{
> +	u64 now;
> +	bool nrdy = false;
> +	struct mon_read *m = arg;
> +	struct mon_cfg *ctx = m->ctx;
> +	struct mpam_msc_ris *ris = m->ris;
> +	struct mpam_props *rprops = &ris->props;
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> +
> +	if (!mpam_mon_sel_lock(msc)) {
> +		m->err = -EIO;
> +		return;
> +	}
> +	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
> +		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +	/*
> +	 * Read the existing configuration to avoid re-writing the same values.
> +	 * This saves waiting for 'nrdy' on subsequent reads.
> +	 */
> +	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> +	clean_msmon_ctl_val(&cur_ctl);
> +	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> +	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		now = mpam_read_monsel_reg(msc, CSU);
> +		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		now = mpam_read_monsel_reg(msc, MBWU);
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	default:
> +		m->err = -EINVAL;
> +		break;
> +	}
> +	mpam_mon_sel_unlock(msc);
> +
> +	if (nrdy) {
> +		m->err = -EBUSY;
> +		return;
> +	}
> +
> +	now = FIELD_GET(MSMON___VALUE, now);
> +	*m->val += now;
> +}
> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err, idx;
Can err be initialized to some error code e.g. -ENODEV?
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			arg->ris = ris;
> +
> +			err = smp_call_function_any(&msc->accessibility,
> +						    __ris_msmon_read, arg,
> +						    true);
> +			if (!err && arg->err)
> +				err = arg->err;
> +			if (err)
> +				break;
> +		}
> +		if (err)
> +			break;
> +	}

comp->vmsc or vmsc->ris are usually not empty, but under some conditions
they can be. In that case, the uninitialized err value may cause
unexpected behavior for the callers.

So it's better to initialize err to keep things simple.
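
For illustration, the suggested initialization amounts to something like this (a sketch
only; whether -ENODEV is the right error code for the empty-list case is an open question):

| -	int err, idx;
| +	int err = -ENODEV, idx;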

> +	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	return err;
> +}
> +
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features type, u64 *val)
> +{
> +	int err;
> +	struct mon_read arg;
> +	u64 wait_jiffies = 0;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	might_sleep();
> +
> +	if (!mpam_is_enabled())
> +		return -EIO;
> +
> +	if (!mpam_has_feature(type, cprops))
> +		return -EOPNOTSUPP;
> +
> +	memset(&arg, 0, sizeof(arg));
> +	arg.ctx = ctx;
> +	arg.type = type;
> +	arg.val = val;
> +	*val = 0;
> +
> +	err = _msmon_read(comp, &arg);
> +	if (err == -EBUSY && comp->class->nrdy_usec)
> +		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
> +
> +	while (wait_jiffies)
> +		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
> +
> +	if (err == -EBUSY) {
> +		memset(&arg, 0, sizeof(arg));
> +		arg.ctx = ctx;
> +		arg.type = type;
> +		arg.val = val;
> +		*val = 0;
> +
> +		err = _msmon_read(comp, &arg);
> +	}
> +
> +	return err;
> +}
> +
>   static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>   {
>   	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 81c4c2bfea3d..bb01e7dbde40 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -196,6 +196,22 @@ static inline void mpam_clear_feature(enum mpam_device_features feat,
>   	*supported &= ~(1 << feat);
>   }
>   
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> +	COUNT_BOTH	= 0,
> +	COUNT_WRITE	= 1,
> +	COUNT_READ	= 2,
> +};
> +
> +struct mon_cfg {
> +	u16                     mon;
> +	u8                      pmg;
> +	bool                    match_pmg;
> +	bool			csu_exclude_clean;
> +	u32                     partid;
> +	enum mon_filter_options opts;
> +};
> +
>   struct mpam_class {
>   	/* mpam_components in this class */
>   	struct list_head	components;
> @@ -343,6 +359,9 @@ void mpam_disable(struct work_struct *work);
>   int mpam_apply_config(struct mpam_component *comp, u16 partid,
>   		      struct mpam_config *cfg);
>   
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features, u64 *val);
> +
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
                     ` (2 preceding siblings ...)
  2025-09-12 15:22   ` Dave Martin
@ 2025-09-25  6:33   ` Fenghua Yu
  2025-10-03 18:03     ` James Morse
  3 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-25  6:33 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Xin Hao, peternewman, dfustini,
	amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
>
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen at process
> context as it takes mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
>
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
>
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
>
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Made mpam_unregister_irqs() safe to race with itself.
>   * Removed threaded interrupts.
>   * Schedule mpam_disable() from cpuhp callback in the case of an error.
>   * Added mpam_disable_reason.
>   * Use alloc_percpu()
>
> Changes since RFC:
>   * Use guard marco when walking srcu list.
>   * Use INTEN macro for enabling interrupts.
>   * Move partid_max_published up earlier in mpam_enable_once().
> ---
>   drivers/resctrl/mpam_devices.c  | 277 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  10 ++
>   2 files changed, 284 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index a9d3c4b09976..e7e4afc1ea95 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -14,6 +14,9 @@
>   #include <linux/device.h>
>   #include <linux/errno.h>
>   #include <linux/gfp.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
>   #include <linux/list.h>
>   #include <linux/lockdep.h>
>   #include <linux/mutex.h>
> @@ -166,6 +169,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>   	return (idr_high << 32) | idr_low;
>   }
>   
> +static void mpam_msc_zero_esr(struct mpam_msc *msc)
> +{
> +	__mpam_write_reg(msc, MPAMF_ESR, 0);
> +	if (msc->has_extd_esr)
> +		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
> +}
> +
> +static u64 mpam_msc_read_esr(struct mpam_msc *msc)
> +{
> +	u64 esr_high = 0, esr_low;
> +
> +	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
> +	if (msc->has_extd_esr)
> +		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
> +
> +	return (esr_high << 32) | esr_low;
> +}
> +
>   static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
>   {
>   	lockdep_assert_held(&msc->part_sel_lock);
> @@ -754,6 +775,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>   		msc->partid_max = min(msc->partid_max, partid_max);
>   		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
>   
>   		mutex_lock(&mpam_list_lock);
>   		ris = mpam_get_or_create_ris(msc, ris_idx);
> @@ -768,6 +790,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   		mutex_unlock(&msc->part_sel_lock);
>   	}
>   
> +	/* Clear any stale errors */
> +	mpam_msc_zero_esr(msc);
> +
>   	spin_lock(&partid_max_lock);
>   	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
>   	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> @@ -895,6 +920,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>   	}
>   }
>   
> +static void _enable_percpu_irq(void *_irq)
> +{
> +	int *irq = _irq;
> +
> +	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
> +}
> +
>   static int mpam_cpu_online(unsigned int cpu)
>   {
>   	int idx;
> @@ -906,6 +938,9 @@ static int mpam_cpu_online(unsigned int cpu)
>   		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>   			continue;
>   
> +		if (msc->reenable_error_ppi)
> +			_enable_percpu_irq(&msc->reenable_error_ppi);
> +
>   		if (atomic_fetch_inc(&msc->online_refs) == 0)
>   			mpam_reset_msc(msc, true);
>   	}
> @@ -959,6 +994,9 @@ static int mpam_cpu_offline(unsigned int cpu)
>   		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>   			continue;
>   
> +		if (msc->reenable_error_ppi)
> +			disable_percpu_irq(msc->reenable_error_ppi);
> +
>   		if (atomic_dec_and_test(&msc->online_refs))
>   			mpam_reset_msc(msc, false);
>   	}
> @@ -985,6 +1023,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
>   	mutex_unlock(&mpam_cpuhp_state_lock);
>   }
>   
> +static int __setup_ppi(struct mpam_msc *msc)
> +{
> +	int cpu;
> +	struct device *dev = &msc->pdev->dev;
> +
> +	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
> +	if (!msc->error_dev_id)
> +		return -ENOMEM;
> +
> +	for_each_cpu(cpu, &msc->accessibility) {
> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
> +
> +		if (empty) {
> +			dev_err_once(dev, "MSC shares PPI with %s!\n",
> +				     dev_name(&empty->pdev->dev));
> +			return -EBUSY;
> +		}
> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
> +{
> +	int irq;
> +
> +	irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +	if (irq <= 0)
> +		return 0;
> +
> +	/* Allocate and initialise the percpu device pointer for PPI */
> +	if (irq_is_percpu(irq))
> +		return __setup_ppi(msc);
> +
> +	/* sanity check: shared interrupts can be routed anywhere? */
> +	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
> +		pr_err_once("msc:%u is a private resource with a shared error interrupt",
> +			    msc->id);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>   /*
>    * An MSC can control traffic from a set of CPUs, but may only be accessible
>    * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -1060,6 +1143,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>   			break;
>   		}
>   
> +		err = mpam_msc_setup_error_irq(msc);
> +		if (err)
> +			break;
> +
>   		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>   					     &msc->pcc_subspace_id))
>   			msc->iface = MPAM_IFACE_MMIO;
> @@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
>   	}
>   }
>   
> +static char *mpam_errcode_names[16] = {
> +	[0] = "No error",
> +	[1] = "PARTID_SEL_Range",
> +	[2] = "Req_PARTID_Range",
> +	[3] = "MSMONCFG_ID_RANGE",
> +	[4] = "Req_PMG_Range",
> +	[5] = "Monitor_Range",
> +	[6] = "intPARTID_Range",
> +	[7] = "Unexpected_INTERNAL",
> +	[8] = "Undefined_RIS_PART_SEL",
> +	[9] = "RIS_No_Control",
> +	[10] = "Undefined_RIS_MON_SEL",
> +	[11] = "RIS_No_Monitor",
> +	[12 ... 15] = "Reserved"
> +};
> +
> +static int mpam_enable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
> +
> +	return 0;
> +}
> +
> +/* This can run in mpam_disable(), and the interrupt handler on the same CPU */
> +static int mpam_disable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, 0);
> +
> +	return 0;
> +}
> +
> +static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
> +{
> +	u64 reg;
> +	u16 partid;
> +	u8 errcode, pmg, ris;
> +
> +	if (WARN_ON_ONCE(!msc) ||
> +	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &msc->accessibility)))
> +		return IRQ_NONE;
> +
> +	reg = mpam_msc_read_esr(msc);
> +
> +	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
> +	if (!errcode)
> +		return IRQ_NONE;
> +
> +	/* Clear level triggered irq */
> +	mpam_msc_zero_esr(msc);
> +
> +	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
> +	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
> +	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
> +
> +	pr_err_ratelimited("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
> +			   msc->id, mpam_errcode_names[errcode], partid, pmg,
> +			   ris);
> +
> +	/* Disable this interrupt. */
> +	mpam_disable_msc_ecr(msc);
> +
> +	/*
> +	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
> +	 * unregister the interrupt from the threaded part of the handler.
> +	 */
> +	mpam_disable_reason = "hardware error interrupt";
> +	schedule_work(&mpam_broken_work);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static int mpam_register_irqs(void)
> +{
> +	int err, irq;
> +	struct mpam_msc *msc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
> +		/* We anticipate sharing the interrupt with other MSCs */
> +		if (irq_is_percpu(irq)) {
> +			err = request_percpu_irq(irq, &mpam_ppi_handler,
> +						 "mpam:msc:error",
> +						 msc->error_dev_id);
> +			if (err)
> +				return err;
> +
> +			msc->reenable_error_ppi = irq;
> +			smp_call_function_many(&msc->accessibility,
> +					       &_enable_percpu_irq, &irq,
> +					       true);
> +		} else {
> +			err = devm_request_irq(&msc->pdev->dev,irq,
> +					       &mpam_spi_handler, IRQF_SHARED,
> +					       "mpam:msc:error", msc);
> +			if (err)
> +				return err;
> +		}
> +
> +		set_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags);
> +		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
> +		set_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags);
> +	}
> +
> +	return 0;
> +}
> +
> +static void mpam_unregister_irqs(void)
> +{
> +	int irq, idx;
> +	struct mpam_msc *msc;
> +
> +	cpus_read_lock();
> +	/* take the lock as free_irq() can sleep */
> +	idx = srcu_read_lock(&mpam_srcu);
guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> +
> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
> +			if (irq_is_percpu(irq)) {
> +				msc->reenable_error_ppi = 0;
> +				free_percpu_irq(irq, msc->error_dev_id);
> +			} else {
> +				devm_free_irq(&msc->pdev->dev, irq, msc);
> +			}
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	cpus_read_unlock();
> +}
> +
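
For illustration, the guard() suggestion above would make the function look roughly like
this (a sketch only; guard() comes from <linux/cleanup.h> and the scoped SRCU read lock is
released when the function returns, after cpus_read_unlock()):

| static void mpam_unregister_irqs(void)
| {
| 	int irq;
| 	struct mpam_msc *msc;
|
| 	cpus_read_lock();
| 	/* take the SRCU read lock as free_irq() can sleep */
| 	guard(srcu)(&mpam_srcu);
| 	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
| 				 srcu_read_lock_held(&mpam_srcu)) {
| 		/* look up irq and do the per-MSC teardown as in the hunk above */
| 	}
| 	cpus_read_unlock();
| }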

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
  2025-09-12 12:22   ` Jonathan Cameron
  2025-09-12 15:00   ` Ben Horgan
@ 2025-09-25  6:53   ` Fenghua Yu
  2025-10-03 18:04     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-25  6:53 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:43, James Morse wrote:
> When CPUs come online the MSC's original configuration should be restored.
>
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate
> a configuration array for each component, and reprogram each RIS
> configuration from this.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Switched entry_rcu to srcu versions.
>
> Changes since RFC:
>   * Added a comment about the ordering around max_partid.
>   * Allocate configurations after interrupts are registered to reduce churn.
>   * Added mpam_assert_partid_sizes_fixed();
>   * Make reset use an all-ones instead of zero config.
> ---
>   drivers/resctrl/mpam_devices.c  | 269 +++++++++++++++++++++++++++++---
>   drivers/resctrl/mpam_internal.h |  29 +++-
>   2 files changed, 271 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index ec1db5f8b05c..7fd149109c75 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -114,6 +114,16 @@ static LLIST_HEAD(mpam_garbage);
>   /* When mpam is disabled, the printed reason to aid debugging */
>   static char *mpam_disable_reason;
>   
> +/*
> + * Once mpam is enabled, new requestors cannot further reduce the available
> + * partid. Assert that the size is fixed, and new requestors will be turned
> + * away.
> + */
> +static void mpam_assert_partid_sizes_fixed(void)
> +{
> +	WARN_ON_ONCE(!partid_max_published);
> +}
> +
>   static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>   {
>   	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> @@ -363,12 +373,16 @@ static void mpam_class_destroy(struct mpam_class *class)
>   	add_to_garbage(class);
>   }
>   
> +static void __destroy_component_cfg(struct mpam_component *comp);
> +
>   static void mpam_comp_destroy(struct mpam_component *comp)
>   {
>   	struct mpam_class *class = comp->class;
>   
>   	lockdep_assert_held(&mpam_list_lock);
>   
> +	__destroy_component_cfg(comp);
> +
>   	list_del_rcu(&comp->class_list);
>   	add_to_garbage(comp);
>   
> @@ -833,50 +847,105 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>   	__mpam_write_reg(msc, reg, bm);
>   }
>   
> -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +/* Called via IPI. Call while holding an SRCU reference */
> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> +				      struct mpam_config *cfg)
>   {
>   	struct mpam_msc *msc = ris->vmsc->msc;
>   	struct mpam_props *rprops = &ris->props;
>   
> -	mpam_assert_srcu_read_lock_held();
> -
>   	mutex_lock(&msc->part_sel_lock);
>   	__mpam_part_sel(ris->ris_idx, partid, msc);
>   
> -	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
> +		if (cfg->reset_cpbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
> +					      rprops->cpbm_wd);
> +		else
> +			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> +	}
>   
> -	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
> +		if (cfg->reset_mbw_pbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
> +					      rprops->mbw_pbm_bits);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
> +	}
>   
> -	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_min, cfg))
>   		mpam_write_partsel_reg(msc, MBW_MIN, 0);
>   
> -	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> -		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_max, cfg))
> +		mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
>   
> -	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> +	if (mpam_has_feature(mpam_feat_mbw_prop, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_prop, cfg))
>   		mpam_write_partsel_reg(msc, MBW_PROP, 0);
>   	mutex_unlock(&msc->part_sel_lock);
>   }
>   
> +struct reprogram_ris {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_config *cfg;
> +};
> +
> +/* Call with MSC lock held */
> +static int mpam_reprogram_ris(void *_arg)
> +{
> +	u16 partid, partid_max;
> +	struct reprogram_ris *arg = _arg;
> +	struct mpam_msc_ris *ris = arg->ris;
> +	struct mpam_config *cfg = arg->cfg;
> +
> +	if (ris->in_reset_state)
> +		return 0;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid <= partid_max; partid++)
> +		mpam_reprogram_ris_partid(ris, partid, cfg);
> +
> +	return 0;
> +}
> +
> +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
> +{
> +	memset(reset_cfg, 0, sizeof(*reset_cfg));
> +
> +	reset_cfg->features = ~0;
> +	reset_cfg->cpbm = ~0;
> +	reset_cfg->mbw_pbm = ~0;
> +	reset_cfg->mbw_max = MPAMCFG_MBW_MAX_MAX;
> +
> +	reset_cfg->reset_cpbm = true;
> +	reset_cfg->reset_mbw_pbm = true;
> +}
> +
>   /*
>    * Called via smp_call_on_cpu() to prevent migration, while still being
>    * pre-emptible.
>    */
>   static int mpam_reset_ris(void *arg)
>   {
> -	u16 partid, partid_max;
> +	struct mpam_config reset_cfg;
>   	struct mpam_msc_ris *ris = arg;
> +	struct reprogram_ris reprogram_arg;
>   
>   	if (ris->in_reset_state)
>   		return 0;
>   
> -	spin_lock(&partid_max_lock);
> -	partid_max = mpam_partid_max;
> -	spin_unlock(&partid_max_lock);
> -	for (partid = 0; partid < partid_max; partid++)
> -		mpam_reset_ris_partid(ris, partid);
> +	mpam_init_reset_cfg(&reset_cfg);
> +
> +	reprogram_arg.ris = ris;
> +	reprogram_arg.cfg = &reset_cfg;
> +
> +	mpam_reprogram_ris(&reprogram_arg);
>   
>   	return 0;
>   }
> @@ -922,6 +991,40 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>   	}
>   }
>   
> +static void mpam_reprogram_msc(struct mpam_msc *msc)
> +{
> +	u16 partid;
> +	bool reset;
> +	struct mpam_config *cfg;
> +	struct mpam_msc_ris *ris;
> +
> +	/*
> +	 * No lock for mpam_partid_max as partid_max_published has been
> +	 * set by mpam_enabled(), so the values can no longer change.
> +	 */
> +	mpam_assert_partid_sizes_fixed();
> +
> +	guard(srcu)(&mpam_srcu);
mpam_srcu is already locked by the caller, mpam_cpu_online(), so taking
guard(srcu)(&mpam_srcu) again here is unnecessary; dropping it gives simpler
logic and less overhead.
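
For illustration, a minimal sketch of that suggestion (assuming the
mpam_assert_srcu_read_lock_held() helper seen elsewhere in this driver is
still available to document what the caller must hold):

	/* Caller (mpam_cpu_online()) already holds the SRCU read lock */
	mpam_assert_srcu_read_lock_held();

i.e. replace the guard with an assertion rather than entering SRCU a second
time.
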
> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!mpam_is_enabled() && !ris->in_reset_state) {
> +			mpam_touch_msc(msc, &mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +			continue;
> +		}
> +
> +		reset = true;
> +		for (partid = 0; partid <= mpam_partid_max; partid++) {
> +			cfg = &ris->vmsc->comp->cfg[partid];
> +			if (cfg->features)
> +				reset = false;
> +
> +			mpam_reprogram_ris_partid(ris, partid, cfg);
> +		}
> +		ris->in_reset_state = reset;
> +	}
> +}
> +
>   static void _enable_percpu_irq(void *_irq)
>   {
>   	int *irq = _irq;
> @@ -944,7 +1047,7 @@ static int mpam_cpu_online(unsigned int cpu)
>   			_enable_percpu_irq(&msc->reenable_error_ppi);
>   
>   		if (atomic_fetch_inc(&msc->online_refs) == 0)
> -			mpam_reset_msc(msc, true);
> +			mpam_reprogram_msc(msc);
>   	}
>   	srcu_read_unlock(&mpam_srcu, idx);

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
  2025-09-12 11:42   ` Ben Horgan
  2025-09-12 12:02   ` Jonathan Cameron
@ 2025-09-25  7:16   ` Fenghua Yu
  2025-10-02 18:02     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-25  7:16 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
>
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * more complete use of _srcu helpers.
>   * Use guard macro for srcu.
>   * Dropped a might_sleep() - something else will bark.
> ---
>   drivers/resctrl/mpam_devices.c | 56 ++++++++++++++++++++++++++++++++--
>   1 file changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index e7faf453b5d7..a9d3c4b09976 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -842,8 +842,6 @@ static int mpam_reset_ris(void *arg)
>   	u16 partid, partid_max;
>   	struct mpam_msc_ris *ris = arg;
>   
> -	mpam_assert_srcu_read_lock_held();
> -
>   	if (ris->in_reset_state)
>   		return 0;
>   
> @@ -1340,8 +1338,56 @@ static void mpam_enable_once(void)
>   	       mpam_partid_max + 1, mpam_pmg_max + 1);
>   }
>   
> +static void mpam_reset_component_locked(struct mpam_component *comp)
> +{
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	

Nested read locks on mpam_srcu in this call chain:

mpam_disable() -> mpam_reset_class() -> mpam_reset_class_locked() ->
mpam_reset_component_locked()

There are redundant read locks on mpam_srcu in mpam_disable(),
mpam_reset_class_locked(), and mpam_reset_component_locked().

It's better to guard mpam_srcu only in the top-level function, mpam_disable(),
for simpler logic and lower overhead.
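
One way that could look (a sketch only - the helpers keep their current names,
and scoped_guard() keeps the read-side section the same size as the explicit
srcu_read_lock()/srcu_read_unlock() pair it replaces):

	void mpam_disable(struct work_struct *ignored)
	{
		...
		/* Single SRCU read-side section for the whole reset walk */
		scoped_guard(srcu, &mpam_srcu) {
			list_for_each_entry_srcu(class, &mpam_classes, classes_list,
						 srcu_read_lock_held(&mpam_srcu))
				mpam_reset_class(class);
		}
		...
	}

mpam_reset_class_locked() and mpam_reset_component_locked() would then drop
their guard(srcu)(&mpam_srcu) lines and rely on the caller, keeping only the
lockdep_assert_cpus_held() checks.
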

> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			if (!ris->in_reset_state)
> +				mpam_touch_msc(msc, mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +		}
> +	}
> +}
> +
> +static void mpam_reset_class_locked(struct mpam_class *class)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(comp, &class->components, class_list,
> +				 srcu_read_lock_held(&mpam_srcu))
> +		mpam_reset_component_locked(comp);
> +}
> +
> +static void mpam_reset_class(struct mpam_class *class)
> +{
> +	cpus_read_lock();
> +	mpam_reset_class_locked(class);
> +	cpus_read_unlock();
> +}
> +
> +/*
> + * Called in response to an error IRQ.
> + * All of MPAMs errors indicate a software bug, restore any modified
> + * controls to their reset values.
> + */
>   void mpam_disable(struct work_struct *ignored)
>   {
> +	int idx;
> +	struct mpam_class *class;
>   	struct mpam_msc *msc, *tmp;
>   
>   	mutex_lock(&mpam_cpuhp_state_lock);
> @@ -1351,6 +1397,12 @@ void mpam_disable(struct work_struct *ignored)
>   	}
>   	mutex_unlock(&mpam_cpuhp_state_lock);
>   
> +	idx = srcu_read_lock(&mpam_srcu);
It's better to change to guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> +				 srcu_read_lock_held(&mpam_srcu))
> +		mpam_reset_class(class);
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>   	mutex_lock(&mpam_list_lock);
>   	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
>   		mpam_msc_destroy(msc);

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 00/29] arm_mpam: Add basic mpam driver
  2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (28 preceding siblings ...)
  2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-09-25  7:18 ` Fenghua Yu
  29 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-09-25  7:18 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> Hello,
>
> The major changes since v1 are:
>   * DT got ripped out - see below.
>   * The mon_sel locking was simplified - but that will come back.
>   
>   Otherwise the myriad of changes are noted on each patch.
>   
> ~
>
> This is just enough MPAM driver for ACPI. DT got ripped out. If you need DT
> support - please share your DTS so the DT folk know the binding is what is
> needed.
> This doesn't contain any of the resctrl code, meaning you can't actually drive it
> from user-space yet. Becuase of that, its hidden behind CONFIG_EXPERT.
> This will change once the user interface is connected up.
>
> This is the initial group of patches that allows the resctrl code to be built
> on top. Including that will increase the number of trees that may need to
> coordinate, so breaking it up make sense.
>
> The locking got simplified, but is still strange - this is because of the 'mpam-fb'
> firmware interface specification that is still alpha. That thing needs to wait for
> an interrupt after every system register write, which significantly impacts the
> driver. Some features just won't work, e.g. reading the monitor registers via
> perf.
>
> I've not found a platform that can test all the behaviours around the monitors,
> so this is where I'd expect the most bugs.
>
> The MPAM spec that describes all the system and MMIO registers can be found here:
> https://developer.arm.com/documentation/ddi0598/db/?lang=en
> (Ignored the 'RETIRED' warning - that is just arm moving the documentation around.
>   This document has the best overview)
>
> The expectation is this will go via the arm64 tree.
>
>
> This series is based on v6.17-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v2
>
> The rest of the driver can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.17-rc4

Tested-by: Fenghua Yu <fenghuay@nvidia.com>

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
  2025-09-11 10:43   ` Jonathan Cameron
@ 2025-09-25  9:32   ` Stanimir Varbanov
  2025-10-10 16:54     ` James Morse
  2025-10-02  3:35   ` Fenghua Yu
  2025-10-03  0:15   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Stanimir Varbanov @ 2025-09-25  9:32 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/10/25 11:42 PM, James Morse wrote:
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Replaced commit message with wording from Dave.
>  * Fixed a stray plural.
>  * Moved further down in the file to make use of get_pptt() helper.
>  * Added a break to exit the loop early.
> 
> Changes since RFC:
>  * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
> 
> Changes since RFC:
>  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>  * Added missing : in kernel-doc
>  * Made helper return void as this never actually returns an error.
> ---
>  drivers/acpi/pptt.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  3 ++
>  2 files changed, 86 insertions(+)
> 

<snip>

> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor container
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.

Leftover, drop this.

> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;

<snip>

regards,
~Stan



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
  2025-09-12 12:13   ` Jonathan Cameron
  2025-09-12 14:42   ` Ben Horgan
@ 2025-09-26  2:31   ` Fenghua Yu
  2025-10-03 18:04     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-26  2:31 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
>
> After this point, checking all the MSC have been probed is pointless,
> and the cpuhp callbacks should restore the configuration, instead of
> just resetting the MSC.
>
> Add a static key to enable this behaviour. This will also allow MPAM
> to be disabled in repsonse to an error, and the architecture code to
nit...s/repsonse/response/
> enable/disable the context switch of the MPAM system registers.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
  2025-09-12 13:37   ` Jonathan Cameron
  2025-09-12 16:06   ` Ben Horgan
@ 2025-09-26  2:35   ` Fenghua Yu
  2025-10-10 16:53     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-26  2:35 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:43, James Morse wrote:
> The bitmap reset code has been a source of bugs. Add a unit test.
>
> This currently has to be built in, as the rest of the driver is
> builtin.
>
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
  2025-09-12 13:41   ` Jonathan Cameron
  2025-09-12 16:01   ` Ben Horgan
@ 2025-09-26  2:36   ` Fenghua Yu
  2025-10-10 16:54     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-26  2:36 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:43, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
>
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
  2025-09-12 13:33   ` Jonathan Cameron
  2025-09-18  2:35   ` Shaopeng Tan (Fujitsu)
@ 2025-09-26  4:11   ` Fenghua Yu
  2025-10-10 16:53     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-09-26  4:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:43, James Morse wrote:
> resctrl expects to reset the bandwidth counters when the filesystem
> is mounted.
>
> To allow this, add a helper that clears the saved mbwu state. Instead
> of cross calling to each CPU that can access the component MSC to
> write to the counter, set a flag that causes it to be zero'd on the
> the next read. This is easily done by forcing a configuration update.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Other than the following minor change,

Reviewed-by: Fenghua Yu <fenghuay@nvdia.com>


> ---
>   drivers/resctrl/mpam_devices.c  | 47 +++++++++++++++++++++++++++++++--
>   drivers/resctrl/mpam_internal.h |  5 +++-
>   2 files changed, 49 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3080a81f0845..8254d6190ca2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1088,9 +1088,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
>   static void __ris_msmon_read(void *arg)
>   {
>   	bool nrdy = false;
> +	bool config_mismatch;
>   	struct mon_read *m = arg;
>   	u64 now, overflow_val = 0;
>   	struct mon_cfg *ctx = m->ctx;
> +	bool reset_on_next_read = false;
>   	struct mpam_msc_ris *ris = m->ris;
>   	struct msmon_mbwu_state *mbwu_state;
>   	struct mpam_props *rprops = &ris->props;
> @@ -1105,6 +1107,14 @@ static void __ris_msmon_read(void *arg)
>   		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
>   	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
>   
> +	if (m->type == mpam_feat_msmon_mbwu) {
> +		mbwu_state = &ris->mbwu_state[ctx->mon];
> +		if (mbwu_state) {
> +			reset_on_next_read = mbwu_state->reset_on_next_read;
> +			mbwu_state->reset_on_next_read = false;
> +		}
> +	}
> +
>   	/*
>   	 * Read the existing configuration to avoid re-writing the same values.
>   	 * This saves waiting for 'nrdy' on subsequent reads.
> @@ -1112,7 +1122,10 @@ static void __ris_msmon_read(void *arg)
>   	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
>   	clean_msmon_ctl_val(&cur_ctl);
>   	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> -	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +	config_mismatch = cur_flt != flt_val ||
> +			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
> +
> +	if (config_mismatch || reset_on_next_read)
>   		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
>   
>   	switch (m->type) {
> @@ -1145,7 +1158,6 @@ static void __ris_msmon_read(void *arg)
>   		if (nrdy)
>   			break;
>   
> -		mbwu_state = &ris->mbwu_state[ctx->mon];
>   		if (!mbwu_state)
>   			break;
>   
> @@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>   	return err;
>   }
>   
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
> +{
> +	int idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	if (!mpam_is_enabled())
> +		return;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
> +			continue;
> +
> +		msc = vmsc->msc;
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +				continue;
> +
> +			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +				continue;
> +
> +			ris->mbwu_state[ctx->mon].correction = 0;
> +			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
> +			mpam_mon_sel_unlock(msc);
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
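
For illustration, the top of the function with that change applied (a sketch
of the suggestion above - the 'int idx' local and the explicit
srcu_read_lock()/srcu_read_unlock() pair go away, and the loop body is
elided):

	void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
	{
		struct mpam_msc *msc;
		struct mpam_vmsc *vmsc;
		struct mpam_msc_ris *ris;

		if (!mpam_is_enabled())
			return;

		/* Dropped automatically when the function returns */
		guard(srcu)(&mpam_srcu);
		list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
			...
		}
	}
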
> +
>   static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>   {
>   	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c190826dfbda..7cbcafe8294a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -223,10 +223,12 @@ struct mon_cfg {
>   
>   /*
>    * Changes to enabled and cfg are protected by the msc->lock.
> - * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + * Changes to reset_on_next_read, prev_val and correction are protected by the
> + * msc's mon_sel_lock.
>    */
>   struct msmon_mbwu_state {
>   	bool		enabled;
> +	bool		reset_on_next_read;
>   	struct mon_cfg	cfg;
>   
>   	/* The value last read from the hardware. Used to detect overflow. */
> @@ -393,6 +395,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
>   
>   int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>   		    enum mpam_device_features, u64 *val);
> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
>   
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported
  2025-09-10 20:43 ` [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported James Morse
  2025-09-12 13:29   ` Jonathan Cameron
@ 2025-09-26  4:51   ` Fenghua Yu
  1 sibling, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-09-26  4:51 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan


On 9/10/25 13:43, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
>
> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
> counter.
>
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
>
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua

> ---
> Changes since v1:
>   * Only clear OFLOW_STATUS_L on MBWU counters.
>
> Changes since RFC:
>   * Commit message wrangling.
>   * Refer to 31 bit counters as opposed to 32 bit (registers).
> ---
>   drivers/resctrl/mpam_devices.c | 91 ++++++++++++++++++++++++++++++----
>   1 file changed, 82 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index bae9fa9441dc..3080a81f0845 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -927,6 +927,48 @@ struct mon_read {
>   	int				err;
>   };
>   
> +static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
> +{
> +	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
> +		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
> +}
> +
> +static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
> +{
> +	int retry = 3;
> +	u32 mbwu_l_low;
> +	u64 mbwu_l_high1, mbwu_l_high2;
> +
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +	do {
> +		mbwu_l_high1 = mbwu_l_high2;
> +		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
> +		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +
> +		retry--;
> +	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
> +
> +	if (mbwu_l_high1 == mbwu_l_high2)
> +		return (mbwu_l_high1 << 32) | mbwu_l_low;
> +	return MSMON___NRDY_L;
> +}
> +
> +static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
> +{
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
> +	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
> +}
> +
>   static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>   				   u32 *flt_val)
>   {
> @@ -989,6 +1031,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>   static void clean_msmon_ctl_val(u32 *cur_ctl)
>   {
>   	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +
> +	if (FIELD_GET(MSMON_CFG_x_CTL_TYPE, *cur_ctl) == MSMON_CFG_MBWU_CTL_TYPE_MBWU)
> +		*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
>   }
>   
>   static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> @@ -1011,7 +1056,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>   	case mpam_feat_msmon_mbwu:
>   		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>   		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> -		mpam_write_monsel_reg(msc, MBWU, 0);
> +		if (mpam_ris_has_mbwu_long_counter(m->ris))
> +			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
> +		else
> +			mpam_write_monsel_reg(msc, MBWU, 0);
> +
>   		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>   
>   		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> @@ -1026,8 +1075,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>   
>   static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
>   {
> -	/* TODO: scaling, and long counters */
> -	return GENMASK_ULL(30, 0);
> +	/* TODO: implement scaling counters */
> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
> +		return GENMASK_ULL(62, 0);
> +	else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
> +		return GENMASK_ULL(43, 0);
> +	else
> +		return GENMASK_ULL(30, 0);
>   }
>   
>   /* Call with MSC lock held */
> @@ -1069,10 +1123,24 @@ static void __ris_msmon_read(void *arg)
>   		now = FIELD_GET(MSMON___VALUE, now);
>   		break;
>   	case mpam_feat_msmon_mbwu:
> -		now = mpam_read_monsel_reg(msc, MBWU);
> -		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> -			nrdy = now & MSMON___NRDY;
> -		now = FIELD_GET(MSMON___VALUE, now);
> +		/*
> +		 * If long or lwd counters are supported, use them, else revert
> +		 * to the 31 bit counter.
> +		 */
> +		if (mpam_ris_has_mbwu_long_counter(ris)) {
> +			now = mpam_msc_read_mbwu_l(msc);
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +				nrdy = now & MSMON___NRDY_L;
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
> +				now = FIELD_GET(MSMON___LWD_VALUE, now);
> +			else
> +				now = FIELD_GET(MSMON___L_VALUE, now);
> +		} else {
> +			now = mpam_read_monsel_reg(msc, MBWU);
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +				nrdy = now & MSMON___NRDY;
> +			now = FIELD_GET(MSMON___VALUE, now);
> +		}
>   
>   		if (nrdy)
>   			break;
> @@ -1360,8 +1428,13 @@ static int mpam_save_mbwu_state(void *arg)
>   		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
>   		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
>   
> -		val = mpam_read_monsel_reg(msc, MBWU);
> -		mpam_write_monsel_reg(msc, MBWU, 0);
> +		if (mpam_ris_has_mbwu_long_counter(ris)) {
> +			val = mpam_msc_read_mbwu_l(msc);
> +			mpam_msc_zero_mbwu_l(msc);
> +		} else {
> +			val = mpam_read_monsel_reg(msc, MBWU);
> +			mpam_write_monsel_reg(msc, MBWU, 0);
> +		}
>   
>   		cfg->mon = i;
>   		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-19 16:11     ` James Morse
@ 2025-09-26 14:48       ` Jonathan Cameron
  2025-10-17 18:50         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-26 14:48 UTC (permalink / raw)
  To: James Morse, Lorenzo Pieralisi, Hanjun Guo
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


Hi James

Just a few things I've picked out to reply to.
Absolutely fine with your other replies.

> 
> >> +	char uid[16];
> >> +	u32 acpi_id;
> >> +
> >> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> >> +		return 0;
> >> +
> >> +	if (table->revision < 1)
> >> +		return 0;
> >> +
> >> +	table_end = (char *)table + table->length;
> >> +
> >> +	while (table_offset < table_end) {
> >> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> >> +		table_offset += tbl_msc->length;
> >> +
> >> +		if (table_offset > table_end) {
> >> +			pr_debug("MSC entry overlaps end of ACPI table\n");
> >> +			break;  
> 
> > That this isn't considered an error is a bit subtle and made me wonder
> > if there was a use of uninitialized pdev (there isn't because err == 0)  
> 
> It's somewhat a philosophical argument. I don't expect the kernel to have to validate
> these tables; they're not provided by the user, and there quickly comes a point where
> you have to trust them, and they have to be correct.

Potential buffer overrun is to me always worth error out screaming, but I get your
broader point.   Maybe just make it a pr_err()

> At the other extreme is the assumption the table is line-noise and we should check
> everything to avoid out of bounds accesses. Dave wanted the diagnostic messages on these.
> 
> As this is called from an initcall, the best you get is an inexplicable print message.
> (what should we say - update your firmware?)

Depends on whether you can lean hard on the firmware team. Much easier
for me if I can tell them the board doesn't boot because they got it wrong.

That would have been safer if we had this upstream in advance of hardware, but indeed
a little high risk today as who knows what borked tables are out there.

Personal preference though is to error out on things like this and handle the papering
over at the top level.  Don't put extra effort into checking whether tables are valid,
but if we happen to notice as part of code-safety stuff like size checks then it's good
to scream about it.

> 
> 
> Silently failing in this code is always safe as the driver has a count of the number of
> MSC, and doesn't start accessing the hardware until it's found them all.
> (this is because it needs to find the system-wide minimum value - and it's not worth
>  starting if it's not possible to finish).
> 
> 
> > Why not return here?  
> 
> Just because there was no other return in the loop, and I hate surprise returns.
> 
> I'll change it if it avoids thinking about how that platform_device_put() gets skipped!
> 
> 
> >   
> >> +		}
> >> +
> >> +		/*
> >> +		 * If any of the reserved fields are set, make no attempt to
> >> +		 * parse the MSC structure. This MSC will still be counted,
> >> +		 * meaning the MPAM driver can't probe against all MSC, and
> >> +		 * will never be enabled. There is no way to enable it safely,
> >> +		 * because we cannot determine safe system-wide partid and pmg
> >> +		 * ranges in this situation.
> >> +		 */  
> 
> > This is decidedly paranoid. I'd normally expect the architecture to be based
> > on assumption that is fine for old software to ignore new fields.  ACPI itself
> > has fairly firm rules on this (though it goes wrong sometimes :)  
> 
> Yeah - the MPAM table isn't properly structured as subtables. I don't see how they are
> going to extend it if they need to.
> 
> The paranoia is that anything set in these reserved fields probably indicates something
> the driver needs to know about: a case in point is the way PCC was added.
> 
> I'd much prefer we skip creation of MSC devices that have properties we don't understand.
> acpi_mpam_count_msc() still counts them - which means the driver never finds all the MSC,
> and never touches the hardware.
> 
> MPAM isn't a critical feature, it's better that it be disabled than make things worse.
> (the same attitude holds with the response to the MPAM error interrupt - reset everything
>  and pack up shop. This is better than accidentally combining important/unimportant
>  tasks)
> 
> 
> > I'm guessing there is something out there that made this necessary though so
> > keep it if you actually need it.  
> 
> It's a paranoid/violent reaction to the way PCC was added - without something like this,
> that would have led to the OS trying to map the 0 page and poking around in it - never
> likely to go well.
> 
> Doing this does let them pull another PCC without stable kernels going wrong.
> Ultimately I think they'll need to replace the table with one that is properly structured.
> For now - this is working with what we have.

Fair enough. I'm too lazy / behind with reviews to go scream via our channels about
problems here.  Paranoia it is.  Maybe we'll end up backporting some 'fixes' that
ignore nicely added fields with appropriate control bits to turn them on.
So be it if that happens.

> 
> 
> >> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
> >> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
> >> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
> >> +			continue;
> >> +		}
> >> +
> >> +		if (!tbl_msc->mmio_size) {
> >> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
> >> +			continue;
> >> +		}
> >> +
> >> +		if (decode_interface_type(tbl_msc, &iface)) {
> >> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> >> +			continue;
> >> +		}
> >> +
> >> +		next_res = 0;
> >> +		next_prop = 0;
> >> +		memset(res, 0, sizeof(res));
> >> +		memset(props, 0, sizeof(props));
> >> +
> >> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);  
> > 
> > https://lore.kernel.org/all/20241009124120.1124-13-shiju.jose@huawei.com/
> > was a proposal to add a DEFINE_FREE() to clean these up.  Might be worth a revisit.
> > Then Greg was against the use it was put to and asking for an example of where
> > it helped.  Maybe this is that example.
> > 
> > If you do want to do that, I'd factor out a bunch of the stuff here as a helper
> > so we can have the clean ownership pass of a return_ptr().  
> > Similar to what Shiju did here (this is the usecase for platform device that
> > Greg didn't like).
> > https://lore.kernel.org/all/20241009124120.1124-14-shiju.jose@huawei.com/
> > 
> > Even without that I think factoring some of this out and hence being able to
> > do returns on errors and put the if (err) into the loop would be a nice
> > improvement to readability.  
> 
> If you think it's more readable I'll structure it like that.

The refactor yes. I'd keep clear of the DEFINE_FREE() unless you have
some spare time ;)
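
For illustration, a rough sketch of the shape being suggested (the helper name
and the elided body are made up here):

	static int acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
	{
		struct platform_device *pdev;
		int err;

		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
		if (!pdev)
			return -ENOMEM;

		/* ... build up res[]/props[] and attach them to pdev ... */

		err = platform_device_add(pdev);
		if (err)
			platform_device_put(pdev);

		return err;
	}

with the while loop over the table then reduced to something like:

		err = acpi_mpam_parse_msc(tbl_msc);
		if (err)
			break;
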
> 
> 
> >> +		if (!pdev) {
> >> +			err = -ENOMEM;
> >> +			break;
> >> +		}

> >> +int acpi_mpam_count_msc(void)
> >> +{
> >> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> >> +	char *table_end, *table_offset = (char *)(table + 1);
> >> +	struct acpi_mpam_msc_node *tbl_msc;
> >> +	int count = 0;
> >> +
> >> +	if (IS_ERR(table))
> >> +		return 0;
> >> +
> >> +	if (table->revision < 1)
> >> +		return 0;
> >> +
> >> +	table_end = (char *)table + table->length;  
> 
> > Trivial so feel free to ignore.
> > Perhaps should aim for consistency.  Whilst I prefer pointers for this stuff
> > PPTT did use unsigned longs.  
> 
> I prefer the pointers, and as it's a separate file, I don't think it needs to be
> consistent with PPTT.

Fair enough.  Maybe PPTT is ripe for some cleanup once you are done messing with it.
I'm certainly going to add churn now.

J


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-23 16:41     ` James Morse
@ 2025-09-26 14:55       ` Jonathan Cameron
  2025-10-17 18:51         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Jonathan Cameron @ 2025-09-26 14:55 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich


> >> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> >> new file mode 100644
> >> index 000000000000..7c63d590fc98
> >> --- /dev/null
> >> +++ b/drivers/resctrl/mpam_internal.h
> >> @@ -0,0 +1,65 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +// Copyright (C) 2025 Arm Ltd.
> >> +
> >> +#ifndef MPAM_INTERNAL_H
> >> +#define MPAM_INTERNAL_H
> >> +
> >> +#include <linux/arm_mpam.h>
> >> +#include <linux/cpumask.h>
> >> +#include <linux/io.h>
> >> +#include <linux/mailbox_client.h>
> >> +#include <linux/mutex.h>  
> > 
> > spinlock.h  
> 
> Fixed,
> 
> 
> >> +#include <linux/resctrl.h>  
> > 
> > Not spotting anything rsctl yet.  So maybe this belongs later.  
> 
> There shouldn't be anything that depends on resctrl in this series - looks like
> this is a 2018 era bug in the way I carved this up!
> 
> 
> >> +#include <linux/sizes.h>
> >> +
> >> +struct mpam_msc {
> >> +	/* member of mpam_all_msc */
> >> +	struct list_head        all_msc_list;
> >> +
> >> +	int			id;  
> > 
> > I'd follow (approx) include what you use principles to make later header
> > shuffling easier. So a forward def for this.  
> 
> -ENOPARSE
> 
> I'm sure I'll work this out from your later comments.

I mis-aimed that comment (I think) - it would have made more sense a line later.
Add a forward declaration

struct platform_device;

as there's no reason to include the appropriate header
(and you didn't anyway).





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-11 14:22   ` Jonathan Cameron
@ 2025-09-26 17:52     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-26 17:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 11/09/2025 15:22, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:48 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.

> Various minor things inline.
> 
> Biggest is I think maybe just moving to explicit reference counts
> rather than using the empty list for that might end up easier to
> read. Mostly because everyone knows what a kref_put() is about.
> 
> Obviously a bit pointless in practice, but I prefer not to think too
> hard.

I've spent a while playing with this - it's the wrong shape for what is going on here.

This builds that tree structure during driver probing (and adds 'unknown' entries to it
when poking the hardware). But by the time mpam_enable() is called - it's basically
read-only. After that point the only 'write' that will happen is the error interrupt,
which just frees everything. The deferred free from SRCU makes that safe.
(some of this will be clearer when I add the block comment about the 'phases' the driver
 goes through that Dave asked for)

Making it 'reference counted' instead is pointless because callers would be get/put-ing
references - but the structure is basically read-only, and not going to go away while the
SRCU reference is held.
One of the things reference counting breaks is the devm_kzalloc() usage - as an error from
the driver probe function will free all of those regions, regardless of what the reference
count says.
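
For reference, this is the reader pattern that point relies on, which the
driver already uses everywhere (a sketch of existing usage, not new code):

	int idx;

	idx = srcu_read_lock(&mpam_srcu);
	list_for_each_entry_srcu(comp, &class->components, class_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		/*
		 * comp can't be freed until the matching srcu_read_unlock(),
		 * plus the synchronize_srcu() on the free path - so no
		 * per-object get/put is needed.
		 */
	}
	srcu_read_unlock(&mpam_srcu, idx);
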

I'll look to rename the existing 'get' functions so folk don't immediately think of get/put!

('find' doesn't really cut it as it does the allocation if it doesn't 'find' anything)


>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index efc4738e3b4d..c7f4981b3545 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -18,7 +18,6 @@
>>  #include <linux/printk.h>
>>  #include <linux/slab.h>
>>  #include <linux/spinlock.h>
>> -#include <linux/srcu.h>
> 
> Why does this include no longer make sense?

It gets moved to mpam_internal.h because of the srcu_struct declaration that is needed for
callers in the resctrl code to walk the classes list.

I can add it to the mpam_internal.h header from the beginning.


>> @@ -31,7 +30,7 @@
>>  static DEFINE_MUTEX(mpam_list_lock);
>>  static LIST_HEAD(mpam_all_msc);
>>  
>> -static struct srcu_struct mpam_srcu;
>> +struct srcu_struct mpam_srcu;
> 
> ...
> 
>> +/* List of all objects that can be free()d after synchronise_srcu() */
>> +static LLIST_HEAD(mpam_garbage);
>> +
>> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
> 
> Whilst this obviously works, I'd rather pass garbage to init_garbage
> instead of the containing structure (where type varies)

Sure,


>> +static struct mpam_component *
>> +mpam_component_get(struct mpam_class *class, int id)
>> +{
>> +	struct mpam_component *comp;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	list_for_each_entry(comp, &class->components, class_list) {
>> +		if (comp->comp_id == id)
>> +			return comp;
>> +	}
>> +
>> +	return mpam_component_alloc(class, id);
>> +}
> 
>> +static struct mpam_class *
>> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
>> +{
>> +	bool found = false;
>> +	struct mpam_class *class;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	list_for_each_entry(class, &mpam_classes, classes_list) {
>> +		if (class->type == type && class->level == level_idx) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (found)
>> +		return class;
> 
> Maybe this gets more complex later, but if it doesn't, just return class where you set
> found above.  Matches with pattern used in mpam_component_get() above.

Yup, this is a relic of more complex locking.


>> +
>> +	return mpam_class_alloc(level_idx, type);
>> +}
> 
> 
>> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
>> +{
>> +	struct mpam_vmsc *vmsc = ris->vmsc;
>> +	struct mpam_msc *msc = vmsc->msc;
>> +	struct platform_device *pdev = msc->pdev;
>> +	struct mpam_component *comp = vmsc->comp;
>> +	struct mpam_class *class = comp->class;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	/*
>> +	 * It is assumed affinities don't overlap. If they do the class becomes
>> +	 * unusable immediately.
>> +	 */
>> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
>> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
>> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
>> +	list_del_rcu(&ris->vmsc_list);
>> +	list_del_rcu(&ris->msc_list);
>> +	add_to_garbage(ris);
>> +	ris->garbage.pdev = pdev;
>> +
>> +	if (list_empty(&vmsc->ris))

> See below. I think it is worth using an explicit reference count even
> though that will only reach zero when the list is empty.

By the time everything is probed, it's an almost entirely read-only structure. The only
writer will free absolutely everything. Changing this to structure-by-structure reference
counting will cause busy-work for readers, who are already protected by SRCU.


>> +		mpam_vmsc_destroy(vmsc);
>> +}
> 
> 
>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> +				  enum mpam_class_types type, u8 class_id,
>> +				  int component_id)
>> +{
>> +	int err;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +	struct mpam_class *class;
>> +	struct mpam_component *comp;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
>> +		return -EINVAL;
>> +
>> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
>> +		return -EBUSY;
>> +
>> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
>> +	if (!ris)
>> +		return -ENOMEM;
>> +	init_garbage(ris);
>> +
>> +	class = mpam_class_get(class_id, type);
>> +	if (IS_ERR(class))
>> +		return PTR_ERR(class);
>> +
>> +	comp = mpam_component_get(class, component_id);
>> +	if (IS_ERR(comp)) {
>> +		if (list_empty(&class->components))
>> +			mpam_class_destroy(class);
> 
> Maybe just reference count the classes with a kref and do a put here?
> 
>> +		return PTR_ERR(comp);
>> +	}
>> +
>> +	vmsc = mpam_vmsc_get(comp, msc);
>> +	if (IS_ERR(vmsc)) {
>> +		if (list_empty(&comp->vmsc))
>> +			mpam_comp_destroy(comp);

> Similar to classes I wonder if simple reference counting is cleaner.

(I spent a few days on it - it's prettier, but more work for things like the resctrl code
 to get/put references when SRCU already has the write side covered)


>> +		return PTR_ERR(vmsc);
>> +	}
>> +
>> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
>> +	if (err) {
>> +		if (list_empty(&vmsc->ris))
> 
> and here as well.
> 
>> +			mpam_vmsc_destroy(vmsc);
>> +		return err;
>> +	}
>> +
>> +	ris->ris_idx = ris_idx;
>> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
>> +	ris->vmsc = vmsc;
>> +
>> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
>> +
>> +	return 0;
>> +}
> 
>>  /*
>>   * An MSC can control traffic from a set of CPUs, but may only be accessible
>>   * from a (hopefully wider) set of CPUs. The common reason for this is power
>> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>>  		return;
>>  
>>  	mutex_lock(&mpam_list_lock);
>> -	platform_set_drvdata(pdev, NULL);
>> -	list_del_rcu(&msc->all_msc_list);
>> -	synchronize_srcu(&mpam_srcu);
>> +	mpam_msc_destroy(msc);
> 
> I'd suggest introducing mpam_msc_destroy() in the earlier patch. Doesn't make a huge
> difference but 2 out of 3 things removed here would be untouched if you do that.

Sure. This is a remnant of a patch from Carl that moved all this around.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-11 16:30   ` Markus Elfring
@ 2025-09-26 17:52     ` James Morse
  2025-09-26 18:15       ` Markus Elfring
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-26 17:52 UTC (permalink / raw)
  To: Markus Elfring, linux-acpi, linux-arm-kernel
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang, Ben Horgan,
	Carl Worth, Catalin Marinas, D Scott Phillips, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, Fenghua Yu,
	Greg Kroah-Hartman, Hanjun Guo, Jamie Iles, Jonathan Cameron,
	Koba Ko, Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rob Herring, Rohit Mathew, Shanker Donthineni,
	Sudeep Holla, Shaopeng Tan, Wang ShaoBo, Will Deacon, Xin Hao

Hi Markus,

On 11/09/2025 17:30, Markus Elfring wrote:
> …
>> +++ b/drivers/resctrl/mpam_devices.c
> …
>>> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>> +		    enum mpam_class_types type, u8 class_id, int component_id)
>> +{
>> +	int err;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
>> +				     component_id);
>> +	mutex_unlock(&mpam_list_lock);
> …
> 
> Under which circumstances would you become interested to apply a statement
> like “guard(mutex)(&mpam_list_lock);”?
> https://elixir.bootlin.com/linux/v6.17-rc5/source/include/linux/mutex.h#L228

None! The bit of this you cut out is a call to mpam_free_garbage(), which calls
synchronize_srcu(). That may sleep for a while. The whole point of the deferred free-ing
is that it does not happen under the lock. The 'guard' magic means the unlock only
happens when the guard goes out of scope at the end of the function, which would put
that sleep under the lock.
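
A sketch of the concern, for illustration (mpam_free_garbage() is the helper
mentioned above that ends up in synchronize_srcu(); the body here is abridged):

	int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
			    enum mpam_class_types type, u8 class_id, int component_id)
	{
		int err;

		guard(mutex)(&mpam_list_lock);
		err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
					     component_id);
		/*
		 * With guard(), the mutex is only dropped when the guard goes
		 * out of scope at the end of the function, so the potentially
		 * long sleep in here would now happen with mpam_list_lock held.
		 */
		mpam_free_garbage();

		return err;
	}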



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-26 17:52     ` James Morse
@ 2025-09-26 18:15       ` Markus Elfring
  2025-10-17 18:51         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Markus Elfring @ 2025-09-26 18:15 UTC (permalink / raw)
  To: James Morse, linux-acpi, linux-arm-kernel
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang, Ben Horgan,
	Carl Worth, Catalin Marinas, D Scott Phillips, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, Fenghua Yu,
	Greg Kroah-Hartman, Hanjun Guo, Jamie Iles, Jonathan Cameron,
	Koba Ko, Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rob Herring, Rohit Mathew, Shanker Donthineni,
	Sudeep Holla, Shaopeng Tan, Wang ShaoBo, Will Deacon, Xin Hao

>> …
>>> +++ b/drivers/resctrl/mpam_devices.c
>> …
>>>> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>> +		    enum mpam_class_types type, u8 class_id, int component_id)
>>> +{
>>> +	int err;
>>> +
>>> +	mutex_lock(&mpam_list_lock);
>>> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
>>> +				     component_id);
>>> +	mutex_unlock(&mpam_list_lock);
>> …
>>
>> Under which circumstances would you become interested to apply a statement
>> like “guard(mutex)(&mpam_list_lock);”?
>> https://elixir.bootlin.com/linux/v6.17-rc5/source/include/linux/mutex.h#L228
> 
> None! The bit of this you cut out is a call to mpam_free_garbage() which calls
> synchronize_srcu(). That may sleep for a while. The whole point of the deferred free-ing
> is it does not happen under the lock. The 'guard' magic means the compiler gets to choose
> when to call unlock.

How does this feedback relate to the proposed addition of a mutex_lock()/mutex_unlock()
call combination (which might also be achievable with another programming interface)?

Regards,
Markus


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-17 11:03   ` Ben Horgan
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 17/09/2025 12:03, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.

>> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
>> new file mode 100644
>> index 000000000000..92b48fa20108
>> --- /dev/null
>> +++ b/drivers/resctrl/Makefile
>> @@ -0,0 +1,4 @@
>> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
>> +mpam-y						+= mpam_devices.o
>> +
>> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> 
> s/cflags/ccflags/


Fixed, thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-12 10:42   ` Ben Horgan
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen

Hi Ben,

On 12/09/2025 11:42, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> Because an MSC can only by accessed from the CPUs in its cpu-affinity
>> set we need to be running on one of those CPUs to probe the MSC
>> hardware.
>>
>> Do this work in the cpuhp callback. Probing the hardware will only
>> happen before MPAM is enabled, walk all the MSCs and probe those we can
>> reach that haven't already been probed as each CPU's online call is made.
>>
>> This adds the low-level MSC register accessors.
>>
>> Once all MSCs reported by the firmware have been probed from a CPU in
>> their respective cpu-affinity set, the probe-time cpuhp callbacks are
>> replaced.  The replacement callbacks will ultimately need to handle
>> save/restore of the runtime MSC state across power transitions, but for
>> now there is nothing to do in them: so do nothing.
>>
>> The architecture's context switch code will be enabled by a static-key,
>> this can be set by mpam_enable(), but must be done from process context,
>> not a cpuhp callback because both take the cpuhp lock.
>> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
>> to test if all the MSCs have been probed. If probing fails, mpam_disable()
>> is scheduled to unregister the cpuhp callbacks and free memory.
>>
>> CC: Lecopzer Chen <lecopzerc@nvidia.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> Changes since v1:
>>  * Removed register bounds check. If the firmware tables are wrong the
>>    resulting translation fault should be enough to debug this.
>>  * Removed '&' in front of a function pointer.
>>  * Pulled mpam_disable() into this patch.
>>  * Disable mpam when probing fails to avoid extra work on broken platforms.
>>  * Added mpam_disable_reason as there are now two non-debug reasons for this
>>    to happen.
> 
> Looks good to me.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-11 15:07   ` Jonathan Cameron
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen

Hi Jonathan,

On 11/09/2025 16:07, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:50 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Because an MSC can only be accessed from the CPUs in its cpu-affinity
>> set we need to be running on one of those CPUs to probe the MSC
>> hardware.
>>
>> Do this work in the cpuhp callback. Probing the hardware will only
>> happen before MPAM is enabled, walk all the MSCs and probe those we can
>> reach that haven't already been probed as each CPU's online call is made.
>>
>> This adds the low-level MSC register accessors.
>>
>> Once all MSCs reported by the firmware have been probed from a CPU in
>> their respective cpu-affinity set, the probe-time cpuhp callbacks are
>> replaced.  The replacement callbacks will ultimately need to handle
>> save/restore of the runtime MSC state across power transitions, but for
>> now there is nothing to do in them: so do nothing.
>>
>> The architecture's context switch code will be enabled by a static-key,
>> this can be set by mpam_enable(), but must be done from process context,
>> not a cpuhp callback because both take the cpuhp lock.
>> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
>> to test if all the MSCs have been probed. If probing fails, mpam_disable()
>> is scheduled to unregister the cpuhp callbacks and free memory.

> Trivial suggestion inline. Either way
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> +/* Before mpam is enabled, try to probe new MSC */
>> +static int mpam_discovery_cpu_online(unsigned int cpu)
>> +{
>> +	int err = 0;
>> +	struct mpam_msc *msc;
>> +	bool new_device_probed = false;
>> +
>> +	guard(srcu)(&mpam_srcu);
>> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
>> +				 srcu_read_lock_held(&mpam_srcu)) {
>> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>> +			continue;
>> +
>> +		mutex_lock(&msc->probe_lock);
>> +		if (!msc->probed)
>> +			err = mpam_msc_hw_probe(msc);
>> +		mutex_unlock(&msc->probe_lock);
>> +
>> +		if (!err)
>> +			new_device_probed = true;
>> +		else
>> +			break;

> Unless this going to get more complex why not
> 
> 		if (err)
> 			break;
> 
> 		new_device_probed = true;

Sure - it's been both simpler and more complex in the past!


>> +	}
>> +
>> +	if (new_device_probed && !err)
>> +		schedule_work(&mpam_enable_work);
>> +	if (err) {
>> +		mpam_disable_reason = "error during probing";
>> +		schedule_work(&mpam_broken_work);
>> +	}
>> +
>> +	return err;
>> +}
> 
>> +static void mpam_enable_once(void)
>> +{
>> +	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>> +
>> +	pr_info("MPAM enabled\n");

> Feels too noisy given it should be easy enough to tell. pr_dbg() perhaps.

I was aiming for the driver to only print one thing - once all the hardware has been
probed. Once the driver is assembled, this prints the number of PARTID/PMG that were
discovered as the system wide limits.

The reason to print something is that if you see this message, but don't have resctrl
appear in /proc/filesystems - it's never going to appear because the resctrl glue code
couldn't find anything it could use. As this isn't an error, nothing gets printed in
this case.
This is the most common complaint I get - "our platform doesn't look like a Xeon - why
doesn't resctrl work with it?"

It also matters for other requestors, like the SMMU. If they probe after this point, they
can't reduce the PARTID/PMG range - and may get an error and have their MPAM abilities
disabled. Having an entry in the boot log makes this easier to debug.
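
To illustrate (only mpam_register_requestor() is real here; the SMMU-side names
are made up):

	/* e.g. in a later SMMU probe path - sketch only */
	err = mpam_register_requestor(smmu_partid_max, smmu_pmg_max);
	if (err == -EBUSY) {
		/* Too late to shrink the published range - run without MPAM */
		smmu_supports_mpam = false;
	}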


The alternative would be to keep track of what the driver is up to, and expose that
through debugfs - but information that only exists for debug purposes is likely to be
wrong. It also doesn't help work out what order different drivers tried to probe in.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-11 15:18   ` Jonathan Cameron
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

On 11/09/2025 16:18, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:51 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> CPUs can generate traffic with a range of PARTID and PMG values,
>> but each MSC may also have its own maximum size for these fields.
>> Before MPAM can be used, the driver needs to probe each RIS on
>> each MSC, to find the system-wide smallest value that can be used.
>> The limits from requestors (e.g. CPUs) also need taking into account.
>>
>> While doing this, RIS entries that firmware didn't describe are created
>> under MPAM_CLASS_UNKNOWN.
>>
>> While we're here, implement the mpam_register_requestor() call
>> for the arch code to register the CPU limits. Future callers of this
>> will tell us about the SMMU and ITS.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>

> Trivial stuff inline.
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>>  drivers/resctrl/mpam_devices.c  | 150 +++++++++++++++++++++++++++++++-
>>  drivers/resctrl/mpam_internal.h |   6 ++
>>  include/linux/arm_mpam.h        |  14 +++
>>  3 files changed, 169 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index c265376d936b..24dc81c15ec8 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
> 
> 
>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>> +{
>> +	int err = 0;
>> +
>> +	spin_lock(&partid_max_lock);

> guard() perhaps so you can return early in the error pat and avoid
> need for local variable err.

Negh ... okay. I dislike the guard thing as it's never clear when the lock is unlocked.
I'm not a fan of spooky action at a distance!
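
For reference, an (uncompiled) guard() sketch of the same function would be:

	int mpam_register_requestor(u16 partid_max, u8 pmg_max)
	{
		guard(spinlock)(&partid_max_lock);

		if (!partid_max_init) {
			mpam_partid_max = partid_max;
			mpam_pmg_max = pmg_max;
			partid_max_init = true;
			return 0;
		}

		if (!partid_max_published) {
			mpam_partid_max = min(mpam_partid_max, partid_max);
			mpam_pmg_max = min(mpam_pmg_max, pmg_max);
			return 0;
		}

		/* New requestors can't lower the values */
		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
			return -EBUSY;

		return 0;
	}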


>> +	if (!partid_max_init) {
>> +		mpam_partid_max = partid_max;
>> +		mpam_pmg_max = pmg_max;
>> +		partid_max_init = true;
>> +	} else if (!partid_max_published) {
>> +		mpam_partid_max = min(mpam_partid_max, partid_max);
>> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
>> +	} else {
>> +		/* New requestors can't lower the values */
>> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
>> +			err = -EBUSY;
>> +	}
>> +	spin_unlock(&partid_max_lock);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(mpam_register_requestor);
> 
>> @@ -470,9 +547,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> +						   u8 ris_idx)
>> +{
>> +	int err;
>> +	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	if (!test_bit(ris_idx, &msc->ris_idxs)) {
>> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
>> +					     0, 0);
>> +		if (err)
>> +			return ERR_PTR(err);
>> +	}
>> +
>> +	list_for_each_entry(ris, &msc->ris, msc_list) {
>> +		if (ris->ris_idx == ris_idx) {
>> +			found = ris;
> I'd return ris;
> 
> Then can do return ERR_PTR(-ENOENT) below and not bother with found.
> 
> Ignore if this gets more complex later.

Thanks - this is another relic of more complex locking...
Fixed as you suggested.
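
i.e. the tail of mpam_get_or_create_ris() ends up as something like:

	list_for_each_entry(ris, &msc->ris, msc_list) {
		if (ris->ris_idx == ris_idx)
			return ris;
	}

	return ERR_PTR(-ENOENT);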


>> +			break;
>> +		}
>> +	}
>> +
>> +	return found;
>> +}
> 
>> @@ -675,9 +813,18 @@ static struct platform_driver mpam_msc_driver = {
>>  
>>  static void mpam_enable_once(void)
>>  {
>> +	/*
>> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
>> +	 * longer change.
>> +	 */
>> +	spin_lock(&partid_max_lock);
>> +	partid_max_published = true;
>> +	spin_unlock(&partid_max_lock);
>> +
>>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>>  
>> -	pr_info("MPAM enabled\n");
>> +	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
>> +	       mpam_partid_max + 1, mpam_pmg_max + 1);

> Not sure why pr_info before and printk now.  

That looks like a conflict gone wrong!
Fixed.
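
i.e. it goes back to the pr_info() form:

	pr_info("MPAM enabled with %u PARTIDs and %u PMGs\n",
		mpam_partid_max + 1, mpam_pmg_max + 1);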


Thanks,

James



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-12 11:11   ` Ben Horgan
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 12:11, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> CPUs can generate traffic with a range of PARTID and PMG values,
>> but each MSC may also have its own maximum size for these fields.
>> Before MPAM can be used, the driver needs to probe each RIS on
>> each MSC, to find the system-wide smallest value that can be used.
>> The limits from requestors (e.g. CPUs) also need taking into account.
>>
>> While doing this, RIS entries that firmware didn't describe are created
>> under MPAM_CLASS_UNKNOWN.
>>
>> While we're here, implement the mpam_register_requestor() call
>> for the arch code to register the CPU limits. Future callers of this
>> will tell us about the SMMU and ITS.

> Looks good to me. I think Jonathan's comment on getting rid of the local
> variable, 'found', is worthwhile.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks!


James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-11 15:24   ` Jonathan Cameron
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 16:24, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:52 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
>> interrupt, and when taking an IPI to access these registers on platforms
>> where MSC are not accessible from every CPU. This makes an irqsave
>> spinlock the obvious lock to protect these registers. On systems with SCMI
>> mailboxes it must be able to sleep, meaning a mutex must be used. The
>> SCMI platforms can't support an overflow interrupt.
>>
>> Clearly these two can't exist for one MSC at the same time.
>>
>> Add helpers for the MON_SEL locking. The outer lock must be taken in a
>> pre-emptible context before the inner lock can be taken. On systems with
>> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
>> will fail to be 'taken' if the caller is unable to sleep. This will allow
>> callers to fail without having to explicitly check the interface type of
>> each MSC.

> Comments talk about outer locks, but not actually seeing that in the current code.

Ugh - I squashed them all together because without the DT support the DT:SCMI support
ceases to be relevant, the ACPI PCC support isn't here yet, and Dave complained this
was complex. I forgot to rewrite the commit message!

The last paragraph is rewritten as:
------------%<------------
Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.

In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.
------------%<------------

It took invasive changes to make the control path safe for these firmware-backed
platforms. I really don't want to 'simplify' it by pretending they don't exist, only
to spend the following month retrofitting this.
I expect the firmware-backed platforms to expose one or two global MSC, so they
should never hit a case where a mon_sel register access was sent via IPI.
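
For reference, with only the MMIO case the helpers collapse to roughly the
following (a sketch - the real code may differ):

	static inline bool mpam_mon_sel_lock(struct mpam_msc *msc)
	{
		raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
		return true;	/* can't fail until the SCMI/PCC split adds a mutex */
	}

	static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
	{
		raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
	}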



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-11 15:31   ` Ben Horgan
@ 2025-09-29 17:44     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:44 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 11/09/2025 16:31, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
>> interrupt, and when taking an IPI to access these registers on platforms
>> where MSC are not accessible from every CPU. This makes an irqsave
>> spinlock the obvious lock to protect these registers. On systems with SCMI
>> mailboxes it must be able to sleep, meaning a mutex must be used. The
>> SCMI platforms can't support an overflow interrupt.
>>
>> Clearly these two can't exist for one MSC at the same time.
>>
>> Add helpers for the MON_SEL locking. The outer lock must be taken in a
>> pre-emptible context before the inner lock can be taken. On systems with
>> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
>> will fail to be 'taken' if the caller is unable to sleep. This will allow
>> callers to fail without having to explicitly check the interface type of
>> each MSC.

>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 828ce93c95d5..4cc44d4e21c4 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -70,12 +70,17 @@ struct mpam_msc {
>>  
>>  	/*
>>  	 * mon_sel_lock protects access to the MSC hardware registers that are
>> -	 * affected by MPAMCFG_MON_SEL.
>> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
>> +	 * Access to mon_sel is needed from both process and interrupt contexts,
>> +	 * but is complicated by firmware-backed platforms that can't make any
>> +	 * access unless they can sleep.
>> +	 * Always use the mpam_mon_sel_lock() helpers.
>> +	 * Accessed to mon_sel need to be able to fail if they occur in the wrong
>> +	 * context.
>>  	 * If needed, take msc->probe_lock first.
>>  	 */
>> -	struct mutex		outer_mon_sel_lock;
>> -	raw_spinlock_t		inner_mon_sel_lock;
>> -	unsigned long		inner_mon_sel_flags;
>> +	raw_spinlock_t		_mon_sel_lock;
>> +	unsigned long		_mon_sel_flags;
>>  
> 
> These stale variables can be removed in the patch that introduced them,
> outer_mon_sel_lock, inner_mon_sel_lock, inner_mon_sel_flags. Jonathan
> has already pointed out the stale comment and paragraph in the commit
> message.

Yeah - I forgot to rewrite the commit message when I split this patch.
I'll pull those earlier bits into the now:later patch that splits the locking up.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-11 15:29   ` Jonathan Cameron
@ 2025-09-29 17:45     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 11/09/2025 16:29, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:53 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Expand the probing support with the control and monitor types
>> we can use with resctrl.
>>
>> CC: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> A few trivial things inline.
> LGTM
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> @@ -592,6 +736,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>  	mutex_lock(&msc->part_sel_lock);
>>  	idr = mpam_msc_read_idr(msc);
>>  	mutex_unlock(&msc->part_sel_lock);
>> +
> Stray change  - push it to earlier patch.

Fixed,


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 4cc44d4e21c4..5ae5d4eee8ec 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>>  }
>>  
>> +/*
>> + * When we compact the supported features, we don't care what they are.
>> + * Storing them as a bitmap makes life easy.
>> + */
>> +typedef u16 mpam_features_t;
> 
> Maybe use a bitmap type and avoid the need to be careful on sizing etc?

That would be unsigned long at a minimum, which is four times larger than needed.
As there is a build-time check, I'm not worried about this ever being wrong...
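
For context, the accessors are just bit operations on that u16, along these
lines (helper names as used elsewhere in the series, exact bodies assumed):

	static inline bool mpam_has_feature(enum mpam_device_features feat,
					    struct mpam_props *props)
	{
		return BIT(feat) & props->features;
	}

	static inline void mpam_clear_feature(enum mpam_device_features feat,
					      mpam_features_t *features)
	{
		*features &= ~BIT(feat);
	}

	/* Assumed form of the build-time check mentioned above: */
	static_assert(MPAM_FEATURE_LAST <= BITS_PER_TYPE(mpam_features_t));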


>> +
>> +/* Bits for mpam_features_t */
>> +enum mpam_device_features {
>> +	mpam_feat_ccap_part = 0,
>> +	mpam_feat_cpor_part,
>> +	mpam_feat_mbw_part,
>> +	mpam_feat_mbw_min,
>> +	mpam_feat_mbw_max,
>> +	mpam_feat_mbw_prop,
>> +	mpam_feat_msmon,
>> +	mpam_feat_msmon_csu,
>> +	mpam_feat_msmon_csu_capture,
>> +	mpam_feat_msmon_csu_hw_nrdy,
>> +	mpam_feat_msmon_mbwu,
>> +	mpam_feat_msmon_mbwu_capture,
>> +	mpam_feat_msmon_mbwu_rwbw,
>> +	mpam_feat_msmon_mbwu_hw_nrdy,
>> +	mpam_feat_msmon_capt,
>> +	MPAM_FEATURE_LAST,
> 
> If it's always meant to be LAST, I'd drop the trailing comma.

Sure. Full-stops for enums!


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-11 15:37   ` Ben Horgan
@ 2025-09-29 17:45     ` James Morse
  2025-09-30 13:32       ` Ben Horgan
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-29 17:45 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 11/09/2025 16:37, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> Expand the probing support with the control and monitor types
>> we can use with resctrl.

>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 4cc44d4e21c4..5ae5d4eee8ec 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>>  }
>>  
>> +/*
>> + * When we compact the supported features, we don't care what they are.
>> + * Storing them as a bitmap makes life easy.
>> + */
>> +typedef u16 mpam_features_t;
>> +
>> +/* Bits for mpam_features_t */
>> +enum mpam_device_features {
>> +	mpam_feat_ccap_part = 0,
>> +	mpam_feat_cpor_part,
>> +	mpam_feat_mbw_part,
>> +	mpam_feat_mbw_min,
>> +	mpam_feat_mbw_max,
>> +	mpam_feat_mbw_prop,
>> +	mpam_feat_msmon,
>> +	mpam_feat_msmon_csu,
>> +	mpam_feat_msmon_csu_capture,
>> +	mpam_feat_msmon_csu_hw_nrdy,
>> +	mpam_feat_msmon_mbwu,
>> +	mpam_feat_msmon_mbwu_capture,
>> +	mpam_feat_msmon_mbwu_rwbw,
>> +	mpam_feat_msmon_mbwu_hw_nrdy,
>> +	mpam_feat_msmon_capt,
>> +	MPAM_FEATURE_LAST,
>> +};

> I added a garbled comment about this for v1. What I was trying to say is
> that I don't think this quite matches what resctrl supports. For
> instance, I don't think mpam_feat_ccap_part matches a resctrl feature.

Ah - right. I thought you meant something was removed later.
Looks like I thought something could be emulated with CCAP, but that turns out not to be
true because it doesn't have an implicit isolation property, which the
resctrl:bitmap-from-userspace requires.
(I think rwbw was a later addition to the architecture and I added it to the wrong patch).

I'll move that, _prop and _rwbw to the later patch. The split is fairly arbitrary - it was
just somewhere to split an otherwise large patch, and does help determine if a bug is
going to be visible to user-space or not.

_capt can go completely. Last I heard no-one was interested in firmware descriptions of
how the capture hardware can be triggered. I suspect no-one has done anything with it.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-09-12 11:49   ` Jonathan Cameron
@ 2025-09-29 17:45     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-29 17:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 12:49, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:54 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> To make a decision about whether to expose an mpam class as
>> a resctrl resource we need to know its overall supported
>> features and properties.
>>
>> Once we've probed all the resources, we can walk the tree
>> and produce overall values by merging the bitmaps. This
>> eliminates features that are only supported by some MSC
>> that make up a component or class.
>>
>> If bitmap properties are mismatched within a component we
>> cannot support the mismatched feature.
>>
>> Care has to be taken as vMSC may hold mismatched RIS.

> A trivial things inline.
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> +/*
>> + * Combine two props fields.
>> + * If this is for controls that alias the same resource, it is safe to just
>> + * copy the values over. If two aliasing controls implement the same scheme
>> + * a safe value must be picked.
>> + * For non-aliasing controls, these control different resources, and the
>> + * resulting safe value must be compatible with both. When merging values in
>> + * the tree, all the aliasing resources must be handled first.
>> + * On mismatch, parent is modified.
>> + */
>> +static void __props_mismatch(struct mpam_props *parent,
>> +			     struct mpam_props *child, bool alias)
>> +{
>> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
>> +		parent->cpbm_wd = child->cpbm_wd;
>> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
>> +				   cpbm_wd, alias)) {
>> +		pr_debug("%s cleared cpor_part\n", __func__);
>> +		mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
>> +		parent->cpbm_wd = 0;
>> +	}
>> +
>> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
>> +		parent->mbw_pbm_bits = child->mbw_pbm_bits;
>> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
>> +				   mbw_pbm_bits, alias)) {
>> +		pr_debug("%s cleared mbw_part\n", __func__);
>> +		mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
>> +		parent->mbw_pbm_bits = 0;
>> +	}
>> +
>> +	/* bwa_wd is a count of bits, fewer bits means less precision */
>> +	if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
> 
> Seems like an overly long line given other local wrapping.

Fixed.


>> +		parent->bwa_wd = child->bwa_wd;
>> +	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
>> +				     bwa_wd, alias)) {

>> +		pr_debug("%s took the min bwa_wd\n", __func__);

These __func__ arguments need to go as pr_fmt has this covered.


>> +		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
>> +	}
>> +
>> +	/* For num properties, take the minimum */
>> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
>> +		parent->num_csu_mon = child->num_csu_mon;
>> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
>> +				   num_csu_mon, alias)) {
>> +		pr_debug("%s took the min num_csu_mon\n", __func__);
>> +		parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
>> +	}
>> +
>> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
>> +		parent->num_mbwu_mon = child->num_mbwu_mon;
>> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
>> +				   num_mbwu_mon, alias)) {
>> +		pr_debug("%s took the min num_mbwu_mon\n", __func__);
>> +		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
>> +	}


>> +
>> +/*
>> + * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
>> + * For 'num' properties we can just take the minimum.
>> + * For properties where the mismatched unused bits would make a difference, we
>> + * nobble the class feature, as we can't configure all the resources.
>> + * e.g. The L3 cache is composed of two resources with 13 and 17 portion
>> + * bitmaps respectively.
>> + */
>> +static void
>> +__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
> 
> I'm not really sure what the __ prefix denotes here.

Just that it's the innards of something and needs comprehending as part of its caller.


>> +{
>> +	struct mpam_props *cprops = &class->props;
>> +	struct mpam_props *vprops = &vmsc->props;
>> +
>> +	lockdep_assert_held(&mpam_list_lock); /* we modify class */
>> +
>> +	pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
>> +		 dev_name(&vmsc->msc->pdev->dev),
>> +		 (long)cprops->features, (long)vprops->features);

> According to https://docs.kernel.org/core-api/printk-formats.html
> should be fine using %x for u16 values. So why dance through a cast to long?

To isolate it from a subsequent change that makes that field a u32, the existence of which
means one day it'll be a u64. If it ever gets bigger than unsigned-long, it'll need to be
a bitmap array, which would need this code to change. Until then, doing it like this makes
changes to the size less churny.


>> +
>> +	/* Take the safe value for any common features */
>> +	__props_mismatch(cprops, vprops, false);
>> +}
>> +
>> +static void
>> +__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
>> +{
>> +	struct mpam_props *rprops = &ris->props;
>> +	struct mpam_props *vprops = &vmsc->props;
>> +
>> +	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
>> +
>> +	pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
>> +		 dev_name(&vmsc->msc->pdev->dev),
>> +		 (long)vprops->features, (long)rprops->features);
> 
> Same as above comment on casts being unnecessary.

I expect to have to change this in the future.

As these two debug messages have a dev to hand, they should probably use dev_dbg()
instead of manually printing the dev_name().
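
e.g. (sketch):

	dev_dbg(&vmsc->msc->pdev->dev,
		"Merging features for vmsc:0x%lx |= ris:0x%lx\n",
		(long)vprops->features, (long)rprops->features);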


>> +
>> +	/*
>> +	 * Merge mismatched features - Copy any features that aren't common,
>> +	 * but take the safe value for any common features.
>> +	 */
>> +	__props_mismatch(vprops, rprops, true);
>> +}
>> +
>> +/*
>> + * Copy the first component's first vMSC's properties and features to the
>> + * class. __class_props_mismatch() will remove conflicts.
>> + * It is not possible to have a class with no components, or a component with
>> + * no resources. The vMSC properties have already been built.
> 
> If it's not possible do we need the defensive _or_null and error checks?

This is just paranoia. I've removed it.


>> + */
>> +static void mpam_enable_init_class_features(struct mpam_class *class)
>> +{
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_component *comp;
>> +
>> +	comp = list_first_entry_or_null(&class->components,
>> +					struct mpam_component, class_list);
>> +	if (WARN_ON(!comp))
>> +		return;
>> +
>> +	vmsc = list_first_entry_or_null(&comp->vmsc,
>> +					struct mpam_vmsc, comp_list);
>> +	if (WARN_ON(!vmsc))
>> +		return;
>> +
>> +	class->props = vmsc->props;
>> +}
> 
>> +/*
>> + * Merge all the common resource features into class.
>> + * vmsc features are bitwise-or'd together, this must be done first.
> 
> I'm not sure what 'this' is here - I think it's a missing plural that has
> me confused.  Perhaps 'these must be done first.'

The bitwise-or. It's singular because it's done for each class/component at a time.

The reason is so that mpam_enable_init_class_features() sees all the features in the
first vmsc, not a subset of them. It's because vMSCs hold non-overlapping features, which
is all to support a platform that has two RIS with different types of control that do the
same thing. (made up example:) memory min/max on ingress and memory portions on egress.
Both are memory bandwidth for the same hardware block, but for whatever structural reason,
it gets exposed as separate RIS.

Rephrased as:
| * vmsc features are bitwise-or'd together as the first step so that
| * mpam_enable_init_class_features() can initialise the class with a
| * representative set of features.



>> + * Next the class features are the bitwise-and of all the vmsc features.
>> + * Other features are the min/max as appropriate.
>> + *
>> + * To avoid walking the whole tree twice, the class->nrdy_usec property is
>> + * updated when working with the vmsc as it is a max(), and doesn't need
>> + * initialising first.
> 
> Perhaps state that this comment is about what happens in each call of
> mpam_enable_merge_vmsc_features()  Or move the comment to that function.

Sure. The comment is to try and stop people 'fixing' it to only loop over
class->components once. That'll work on 99% of platforms, but discard all the
controls on a few strange ones.


>> + */
>> +static void mpam_enable_merge_features(struct list_head *all_classes_list)
>> +{
>> +	struct mpam_class *class;
>> +	struct mpam_component *comp;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	list_for_each_entry(class, all_classes_list, classes_list) {
>> +		list_for_each_entry(comp, &class->components, class_list)
>> +			mpam_enable_merge_vmsc_features(comp);
>> +
>> +		mpam_enable_init_class_features(class);
>> +
>> +		list_for_each_entry(comp, &class->components, class_list)
>> +			mpam_enable_merge_class_features(comp);
>> +	}
>> +}
> 
> 


Thanks,

James



^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
  2025-09-12 11:25   ` Ben Horgan
  2025-09-12 11:55   ` Jonathan Cameron
@ 2025-09-30  2:51   ` Shaopeng Tan (Fujitsu)
  2025-10-01  9:51     ` James Morse
       [not found]   ` <1f084a23-7211-4291-99b6-7f5192fb9096@nvidia.com>
  3 siblings, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-30  2:51 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Rohit Mathew

Hello James,

There is a space in "cpu hp" in the title.

Best regards,
Shaopeng TAN



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-29 17:45     ` James Morse
@ 2025-09-30 13:32       ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-30 13:32 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/29/25 18:45, James Morse wrote:
> Hi Ben,
> 
> On 11/09/2025 16:37, Ben Horgan wrote:
>> On 9/10/25 21:42, James Morse wrote:
>>> Expand the probing support with the control and monitor types
>>> we can use with resctrl.
> 
>>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>>> index 4cc44d4e21c4..5ae5d4eee8ec 100644
>>> --- a/drivers/resctrl/mpam_internal.h
>>> +++ b/drivers/resctrl/mpam_internal.h
>>> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>>>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>>>  }
>>>  
>>> +/*
>>> + * When we compact the supported features, we don't care what they are.
>>> + * Storing them as a bitmap makes life easy.
>>> + */
>>> +typedef u16 mpam_features_t;
>>> +
>>> +/* Bits for mpam_features_t */
>>> +enum mpam_device_features {
>>> +	mpam_feat_ccap_part = 0,
>>> +	mpam_feat_cpor_part,
>>> +	mpam_feat_mbw_part,
>>> +	mpam_feat_mbw_min,
>>> +	mpam_feat_mbw_max,
>>> +	mpam_feat_mbw_prop,
>>> +	mpam_feat_msmon,
>>> +	mpam_feat_msmon_csu,
>>> +	mpam_feat_msmon_csu_capture,
>>> +	mpam_feat_msmon_csu_hw_nrdy,
>>> +	mpam_feat_msmon_mbwu,
>>> +	mpam_feat_msmon_mbwu_capture,
>>> +	mpam_feat_msmon_mbwu_rwbw,
>>> +	mpam_feat_msmon_mbwu_hw_nrdy,
>>> +	mpam_feat_msmon_capt,
>>> +	MPAM_FEATURE_LAST,
>>> +};
> 
>> I added a garbled comment about this for v1. What I was trying to say is
>> that I don't think this quite matches what resctrl supports. For
>> instance, I don't think mpam_feat_ccap_part matches a resctrl feature.
> 
> Ah - right. I thought you meant something was removed later.
> Looks like I thought something could be emulated with CCAP, but that turns out not to be
> true because it doesn't have an implicit isolation property, which the
> resctrl:bitmap-from-userspace requires.
> (I think rwbw was a later addition to the architecture and I added it to the wrong patch).
> 
> I'll move that, _prop and _rwbw to the later patch. The split is fairly arbitrary - it was

I think ccap gets split into finer grained features later on which seems
fine.

> just somewhere to split an otherwise large patch, and does help determine if a bug is
> going to be visible to user-space or not.

Ok, sensible, I hadn't appreciated the user-space visibility aspect.

> 
> _capt can go completely. Last I heard no-one was interested in firmware descriptions of
> how the capture hardware can be triggered. I suspect no-one has done anything with it.

So, if I've understood correctly this leaves you with the following in
this patch.

mpam_feat_cpor_part,
mpam_feat_mbw_part,
mpam_feat_mbw_min,
mpam_feat_mbw_max,
mpam_feat_mbw_prop,
mpam_feat_msmon,
mpam_feat_msmon_csu,
mpam_feat_msmon_csu_hw_nrdy,
mpam_feat_msmon_mbwu,
mpam_feat_msmon_mbwu_hw_nrdy,

Looks like the correct, resctrl based, split.

> 
> 
> Thanks,
> 
> James

Thanks,

Ben



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-12 12:02   ` Jonathan Cameron
@ 2025-09-30 17:06     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-30 17:06 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 13:02, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:57 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> cpuhp callbacks aren't the only time the MSC configuration may need to
>> be reset. Resctrl has an API call to reset a class.
>> If an MPAM error interrupt arrives it indicates the driver has
>> misprogrammed an MSC. The safest thing to do is reset all the MSCs
>> and disable MPAM.
>>
>> Add a helper to reset RIS via their class. Call this from mpam_disable(),
>> which can be scheduled from the error interrupt handler.

>> Changes since v1:
>>  * Use guard macro for srcu.
> 
> I'm not seeing a strong reason for doing this for the case here and not
> for cases in earlier patches like in mpam_cpu_online() 

I just missed them...


> I'm a fan of using
> these broadly in a given code base, so would guard(srcu) in those earlier patches
> as well.

I've done the online/offline - I'll take another pass through them.


> Anyhow, one other trivial thing inline that you can ignore or not as you wish.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e7faf453b5d7..a9d3c4b09976 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>> @@ -1340,8 +1338,56 @@ static void mpam_enable_once(void)
>> +static void mpam_reset_component_locked(struct mpam_component *comp)
>> +{
>> +	struct mpam_msc *msc;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +
>> +	lockdep_assert_cpus_held();
>> +
>> +	guard(srcu)(&mpam_srcu);
>> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
>> +				 srcu_read_lock_held(&mpam_srcu)) {
>> +		msc = vmsc->msc;
> 
> Might be worth reducing scope of msc and ris

Sure,
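
i.e. roughly (fragment only, the rest of the loop body is unchanged):

	guard(srcu)(&mpam_srcu);
	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		struct mpam_msc *msc = vmsc->msc;
		struct mpam_msc_ris *ris;

		/* ... reset each RIS as before ... */
	}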


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-12 11:25   ` Ben Horgan
  2025-09-12 14:52     ` Ben Horgan
@ 2025-09-30 17:06     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-30 17:06 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 12:25, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> When a CPU comes online, it may bring a newly accessible MSC with
>> it. Only the default partid has its value reset by hardware, and
>> even then the MSC might not have been reset since its config was
>> previously dirtied. e.g. Kexec.
>>
>> Any in-use partid must have its configuration restored, or reset.
>> In-use partids may be held in caches and evicted later.
>>
>> MSC are also reset when CPUs are taken offline to cover cases where
>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>> where there is no firmware involvement.
>>
>> If the configuration for a RIS has not been touched since it was
>> brought online, it does not need resetting again.
>>
>> To reset, write the maximum values for all discovered controls.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index cd8e95fa5fd6..0353313cf284 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -777,8 +778,110 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)

>> +static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
>> +{
>> +	struct mpam_msc *msc = ris->vmsc->msc;
>> +	struct mpam_props *rprops = &ris->props;
>> +
>> +	mpam_assert_srcu_read_lock_held();
>> +
>> +	mutex_lock(&msc->part_sel_lock);
>> +	__mpam_part_sel(ris->ris_idx, partid, msc);
>> +
>> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
>> +		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
>> +		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_MIN, 0);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_PROP, 0);


> If mpam_feat_ccap_part is already in enum mpam_device_features then the
> reset would belong here but I expect it is better just to introduce
> mpam_feat_ccap_part later (patch 21). I also commented on this feature
> introduction split on patch 13.

Yup, this is some knock-on cleanup from that change.
(I didn't understand what you meant the first time round!)


>> +	mutex_unlock(&msc->part_sel_lock);
>> +}

>> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>> +{
>> +	struct mpam_msc_ris *ris;
>> +
>> +	mpam_assert_srcu_read_lock_held();

> Unneeded? Checked in list_for_each_entry_srcu().

So it is! I'll rip those out. They were mostly for documentation anyway.
There will likely be a few others of these in the series...


>> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
>> +		mpam_reset_ris(ris);
>> +
>> +		/*
>> +		 * Set in_reset_state when coming online. The reset state
>> +		 * for non-zero partid may be lost while the CPUs are offline.
>> +		 */
>> +		ris->in_reset_state = online;
>> +	}
>> +}
>> +


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-12 14:52     ` Ben Horgan
@ 2025-09-30 17:06       ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-30 17:06 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 15:52, Ben Horgan wrote:
> On 9/12/25 12:25, Ben Horgan wrote:
>> Hi James,
>>
>> On 9/10/25 21:42, James Morse wrote:
>>> When a CPU comes online, it may bring a newly accessible MSC with
>>> it. Only the default partid has its value reset by hardware, and
>>> even then the MSC might not have been reset since its config was
>>> previously dirtied. e.g. Kexec.
>>>
>>> Any in-use partid must have its configuration restored, or reset.
>>> In-use partids may be held in caches and evicted later.
>>>
>>> MSC are also reset when CPUs are taken offline to cover cases where
>>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>>> where there is no firmware involvement.
>>>
>>> If the configuration for a RIS has not been touched since it was
>>> brought online, it does not need resetting again.
>>>
>>> To reset, write the maximum values for all discovered controls.

>>> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>>> +{
>>> +	struct mpam_msc_ris *ris;
>>> +
>>> +	mpam_assert_srcu_read_lock_held();
>>
>> Unneeded? Checked in list_for_each_entry_srcu().
> 
> If you do get rid of this then that leaves one use of the helper,
> mpam_assert_srcu_read_lock_held(), and so the helper could go.

By the end of the series, yes. But there are transiently a few more until then - they get
removed and replaced with comments when those functions get called by IPI, as lockdep
expects the lock to be held by current, which isn't true if you IPI'd.
I'll drop the helper.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-12 11:55   ` Jonathan Cameron
@ 2025-09-30 17:06     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-30 17:06 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 12:55, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:55 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> When a CPU comes online, it may bring a newly accessible MSC with
>> it. Only the default partid has its value reset by hardware, and
>> even then the MSC might not have been reset since its config was
>> previously dirtied. e.g. Kexec.
>>
>> Any in-use partid must have its configuration restored, or reset.
>> In-use partids may be held in caches and evicted later.
>>
>> MSC are also reset when CPUs are taken offline to cover cases where
>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>> where there is no firmware involvement.
>>
>> If the configuration for a RIS has not been touched since it was
>> brought online, it does not need resetting again.
>>
>> To reset, write the maximum values for all discovered controls.

> Just one trivial passing comment from me.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index cd8e95fa5fd6..0353313cf284 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -818,6 +921,20 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
>>  
>>  static int mpam_cpu_offline(unsigned int cpu)
>>  {
>> +	int idx;
>> +	struct mpam_msc *msc;
>> +
>> +	idx = srcu_read_lock(&mpam_srcu);

> Might be worth using
> guard(srcu)(&mpam_srcu);
> here but only real advantage it bring is in hiding the local idx variable
> away.  

Sure. I did that in a few other places; it looks like I either missed this, or thought it
got more complicated later...
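
For this one the diff is roughly (sketch; the matching srcu_read_unlock() at the
end of the function goes too):

 static int mpam_cpu_offline(unsigned int cpu)
 {
-	int idx;
 	struct mpam_msc *msc;
 
-	idx = srcu_read_lock(&mpam_srcu);
+	guard(srcu)(&mpam_srcu);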


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-09-12 11:57   ` Jonathan Cameron
@ 2025-10-01  9:50     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-01  9:50 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 12:57, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:56 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Resetting RIS entries from the cpuhp callback is easy as the
>> callback occurs on the correct CPU. This won't be true for any other
>> caller that wants to reset or configure an MSC.
>>
>> Add a helper that schedules the provided function if necessary.
>>
>> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
>> changing the MSC state.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


Thanks!

James



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-09-30  2:51   ` Shaopeng Tan (Fujitsu)
@ 2025-10-01  9:51     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-01  9:51 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Shaopeng,

On 30/09/2025 03:51, Shaopeng Tan (Fujitsu) wrote:
> There is a space in "cpu hp" in the title.

Yeah, I'm inconsistent in how I spell that. I'll fix it.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
                     ` (2 preceding siblings ...)
  2025-09-16 13:17   ` [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready Zeng Heng
@ 2025-10-02  3:21   ` Fenghua Yu
  2025-10-17 18:50     ` James Morse
  2025-10-03  0:58   ` Gavin Shan
  4 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  3:21 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
>
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> Changes since v1:
>   * Whitespace.
>   * Gave GLOBAL_AFFINITY a pre-processor'd name.
>   * Fixed assumption that there are zero functional dependencies.
>   * Bounds check walking of the MSC RIS.
>   * More bounds checking in the main table walk.
>   * Check for nonsense numbers of function dependencies.
>   * Smattering of pr_debug() to help folk feeding line-noise to the parser.
>   * Changed the comment flavour on the SPDX string.
>   * Removed additional table check.
>   * More comment wrangling.
>
> Changes since RFC:
>   * Used DEFINE_RES_IRQ_NAMED() and friends macros.
>   * Additional error handling.
>   * Check for zero sized MSC.
>   * Allow table revisions greater than 1. (no spec for revision 0!)
>   * Use cleanup helpers to retrieve ACPI tables, which allows some functions
>     to be folded together.
> ---
>   arch/arm64/Kconfig          |   1 +
>   drivers/acpi/arm64/Kconfig  |   3 +
>   drivers/acpi/arm64/Makefile |   1 +
>   drivers/acpi/arm64/mpam.c   | 361 ++++++++++++++++++++++++++++++++++++
>   drivers/acpi/tables.c       |   2 +-
>   include/linux/acpi.h        |  12 ++
>   include/linux/arm_mpam.h    |  48 +++++
>   7 files changed, 427 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/acpi/arm64/mpam.c
>   create mode 100644 include/linux/arm_mpam.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4be8a13505bf..6487c511bdc6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>   	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>   
>   config ACPI_APMT
>   	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>   obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>   obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>   obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>   obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>   obj-$(CONFIG_ARM_AMBA)		+= amba.o
>   obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..fd9cfa143676
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,361 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
> +
> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
> +				   u32 flags, int *irq,
> +				   u32 processor_container_uid)
> +{
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> +
> +	if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
> +		pr_err_once("Partitioned interrupts not supported\n");
> +		return false;
> +	}
> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, aff;
> +	int irq;
> +
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->overflow_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->error_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);

Since level can be a negative value here, printing it as %u converts it
to a large positive value and makes debugging harder. For example, -ENOENT
returned by find_acpi_cache_level_from_id() will be printed as
4294967294 (instead of -2), which obscures the error code.

Suggest to change this to %d:

			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);

> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid)) {

err could be a negative error code. This validation only checks the size,
not the error code.

Better to change it to

         if (err < 0 || err >= sizeof(uid))

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
  2025-09-11 10:43   ` Jonathan Cameron
  2025-09-25  9:32   ` Stanimir Varbanov
@ 2025-10-02  3:35   ` Fenghua Yu
  2025-10-10 16:54     ` James Morse
  2025-10-03  0:15   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  3:35 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
>
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
>
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
>
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
  2025-09-11 10:46   ` Jonathan Cameron
  2025-09-11 14:08   ` Ben Horgan
@ 2025-10-02  3:55   ` Fenghua Yu
  2025-10-10 16:55     ` James Morse
  2025-10-03  0:17   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  3:55 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
>
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
>
> Gid rid of the levels parameter, which has no remaining purpose.
>
> Fix acpi_get_cache_info() to match.
>
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
  2025-09-11 10:59   ` Jonathan Cameron
  2025-09-11 15:27   ` Lorenzo Pieralisi
@ 2025-10-02  4:30   ` Fenghua Yu
  2025-10-10 16:55     ` James Morse
  2025-10-03  0:23   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  4:30 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
>
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
>
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Other than minor comment issues as follows,

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

> ---
> Changes since v1:
>   * Dropped the cleanup based table freeing, use acpi_get_pptt() instead.
>   * Removed a confusing comment.
>   * Clarified the kernel doc.
>
> Changes since RFC:
>   * acpi_count_levels() now returns a value.
>   * Converted the table-get stuff to use Jonathan's cleanup helper.
>   * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>   drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  5 ++++
>   2 files changed, 67 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 7af7d62597df..c5f2a51d280b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>   				     entry->length);
>   	}
>   }
> +
> +/*
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,
s/,/./
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return

This comment is not clear. Maybe it's better to say:

+ * If one CPU's L2 is shared with another CPU as L3, this function will return

> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..5bdca5546697 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1542,6 +1542,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
>   int find_acpi_cpu_topology_package(unsigned int cpu);
>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>   void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> +int find_acpi_cache_level_from_id(u32 cache_id);
>   #else
>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>   {
> @@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>   }
>   static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>   						     cpumask_t *cpus) { }
> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	return -EINVAL;
> +}
>   #endif
>   
>   void acpi_arch_init(void);

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-10 20:42 ` [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
  2025-09-11 11:06   ` Jonathan Cameron
@ 2025-10-02  5:03   ` Fenghua Yu
  2025-10-10 16:55     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  5:03 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
>
> Add a helper to pull this information out of the PPTT.
>
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
> Changes since v1:
>   * Added punctuation to the commit message.
>   * Removed a comment about an alternative implementation.
>   * Made the loop continue with a warning if a CPU is missing from the PPTT.
>
> Changes since RFC:
>   * acpi_count_levels() now returns a value.
>   * Converted the table-get stuff to use Jonathan's cleanup helper.
>   * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>   drivers/acpi/pptt.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  6 +++++
>   2 files changed, 65 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index c5f2a51d280b..c379a9952b00 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -966,3 +966,62 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>   
>   	return -ENOENT;
>   }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the unified cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	cpumask_clear(cpus);
> +
> +	if (IS_ERR(table))
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (WARN_ON_ONCE(!cpu_node))
> +			continue;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);

This function is almost identical to find_acpi_cache_level_from_id() 
defined in patch #3.

To reduce code size and complexity, it's better to define a common
helper to serve both functions.

e.g. define a helper acpi_pptt_get_level_cpumask_from_cache_id(u32
cache_id, int *lvl, cpumask_t *cpus)

This helper has the same code body to traverse the cache levels for all 
CPUs as find_acpi_cache_level_from_id() and 
acpi_pptt_get_cpumask_from_cache_id(). The major difference is here:

+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id) {
+				if (cpus)
+					cpumask_set_cpu(cpu, cpus);
+				if (lvl) {
+					*lvl = level;
+					return 0;
+				}

Then simplify the two callers as follows:
int find_acpi_cache_level_from_id(u32 cache_id)
{
	int level;
	int err = acpi_pptt_get_level_cpumask_from_cache_id(cache_id, &level, NULL);
	if (err)
		return err;

	return level;
}

int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
{
	return acpi_pptt_get_level_cpumask_from_cache_id(cache_id, NULL, cpus);
}
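
For what it's worth, pulling the two walks together, the common helper could
look something like this. This is only a sketch: it uses acpi_get_pptt() as in
patch 3, keeps the WARN-and-continue behaviour from patch 4 for a missing
cpu_node, and uses the naming suggested above.

static int acpi_pptt_get_level_cpumask_from_cache_id(u32 cache_id, int *lvl,
						     cpumask_t *cpus)
{
	u32 acpi_cpu_id;
	int level, cpu, num_levels;
	struct acpi_pptt_cache *cache;
	struct acpi_pptt_cache_v1 *cache_v1;
	struct acpi_pptt_processor *cpu_node;
	struct acpi_table_header *table;

	if (cpus)
		cpumask_clear(cpus);

	table = acpi_get_pptt();
	if (!table || table->revision < 3)
		return -ENOENT;

	for_each_possible_cpu(cpu) {
		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
		if (WARN_ON_ONCE(!cpu_node))
			continue;
		num_levels = acpi_count_levels(table, cpu_node, NULL);

		/* Start at 1 for L1 */
		for (level = 1; level <= num_levels; level++) {
			cache = acpi_find_cache_node(table, acpi_cpu_id,
						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
						     level, &cpu_node);
			if (!cache)
				continue;

			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
						cache,
						sizeof(struct acpi_pptt_cache));

			if (!(cache->flags & ACPI_PPTT_CACHE_ID_VALID) ||
			    cache_v1->cache_id != cache_id)
				continue;

			if (cpus)
				cpumask_set_cpu(cpu, cpus);
			if (lvl) {
				*lvl = level;
				return 0;
			}
		}
	}

	/* Level lookup found nothing; a cpumask walk is complete even if empty */
	return lvl ? -ENOENT : 0;
}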

> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 5bdca5546697..c5fd92cda487 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>   void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>   int find_acpi_cache_level_from_id(u32 cache_id);
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
>   #else
>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>   {
> @@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
>   {
>   	return -EINVAL;
>   }
> +static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
> +						      cpumask_t *cpus)
> +{
> +	return -ENOENT;
> +}
>   #endif
>   
>   void acpi_arch_init(void);

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
  2025-09-12 10:14   ` Ben Horgan
@ 2025-10-02  5:06   ` Fenghua Yu
  2025-10-10 16:55     ` James Morse
  2025-10-03  0:32   ` Gavin Shan
  2 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-02  5:06 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it. As MPAM is only found on arm64
> platforms, the arm64 tree is the most natural home for the Kconfig
> option.
>
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and to register properties
> of CPUs with the MPAM driver.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> CC: Dave Martin <dave.martin@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-12 11:42   ` Ben Horgan
@ 2025-10-02 18:02     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-02 18:02 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 12:42, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> cpuhp callbacks aren't the only time the MSC configuration may need to
>> be reset. Resctrl has an API call to reset a class.
>> If an MPAM error interrupt arrives it indicates the driver has
>> misprogrammed an MSC. The safest thing to do is reset all the MSCs
>> and disable MPAM.
>>
>> Add a helper to reset RIS via their class. Call this from mpam_disable(),
>> which can be scheduled from the error interrupt handler.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e7faf453b5d7..a9d3c4b09976 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -842,8 +842,6 @@ static int mpam_reset_ris(void *arg)
>>  	u16 partid, partid_max;
>>  	struct mpam_msc_ris *ris = arg;
>>
>> -	mpam_assert_srcu_read_lock_held();
>> -
>
> Remove where it is introduced. There is already one in
> mpam_reset_ris_partid() at that time.

Mmmm, this should really have been replaced with a comment.

I prefer each function to have an assert like this as documentation. In this case, a new
caller may miss the lock, but always hit the 'in_reset_state' case during testing, and be
caught out when a call to mpam_reset_ris_partid() occurs. Having documentation comments
that can also bark at you when you ignore them is really handy!

It's removed in this patch because calling it via mpam_touch_msc() puts it behind a
call to schedule(), and lockdep expects 'current' to be the one holding the lock.

I'll add the comment. Looks like it just got dropped when mpam_touch_msc() stopped using
an IPI...
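
Something along these lines, ahead of the in_reset_state check (a sketch,
final wording may differ):

	/*
	 * Callers must hold the srcu read lock. Not asserted here because
	 * mpam_touch_msc() may run this behind a call to schedule(), where
	 * lockdep expects 'current' to be the lock holder.
	 */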


>>  	if (ris->in_reset_state)
>>  		return 0;
>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks!


James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-09-25  7:16   ` Fenghua Yu
@ 2025-10-02 18:02     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-02 18:02 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 25/09/2025 08:16, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> cpuhp callbacks aren't the only time the MSC configuration may need to
>> be reset. Resctrl has an API call to reset a class.
>> If an MPAM error interrupt arrives it indicates the driver has
>> misprogrammed an MSC. The safest thing to do is reset all the MSCs
>> and disable MPAM.
>>
>> Add a helper to reset RIS via their class. Call this from mpam_disable(),
>> which can be scheduled from the error interrupt handler.


>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e7faf453b5d7..a9d3c4b09976 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>>   @@ -1340,8 +1338,56 @@ static void mpam_enable_once(void)
>>              mpam_partid_max + 1, mpam_pmg_max + 1);
>>   }
>>   +static void mpam_reset_component_locked(struct mpam_component *comp)
>> +{
>> +    struct mpam_msc *msc;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    lockdep_assert_cpus_held();
>> +
>> +    guard(srcu)(&mpam_srcu);
>> +   
> 
> Nested locks on mpam_srcu in this call chain:
> 
> mpam_disable() -> mpam_reset_class() -> mpam_reset_class_locked() -> mpam_component_locked()

These are allowed to nest like this.
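
SRCU read-side sections nest safely, so each function taking its own guard
is fine. Roughly (illustrative function names, not the real ones):

static void reset_component(struct mpam_component *comp)
{
	guard(srcu)(&mpam_srcu);	/* inner read-side section */
	/* walk the component's vmsc/ris lists ... */
}

static void reset_class(struct mpam_class *class)
{
	guard(srcu)(&mpam_srcu);	/* outer read-side section */
	/* ... call reset_component() for each component */
}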


> There are redundant locks on mpam_srcu in mpam_disabled(), mpam_reset_class_locked(), and
> mpam_reset_component_locked().
> 
> It's better to guard mpam_srcu only in the top function mpam_disable() for simpler logic
> and lower overhead.

These things don't block, so there's no real overhead. Shuffling them around to avoid the
harmless nesting would likely complicate the flow, not simplify it.

In the reset case you point at here, the resctrl code would need to take the srcu lock
before calling it - which is exposing the innards of what that function does.



>> list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
>> +                 srcu_read_lock_held(&mpam_srcu)) {
>> +        msc = vmsc->msc;
>> +
>> +        list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
>> +                     srcu_read_lock_held(&mpam_srcu)) {
>> +            if (!ris->in_reset_state)
>> +                mpam_touch_msc(msc, mpam_reset_ris, ris);
>> +            ris->in_reset_state = true;
>> +        }
>> +    }
>> +}


>> +/*
>> + * Called in response to an error IRQ.
>> + * All of MPAMs errors indicate a software bug, restore any modified
>> + * controls to their reset values.
>> + */
>>   void mpam_disable(struct work_struct *ignored)
>>   {
>> +    int idx;
>> +    struct mpam_class *class;
>>       struct mpam_msc *msc, *tmp;
>>         mutex_lock(&mpam_cpuhp_state_lock);
>> @@ -1351,6 +1397,12 @@ void mpam_disable(struct work_struct *ignored)
>>       }
>>       mutex_unlock(&mpam_cpuhp_state_lock);
>>   +    idx = srcu_read_lock(&mpam_srcu);

> It's better to change to guard(srcu)(&mpam_srcu);

For this one - absolutely not.
The guard() thing allows the toolchain to decide when to drop the lock. Further down in
this same function is an attempt to free the memory that got deferred. Guess what happens
if you call synchronize_srcu() while still in an srcu read side section....
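
To spell out the hazard, a sketch of what the guard() version would look
like (not what the patch does):

void mpam_disable(struct work_struct *ignored)
{
	struct mpam_class *class;

	/* NOT like this: the guard pins the read-lock until end of scope... */
	guard(srcu)(&mpam_srcu);
	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
				 srcu_read_lock_held(&mpam_srcu))
		mpam_reset_class(class);

	/*
	 * ...so anything further down that reaches synchronize_srcu(), such
	 * as freeing the deferred garbage, deadlocks waiting for this reader
	 * to finish.
	 */
}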


>> +    list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>> +                 srcu_read_lock_held(&mpam_srcu))
>> +        mpam_reset_class(class);
>> +    srcu_read_unlock(&mpam_srcu, idx);
>> +
>>       mutex_lock(&mpam_list_lock);
>>       list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
>>           mpam_msc_destroy(msc);


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-12 12:12   ` Jonathan Cameron
@ 2025-10-02 18:02     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-02 18:02 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 13:12, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:58 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Register and enable error IRQs. All the MPAM error interrupts indicate a
>> software bug, e.g. out of range partid. If the error interrupt is ever
>> signalled, attempt to disable MPAM.
>>
>> Only the irq handler accesses the ESR register, so no locking is needed.
>> The work to disable MPAM after an error needs to happen at process
>> context as it takes mutex. It also unregisters the interrupts, meaning
>> it can't be done from the threaded part of a threaded interrupt.
>> Instead, mpam_disable() gets scheduled.
>>
>> Enabling the IRQs in the MSC may involve cross calling to a CPU that
>> can access the MSC.
>>
>> Once the IRQ is requested, the mpam_disable() path can be called
>> asynchronously, which will walk structures sized by max_partid. Ensure
>> this size is fixed before the interrupt is requested.

>> @@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
>>  	}
>>  }
>>  
>> +static char *mpam_errcode_names[16] = {
>> +	[0] = "No error",
> 
> I think you had a bunch of defines for these in an earlier patch.  Can we use
> that to index here instead of [0] etc. 

Sure,


>> +	[1] = "PARTID_SEL_Range",
>> +	[2] = "Req_PARTID_Range",
>> +	[3] = "MSMONCFG_ID_RANGE",
>> +	[4] = "Req_PMG_Range",
>> +	[5] = "Monitor_Range",
>> +	[6] = "intPARTID_Range",
>> +	[7] = "Unexpected_INTERNAL",
>> +	[8] = "Undefined_RIS_PART_SEL",
>> +	[9] = "RIS_No_Control",
>> +	[10] = "Undefined_RIS_MON_SEL",
>> +	[11] = "RIS_No_Monitor",
>> +	[12 ... 15] = "Reserved"
>> +};
> 
> 
>> +static void mpam_unregister_irqs(void)
>> +{
>> +	int irq, idx;
>> +	struct mpam_msc *msc;
>> +
>> +	cpus_read_lock();
> 
> 	guard(cpus_read_lock)();
> 	guard(srcu)(&mpam_srcu);

Sure, looks like I didn't realise there was a cpus_read_lock version of this when I went
looking for places to add this.


>> +	/* take the lock as free_irq() can sleep */

The comment gets dropped as it mattered for an earlier locking scheme (but free_irq()
can still sleep).


>> +	idx = srcu_read_lock(&mpam_srcu);
>> +	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
>> +				 srcu_read_lock_held(&mpam_srcu)) {
>> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
>> +		if (irq <= 0)
>> +			continue;
>> +
>> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
>> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
>> +
>> +		if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
>> +			if (irq_is_percpu(irq)) {
>> +				msc->reenable_error_ppi = 0;
>> +				free_percpu_irq(irq, msc->error_dev_id);
>> +			} else {
>> +				devm_free_irq(&msc->pdev->dev, irq, msc);
>> +			}
>> +		}
>> +	}
>> +	srcu_read_unlock(&mpam_srcu, idx);
>> +	cpus_read_unlock();
>> +}

>> @@ -1332,6 +1580,27 @@ static void mpam_enable_once(void)
>>  	partid_max_published = true;
>>  	spin_unlock(&partid_max_lock);
>>  
>> +	/*
>> +	 * If all the MSC have been probed, enabling the IRQs happens next.
>> +	 * That involves cross-calling to a CPU that can reach the MSC, and
>> +	 * the locks must be taken in this order:
>> +	 */
>> +	cpus_read_lock();
>> +	mutex_lock(&mpam_list_lock);
>> +	mpam_enable_merge_features(&mpam_classes);
>> +
>> +	err = mpam_register_irqs();
>> +	if (err)
>> +		pr_warn("Failed to register irqs: %d\n", err);
> 
> Perhaps move the print into the if (err) below?

More types of error get added later, and it's useful to know which of these failed.


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 6e047fbd3512..f04a9ef189cf 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -32,6 +32,10 @@ struct mpam_garbage {
>>  	struct platform_device	*pdev;
>>  };
>>  
>> +/* Bit positions for error_irq_flags */
>> +#define	MPAM_ERROR_IRQ_REQUESTED  0
>> +#define	MPAM_ERROR_IRQ_HW_ENABLED 1
> 
> If there aren't going to be loads more of these (I've not really thought
> about whether there might) then using a bitmap for these seems to add complexity
> that we wouldn't see with:
> bool error_irq_req;
> bool error_irq_hw_enabled;

It's a bitmap so that mpam_unregister_irqs() can use test_and_clear_bit() on them,
because with a real interrupt mpam_unregister_irqs() can run multiple times in parallel
with itself.
Doing this as bools would mean having a mutex to prevent that from happening.

I'll do that as it's slightly simpler.
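
i.e. with bools, the teardown would need something like this to stay safe
against running in parallel with itself (the mutex is invented for
illustration):

	mutex_lock(&msc->error_irq_lock);	/* invented lock, for illustration */
	if (msc->error_irq_req) {
		msc->error_irq_req = false;
		devm_free_irq(&msc->pdev->dev, irq, msc);
	}
	mutex_unlock(&msc->error_irq_lock);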


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-12 14:40   ` Ben Horgan
@ 2025-10-02 18:03     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-02 18:03 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 15:40, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> Register and enable error IRQs. All the MPAM error interrupts indicate a
>> software bug, e.g. out of range partid. If the error interrupt is ever
>> signalled, attempt to disable MPAM.
>>
>> Only the irq handler accesses the ESR register, so no locking is needed.
>> The work to disable MPAM after an error needs to happen at process
>> context as it takes mutex. It also unregisters the interrupts, meaning
>> it can't be done from the threaded part of a threaded interrupt.
>> Instead, mpam_disable() gets scheduled.
>>
>> Enabling the IRQs in the MSC may involve cross calling to a CPU that
>> can access the MSC.
>>
>> Once the IRQ is requested, the mpam_disable() path can be called
>> asynchronously, which will walk structures sized by max_partid. Ensure
>> this size is fixed before the interrupt is requested.


>> +static int __setup_ppi(struct mpam_msc *msc)
>> +{
>> +	int cpu;
>> +	struct device *dev = &msc->pdev->dev;
>> +
>> +	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
>> +	if (!msc->error_dev_id)
>> +		return -ENOMEM;
>> +
>> +	for_each_cpu(cpu, &msc->accessibility) {
>> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
>> +
>> +		if (empty) {
> 
> I'm confused about how this if condition can be satisfied. Isn't the
> alloc clearing msc->error_dev_id for each cpu, and then it's only getting
> set for each cpu later in the iteration?

Yes, you're right.

I think this was part of the support for PPI partitions, where multiple partitions would
get set up here. This was a sanity check that they didn't overlap...


I've ripped that out.


>> +			dev_err_once(dev, "MSC shares PPI with %s!\n",
>> +				     dev_name(&empty->pdev->dev));
>> +			return -EBUSY;
>> +		}
>> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
>> +	}
>> +
>> +	return 0;
>> +}

Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
                     ` (2 preceding siblings ...)
  2025-10-02  3:35   ` Fenghua Yu
@ 2025-10-03  0:15   ` Gavin Shan
  2025-10-10 16:55     ` James Morse
  3 siblings, 1 reply; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  0:15 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

On 9/11/25 6:42 AM, James Morse wrote:
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Replaced commit message with wording from Dave.
>   * Fixed a stray plural.
>   * Moved further down in the file to make use of get_pptt() helper.
>   * Added a break to exit the loop early.
> 
> Changes since RFC:
>   * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
> 
> Changes since RFC:
>   * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>   * Added missing : in kernel-doc
>   * Made helper return void as this never actually returns an error.
> ---
>   drivers/acpi/pptt.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  3 ++
>   2 files changed, 86 insertions(+)
> 

With the description of the return value of acpi_pptt_get_cpus_from_container()
dropped, since that function doesn't have a return value (as mentioned by
Stanimir Varbanov):

Reviewed-by: Gavin Shan <gshan@redhat.com>

Thanks,
Gavin


> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..1728545d90b2 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -817,3 +817,86 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>   	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>   					  ACPI_PPTT_ACPI_IDENTICAL);
>   }
> +
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
> + * @table_hdr:		A reference to the PPTT table.
> + * @parent_node:	A pointer to the processor node in the @table_hdr.
> + * @cpus:		A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> +				     struct acpi_pptt_processor *parent_node,
> +				     cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_id;
> +	int cpu;
> +
> +	cpumask_clear(cpus);
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> +		while (cpu_node) {
> +			if (cpu_node == parent_node) {
> +				cpumask_set_cpu(cpu, cpus);
> +				break;
> +			}
> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +		}
> +	}
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor container
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	table_hdr = acpi_get_pptt();
> +	if (!table_hdr)
> +		return;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +	proc_sz = sizeof(struct acpi_pptt_processor);
> +	while ((unsigned long)entry + proc_sz <= table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> +			if (!acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +				if (cpu_node->acpi_processor_id == acpi_cpu_id) {
> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> +					break;
> +				}
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 1c5bb1e887cd..f97a9ff678cc 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>   int find_acpi_cpu_topology_cluster(unsigned int cpu);
>   int find_acpi_cpu_topology_package(unsigned int cpu);
>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>   #else
>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>   {
> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>   {
>   	return -EINVAL;
>   }
> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> +						     cpumask_t *cpus) { }
>   #endif
>   
>   void acpi_arch_init(void);



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
                     ` (2 preceding siblings ...)
  2025-10-02  3:55   ` Fenghua Yu
@ 2025-10-03  0:17   ` Gavin Shan
  3 siblings, 0 replies; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  0:17 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

On 9/11/25 6:42 AM, James Morse wrote:
> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
> 
> Gid rid of the levels parameter, which has no remaining purpose.
> 
> Fix acpi_get_cache_info() to match.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> ---
> Changes since v1:
>   * Rewritten commit message from Dave.
>   * Minor changes to kernel doc comment.
>   * Keep the much loved typo.
> 
> Changes since RFC:
>   * Made acpi_count_levels() return the levels value.
> ---
>   drivers/acpi/pptt.c | 20 ++++++++++++--------
>   1 file changed, 12 insertions(+), 8 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
                     ` (2 preceding siblings ...)
  2025-10-02  4:30   ` Fenghua Yu
@ 2025-10-03  0:23   ` Gavin Shan
  3 siblings, 0 replies; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  0:23 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

On 9/11/25 6:42 AM, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Dropped the cleanup based table freeing, use acpi_get_pptt() instead.
>   * Removed a confusing comment.
>   * Clarified the kernel doc.
> 
> Changes since RFC:
>   * acpi_count_levels() now returns a value.
>   * Converted the table-get stuff to use Jonathan's cleanup helper.
>   * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>   drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  5 ++++
>   2 files changed, 67 insertions(+)
> 

With existing comments addressed, especially those from Lorenzo Pieralisi:

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
  2025-09-12 10:14   ` Ben Horgan
  2025-10-02  5:06   ` Fenghua Yu
@ 2025-10-03  0:32   ` Gavin Shan
  2025-10-10 16:55     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  0:32 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

On 9/11/25 6:42 AM, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it. As MPAM is only found on arm64
> platforms, the arm64 tree is the most natural home for the Kconfig
> option.
> 
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and to register properties
> of CPUs with the MPAM driver.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> CC: Dave Martin <dave.martin@arm.com>
> ---
> Changes since v1:
>   * Help text rewritten by Dave.
> ---
>   arch/arm64/Kconfig | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
                     ` (3 preceding siblings ...)
  2025-10-02  3:21   ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table Fenghua Yu
@ 2025-10-03  0:58   ` Gavin Shan
  2025-10-17 18:51     ` James Morse
  4 siblings, 1 reply; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  0:58 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

On 9/11/25 6:42 AM, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
>   * Whitespace.
>   * Gave GLOBAL_AFFINITY a pre-processor'd name.
>   * Fixed assumption that there are zero functional dependencies.
>   * Bounds check walking of the MSC RIS.
>   * More bounds checking in the main table walk.
>   * Check for nonsense numbers of function dependencies.
>   * Smattering of pr_debug() to help folk feeding line-noise to the parser.
>   * Changed the comment flavour on the SPDX string.
>   * Removed additional table check.
>   * More comment wrangling.
> 
> Changes since RFC:
>   * Used DEFINE_RES_IRQ_NAMED() and friends macros.
>   * Additional error handling.
>   * Check for zero sized MSC.
>   * Allow table revisions greater than 1. (no spec for revision 0!)
>   * Use cleanup helpers to retrieve ACPI tables, which allows some functions
>     to be folded together.
> ---
>   arch/arm64/Kconfig          |   1 +
>   drivers/acpi/arm64/Kconfig  |   3 +
>   drivers/acpi/arm64/Makefile |   1 +
>   drivers/acpi/arm64/mpam.c   | 361 ++++++++++++++++++++++++++++++++++++
>   drivers/acpi/tables.c       |   2 +-
>   include/linux/acpi.h        |  12 ++
>   include/linux/arm_mpam.h    |  48 +++++
>   7 files changed, 427 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/acpi/arm64/mpam.c
>   create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4be8a13505bf..6487c511bdc6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>   	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>   
>   config ACPI_APMT
>   	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>   obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>   obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>   obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>   obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>   obj-$(CONFIG_ARM_AMBA)		+= amba.o
>   obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..fd9cfa143676
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,361 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
> +
> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
> +				   u32 flags, int *irq,
> +				   u32 processor_container_uid)
> +{
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> +
> +	if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
> +		pr_err_once("Partitioned interrupts not supported\n");
> +		return false;
> +	}
> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
> +		return false;
> +	}
> +
> +	return true;
> +}

0 is allowed by acpi_register_gsi().

	if (*irq < 0) {
		pr_err_once(...);
		return false;
	}

> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, aff;
> +	int irq;
> +
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->overflow_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->error_interrupt_affinity;
> +	else
> +		aff = GLOBAL_AFFINITY;
> +	if (acpi_mpam_register_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);

It's perhaps worth a warning message when @nid is explicitly set to zero due to
a bad proximity domain, something like below.

		if (nid == NUMA_NO_NODE) {
			nid = 0;
			if (num_possible_nodes() > 1) {
				pr_warn("Bad proximity domain %d, mapped to node 0\n",
					res->locator.memory_locator.proximity_domain);
			}
		}
		

> +	default:
> +		/* These get discovered later and treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case 0:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case 0xa:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int __init acpi_mpam_parse(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct property_entry props[4]; /* needs a sentinel */
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int next_res, next_prop, err = 0;
> +	struct acpi_device *companion;
> +	struct platform_device *pdev;
> +	enum mpam_msc_iface iface;
> +	struct resource res[3];
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		if (table_offset > table_end) {
> +			pr_debug("MSC entry overlaps end of ACPI table\n");
> +			break;
> +		}
> +
> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the MSC structure. This MSC will still be counted,
> +		 * meaning the MPAM driver can't probe against all MSC, and
> +		 * will never be enabled. There is no way to enable it safely,
> +		 * because we cannot determine safe system-wide partid and pmg
> +		 * ranges in this situation.
> +		 */
> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (!tbl_msc->mmio_size) {
> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (decode_interface_type(tbl_msc, &iface)) {
> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		next_res = 0;
> +		next_prop = 0;
> +		memset(res, 0, sizeof(res));
> +		memset(props, 0, sizeof(props));
> +
> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
> +		if (!pdev) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		/* Some power management is described in the namespace: */
> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +		if (err > 0 && err < sizeof(uid)) {
> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> +			if (companion)
> +				ACPI_COMPANION_SET(&pdev->dev, companion);
> +			else
> +				pr_debug("MSC.%u: missing namespace entry\n", tbl_msc->identifier);
> +		}
> +
> +		if (iface == MPAM_IFACE_MMIO) {
> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +							       tbl_msc->mmio_size,
> +							       "MPAM:MSC");
> +		} else if (iface == MPAM_IFACE_PCC) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +								tbl_msc->base_address);
> +			next_prop++;
> +		}
> +
> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +		err = platform_device_add_resources(pdev, res, next_res);
> +		if (err)
> +			break;
> +
> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +							tbl_msc->max_nrdy_usec);
> +
> +		/*
> +		 * The MSC's CPU affinity is described via its linked power
> +		 * management device, but only if it points at a Processor or
> +		 * Processor Container.
> +		 */
> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
> +								acpi_id);
> +		}
> +
> +		err = device_create_managed_software_node(&pdev->dev, props,
> +							  NULL);
> +		if (err)
> +			break;
> +
> +		/* Come back later if you want the RIS too */
> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +		if (err)
> +			break;
> +
> +		err = platform_device_add(pdev);
> +		if (err)
> +			break;
> +	}
> +
> +	if (err)
> +		platform_device_put(pdev);
> +
> +	return err;
> +}
> +
> +int acpi_mpam_count_msc(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
> +

acpi_mpam_count_msc() iterates over the same MSC nodes that acpi_mpam_parse() already walks.
So the question is: why can't we drop acpi_mpam_count_msc() and instead maintain a variable
that counts the MSC nodes inside acpi_mpam_parse()?
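
(For illustration only, a rough sketch of that idea - 'mpam_msc_count' is a
made-up name, not something in the patch. One ordering detail the sketch
glosses over: mpam_msc_driver_init() reads the count from a subsys_initcall(),
while acpi_mpam_parse() only runs at subsys_initcall_sync(), so a counter
filled in by the parse loop would not yet be populated when the driver reads
it.)

	static int mpam_msc_count;

	int acpi_mpam_count_msc(void)
	{
		return mpam_msc_count;
	}

	/* ... and inside acpi_mpam_parse()'s while loop, after the existing
	 * checks that 'continue' past unusable MSC entries: */
		mpam_msc_count++;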

> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index fa9bb8c8ce95..835e3795ede3 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
>   	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>   	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>   	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> -	ACPI_SIG_NBFT };
> +	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
>   
>   #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>   
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index c5fd92cda487..af449964426b 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>   #ifndef _LINUX_ACPI_H
>   #define _LINUX_ACPI_H
>   
> +#include <linux/cleanup.h>
>   #include <linux/errno.h>
>   #include <linux/ioport.h>	/* for struct resource */
>   #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>   void acpi_table_init_complete (void);
>   int acpi_table_init (void);
>   
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> +
>   int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>   int __init_or_acpilib acpi_table_parse_entries(char *id,
>   		unsigned long table_size, int entry_id,
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..3d6c39c667c3
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#define GLOBAL_AFFINITY		~0
> +
> +struct mpam_msc;
> +
> +enum mpam_msc_iface {
> +	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
> +	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
> +};
> +
> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +/* Parse the ACPI description of resources entries for this MSC. */
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else
> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +					    struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> +#endif
> +
> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* __LINUX_ARM_MPAM_H */

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-09-11 13:35   ` Jonathan Cameron
  2025-09-17 11:03   ` Ben Horgan
@ 2025-10-03  3:53   ` Gavin Shan
  2025-10-17 18:51     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Gavin Shan @ 2025-10-03  3:53 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi James,

On 9/11/25 6:42 AM, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Avoid selecting the driver on other architectures.
>   * Removed PCC support stub.
>   * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
>   * Clarified a comment.
>   * Stopped using mpam_num_msc as an id, and made it atomic.
>   * Size of -1 returned from cache_of_calculate_id()
>   * Renamed some struct members.
>   * Made a bunch of pr_err() calls into dev_err_once().
>   * Used more cleanup magic.
>   * Inlined a print message.
>   * Fixed error propagation from mpam_dt_parse_resources().
>   * Moved cache accessibility checks earlier.
> 
> Changes since RFC:
>   * Check for status=broken DT devices.
>   * Moved all the files around.
>   * Made Kconfig symbols depend on EXPERT
> ---
>   arch/arm64/Kconfig              |   1 +
>   drivers/Kconfig                 |   2 +
>   drivers/Makefile                |   1 +
>   drivers/resctrl/Kconfig         |  14 +++
>   drivers/resctrl/Makefile        |   4 +
>   drivers/resctrl/mpam_devices.c  | 180 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  65 ++++++++++++
>   7 files changed, 267 insertions(+)
>   create mode 100644 drivers/resctrl/Kconfig
>   create mode 100644 drivers/resctrl/Makefile
>   create mode 100644 drivers/resctrl/mpam_devices.c
>   create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6487c511bdc6..93e563e1cce4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT
>   	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>   
>   source "drivers/cdx/Kconfig"
>   
> +source "drivers/resctrl/Kconfig"
> +
>   endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
>   obj-$(CONFIG_DRM_ACCEL)		+= accel/
>   obj-$(CONFIG_CDX_BUS)		+= cdx/
>   obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
>   
>   obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..c30532a3a3a4
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,14 @@
> +menuconfig ARM64_MPAM_DRIVER
> +	bool "MPAM driver"
> +	depends on ARM64 && ARM64_MPAM && EXPERT
> +	help
> +	  MPAM driver for System IP, e.g. caches and memory controllers.
> +
> +if ARM64_MPAM_DRIVER
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver"
> +	depends on ARM64_MPAM_DRIVER
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> +
> +endif
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..efc4738e3b4d
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/*
> + * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> + * can be enabled.
> + */
> +static atomic_t mpam_num_msc;
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	u32 affinity_id;
> +	int err;
> +
> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +				       &affinity_id);
> +	if (err)
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +	else
> +		acpi_pptt_get_cpus_from_container(affinity_id,
> +						  &msc->accessibility);
> +
> +	return 0;
> +
> +	return err;
> +}
> +

There is a double return here, and the two returns have different values. I think we
need "return err" here. In that case we needn't copy @cpu_possible_mask on error,
because the caller mpam_msc_drv_probe() will release the MSC instance.

> +static int fw_num_msc;
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	platform_set_drvdata(pdev, NULL);
> +	list_del_rcu(&msc->all_msc_list);
> +	synchronize_srcu(&mpam_srcu);
> +	mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	do {
> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +		if (!msc) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		mutex_init(&msc->probe_lock);
> +		mutex_init(&msc->part_sel_lock);
> +		mutex_init(&msc->outer_mon_sel_lock);
> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		msc->id = pdev->id;
> +		msc->pdev = pdev;
> +		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +		INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +		err = update_msc_accessibility(msc);
> +		if (err)
> +			break;
> +		if (cpumask_empty(&msc->accessibility)) {
> +			dev_err_once(dev, "MSC is not accessible from any CPU!");
> +			err = -EINVAL;
> +			break;
> +		}
> +

This check (cpumask_empty()) could be part of update_msc_accessibility(), since
msc->accessibility is sorted out in that function and that is where it should be
validated (see the sketch below).
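
(For illustration only, a sketch of what that could look like, keeping the
existing fall-back to cpu_possible_mask when the "cpu_affinity" property is
absent:)

	static int update_msc_accessibility(struct mpam_msc *msc)
	{
		u32 affinity_id;
		int err;

		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
					       &affinity_id);
		if (err)
			cpumask_copy(&msc->accessibility, cpu_possible_mask);
		else
			acpi_pptt_get_cpus_from_container(affinity_id,
							  &msc->accessibility);

		/* Validate the mask where it is produced */
		if (cpumask_empty(&msc->accessibility)) {
			dev_err_once(&msc->pdev->dev,
				     "MSC is not accessible from any CPU!");
			return -EINVAL;
		}

		return 0;
	}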

> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> +					     &msc->pcc_subspace_id))
> +			msc->iface = MPAM_IFACE_MMIO;
> +		else
> +			msc->iface = MPAM_IFACE_PCC;
> +
> +		if (msc->iface == MPAM_IFACE_MMIO) {
> +			void __iomem *io;
> +
> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +								    &msc_res);
> +			if (IS_ERR(io)) {
> +				dev_err_once(dev, "Failed to map MSC base address\n");
> +				err = PTR_ERR(io);
> +				break;
> +			}
> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +			msc->mapped_hwpage = io;
> +		}
> +
> +		list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +		platform_set_drvdata(pdev, msc);
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +	}
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);
> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	fw_num_msc = acpi_mpam_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..7c63d590fc98
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        all_msc_list;
> +
> +	int			id;
> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			pcc_subspace_id;
> +	struct mbox_client	pcc_cl;
> +	struct pcc_mbox_chan	*pcc_chan;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only taken during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs;
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	/*
> +	 * mon_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_MON_SEL.
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		outer_mon_sel_lock;
> +	raw_spinlock_t		inner_mon_sel_lock;
> +	unsigned long		inner_mon_sel_flags;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity);
> +
> +#endif /* MPAM_INTERNAL_H */

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
  2025-09-11 14:22   ` Jonathan Cameron
  2025-09-11 16:30   ` Markus Elfring
@ 2025-10-03 16:54   ` Fenghua Yu
  2025-10-17 18:51     ` James Morse
  2025-10-06 23:13   ` Gavin Shan
  3 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-10-03 16:54 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
>
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
>
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Fixed a comp/vmsc typo.
>   * Removed duplicate description from the commit message.
>   * Moved parenthesis in the add_to_garbage() macro.
>   * Check for out of range ris_idx when creating ris.
>   * Removed GFP as probe_lock is no longer a spin lock.
>   * Removed alloc flag as ended up searching the lists itself.
>   * Added a comment about affinity masks not overlapping.
>
> Changes since RFC:
>   * removed a pr_err() debug message that crept in.
> ---
>   drivers/resctrl/mpam_devices.c  | 406 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  90 +++++++
>   include/linux/arm_mpam.h        |   8 +-
>   3 files changed, 493 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index efc4738e3b4d..c7f4981b3545 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -18,7 +18,6 @@
>   #include <linux/printk.h>
>   #include <linux/slab.h>
>   #include <linux/spinlock.h>
> -#include <linux/srcu.h>
>   #include <linux/types.h>
>   
>   #include "mpam_internal.h"
> @@ -31,7 +30,7 @@
>   static DEFINE_MUTEX(mpam_list_lock);
>   static LIST_HEAD(mpam_all_msc);
>   
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
>   
>   /*
>    * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> @@ -39,6 +38,402 @@ static struct srcu_struct mpam_srcu;
>    */
>   static atomic_t mpam_num_msc;
>   
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
> +	if (!vmsc)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
> +				       struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	return mpam_vmsc_alloc(comp, msc);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(comp);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */
> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}
> +
> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	return mpam_component_alloc(class, id);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	class = kzalloc(sizeof(*class), GFP_KERNEL);
> +	if (!class)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(class);
> +
> +	INIT_LIST_HEAD_RCU(&class->components);
> +	/* affinity is updated when ris are added */
> +	class->level = level_idx;
> +	class->type = type;
> +	INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> +	list_add_rcu(&class->classes_list, &mpam_classes);
> +
> +	return class;
> +}
> +
> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
> +{
> +	bool found = false;
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (found)
> +		return class;
> +
> +	return mpam_class_alloc(level_idx, type);
> +}
> +
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = (x);				\
> +	_x->garbage.to_free = _x;			\
> +	llist_add(&_x->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&class->classes_list);
> +	add_to_garbage(class);
> +}
> +
> +static void mpam_comp_destroy(struct mpam_component *comp)
> +{
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&comp->class_list);
> +	add_to_garbage(comp);
> +
> +	if (list_empty(&class->components))
> +		mpam_class_destroy(class);
> +}
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	/*
> +	 * It is assumed affinities don't overlap. If they do the class becomes
> +	 * unusable immediately.
> +	 */
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
> +
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_msc_ris *ris, *tmp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> +		mpam_ris_destroy(ris);
> +
> +	list_del_rcu(&msc->all_msc_list);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	add_to_garbage(msc);
> +	msc->garbage.pdev = pdev;
> +}
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +

Should this be protected by mpam_list_lock, with a check that the lock is held?

+	lockdep_assert_held(&mpam_list_lock);

Multiple threads may add and free garbage in parallel. Note that a later call to
mpam_free_garbage() is not protected by any lock.

> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +}
> +
> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (node_id == cpu_to_node(cpu))
> +			cpumask_set_cpu(cpu, affinity);
> +	}
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> +				 enum mpam_class_types type,
> +				 struct mpam_class *class,
> +				 struct mpam_component *comp)
> +{
> +	int err;
> +
> +	switch (type) {
> +	case MPAM_CLASS_CACHE:
> +		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> +						     affinity);
> +		if (err)
> +			return err;
> +
> +		if (cpumask_empty(affinity))
> +			pr_warn_once("%s no CPUs associated with cache node",
> +				     dev_name(&msc->pdev->dev));
> +
> +		break;
> +	case MPAM_CLASS_MEMORY:
> +		get_cpumask_from_node_id(comp->comp_id, affinity);
> +		/* affinity may be empty for CPU-less memory nodes */
> +		break;
> +	case MPAM_CLASS_UNKNOWN:
> +		return 0;
> +	}
> +
> +	cpumask_and(affinity, affinity, &msc->accessibility);
> +
> +	return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
> +		return -EINVAL;
> +
> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
> +		return -EBUSY;
> +

Should setting the msc->ris_idxs bit be moved to the end of this function,
after all the error handling paths? The reason is that this bit should stay 0
(or be restored) if any error happens, and it's awkward to clear it again in
every error path. The easiest way is to set it at the end of the function, as
in the sketch below.
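
(For illustration only, the changed lines of such an approach. Since
mpam_list_lock serialises creators, a plain test_bit()/set_bit() pair keeps the
-EBUSY check while deferring the set until nothing else can fail:)

	/* early in mpam_ris_create_locked(), replacing the test_and_set_bit(): */
	if (test_bit(ris_idx, &msc->ris_idxs))
		return -EBUSY;

	/* ... the existing allocation/affinity code and its error returns ... */

	/* at the end, once no further error path remains: */
	set_bit(ris_idx, &msc->ris_idxs);
	list_add_rcu(&ris->vmsc_list, &vmsc->ris);

	return 0;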

> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(ris);
> +
> +	class = mpam_class_get(class_id, type);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_get(class, component_id);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);
> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_get(comp, msc);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_comp_destroy(comp);
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))
> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);

ris->msc_list will be used (mpam_ris_destroy() calls list_del_rcu() on it) but
is never initialized. Missing INIT_LIST_HEAD_RCU(&ris->msc_list) here?
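
(i.e. something like:)

	ris->ris_idx = ris_idx;
	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
	INIT_LIST_HEAD_RCU(&ris->msc_list);	/* suggested addition */
	ris->vmsc = vmsc;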

> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
Setting the msc->ris_idxs bit here is better, as it avoids having to clear it
in each error handling path.
> +	return 0;
> +}
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id);
> +	mutex_unlock(&mpam_list_lock);
> +	if (err)
> +		mpam_free_garbage();
> +
> +	return err;
> +}
> +
>   /*
>    * An MSC can control traffic from a set of CPUs, but may only be accessible
>    * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>   		return;
>   
>   	mutex_lock(&mpam_list_lock);
> -	platform_set_drvdata(pdev, NULL);
> -	list_del_rcu(&msc->all_msc_list);
> -	synchronize_srcu(&mpam_srcu);
> +	mpam_msc_destroy(msc);
>   	mutex_unlock(&mpam_list_lock);
> +
> +	mpam_free_garbage();

Should mpam_free_garbage() be protected by mpam_list_lock? It may race with
adding garbage. I can see that the other places which add and free garbage are
protected by mpam_list_lock, but this one is not.

>   }
>   
>   static int mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -95,6 +490,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>   			err = -ENOMEM;
>   			break;
>   		}
> +		init_garbage(msc);
>   
>   		mutex_init(&msc->probe_lock);
>   		mutex_init(&msc->part_sel_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 7c63d590fc98..02e9576ece6b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,10 +7,29 @@
>   #include <linux/arm_mpam.h>
>   #include <linux/cpumask.h>
>   #include <linux/io.h>
> +#include <linux/llist.h>
>   #include <linux/mailbox_client.h>
>   #include <linux/mutex.h>
>   #include <linux/resctrl.h>
>   #include <linux/sizes.h>
> +#include <linux/srcu.h>
> +
> +#define MPAM_MSC_MAX_NUM_RIS	16
> +
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be gargbage
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};
>   
>   struct mpam_msc {
>   	/* member of mpam_all_msc */
> @@ -57,8 +76,79 @@ struct mpam_msc {
>   
>   	void __iomem		*mapped_hwpage;
>   	size_t			mapped_hwpage_sz;
> +
> +	struct mpam_garbage	garbage;
>   };
>   
> +struct mpam_class {
> +	/* mpam_components in this class */
> +	struct list_head	components;
> +
> +	cpumask_t		affinity;
> +
> +	u8			level;
> +	enum mpam_class_types	type;
> +
> +	/* member of mpam_classes */
> +	struct list_head	classes_list;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_component {
> +	u32			comp_id;
> +
> +	/* mpam_vmsc in this component */
> +	struct list_head	vmsc;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_class:components */
> +	struct list_head	class_list;
> +
> +	/* parent: */
> +	struct mpam_class	*class;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_vmsc {
> +	/* member of mpam_component:vmsc_list */
> +	struct list_head	comp_list;
> +
> +	/* mpam_msc_ris in this dvmsc */
> +	struct list_head	ris;
> +
> +	/* All RIS in this vMSC are members of this MSC */
> +	struct mpam_msc		*msc;
> +
> +	/* parent: */
> +	struct mpam_component	*comp;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_msc_ris {
> +	u8			ris_idx;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_vmsc:ris */
> +	struct list_head	vmsc_list;
> +
> +	/* member of mpam_msc:ris */
> +	struct list_head	msc_list;
> +
> +	/* parent: */
> +	struct mpam_vmsc	*vmsc;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +/* List of all classes - protected by srcu */
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 3d6c39c667c3..3206f5ddc147 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -38,11 +38,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>   static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>   #endif
>   
> -static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> -				  enum mpam_class_types type, u8 class_id,
> -				  int component_id)
> -{
> -	return -EINVAL;
> -}
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id);
>   
>   #endif /* __LINUX_ARM_MPAM_H */

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
  2025-09-11 15:07   ` Jonathan Cameron
  2025-09-12 10:42   ` Ben Horgan
@ 2025-10-03 17:56   ` Fenghua Yu
  2025-10-06 23:42   ` Gavin Shan
  3 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-03 17:56 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen


On 9/10/25 13:42, James Morse wrote:
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
>
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed as each CPU's online call is made.
>
> This adds the low-level MSC register accessors.
>
> Once all MSCs reported by the firmware have been probed from a CPU in
> their respective cpu-affinity set, the probe-time cpuhp callbacks are
> replaced.  The replacement callbacks will ultimately need to handle
> save/restore of the runtime MSC state across power transitions, but for
> now there is nothing to do in them: so do nothing.
>
> The architecture's context switch code will be enabled by a static-key,
> this can be set by mpam_enable(), but must be done from process context,
> not a cpuhp callback because both take the cpuhp lock.
> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
> to test if all the MSCs have been probed. If probing fails, mpam_disable()
> is scheduled to unregister the cpuhp callbacks and free memory.
>
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-12 15:22   ` Dave Martin
@ 2025-10-03 18:02     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:02 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Dave,

On 12/09/2025 16:22, Dave Martin wrote:
> On Wed, Sep 10, 2025 at 08:42:58PM +0000, James Morse wrote:
>> Register and enable error IRQs. All the MPAM error interrupts indicate a
>> software bug, e.g. out of range partid. If the error interrupt is ever
>> signalled, attempt to disable MPAM.
>>
>> Only the irq handler accesses the ESR register, so no locking is needed.
> 
> Nit: MPAMF_ESR?  (Casual readers may confuse it with ESR_ELx.

Sure, fixed.


> Formally, there is no MPAM "ESR" register, though people familiar with
> the spec will of course know what you're referring to.)
> 
>> The work to disable MPAM after an error needs to happen at process
>> context as it takes mutex. It also unregisters the interrupts, meaning
>> it can't be done from the threaded part of a threaded interrupt.
>> Instead, mpam_disable() gets scheduled.
>>
>> Enabling the IRQs in the MSC may involve cross calling to a CPU that
>> can access the MSC.
>>
>> Once the IRQ is requested, the mpam_disable() path can be called
>> asynchronously, which will walk structures sized by max_partid. Ensure
>> this size is fixed before the interrupt is requested.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index a9d3c4b09976..e7e4afc1ea95 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>> @@ -166,6 +169,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>>  	return (idr_high << 32) | idr_low;
>>  }
>>  
>> +static void mpam_msc_zero_esr(struct mpam_msc *msc)

> Nit: Maybe clear_esr?  (The fact that setting the ERRCODE and OVRWR
> fields to zero clears the interrupt and prepares for unambiguous
> reporting of the next error is more of an implementation detail.
> It doesn't matter what the rest of the register is set to.)

If you think that name is clearer - sure,


>> +{
>> +	__mpam_write_reg(msc, MPAMF_ESR, 0);
>> +	if (msc->has_extd_esr)
> 
> This deasserts the interrupt (if level-sensitive) and enables the MSC
> to report further errors.  If we are unlucky and error occurs now,
> won't we splat the newly HW-generated RIS field by:
> 
>> +		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
> 
> ...?  If so, we will diagnose the wrong RIS when we pump the new error
> from MPAMF_ESR.  I think the correct interpretation of the spec may be
> that:
> 
>  a) software should treat fields in MPAMF_ESR[63:32] as vaild only if
>     ERRCODE is nonzero, and
> 
>  b) software should never write to MPAMF_ESR[63:32] while ERRCODE is
>     zero.
> 
> Does this look right?  Should the fields be cleared in the opposite
> order?
> 
> Or alternatively, is it actually necessary to clear MPAMF_ESR[63:32]
> at all?
> 
> (The spec seems a bit vague on what software is supposed to do with
> this register to ensure correctness...)

Yeah - I don't think there was much intention here beyond making the RIS
available for the first error. As none of these errors can really be handled,
I don't think a second error value matters too much.

I've reordered it as you suggest, and added a block comment to explain the
problem:
-----%<-----
	u64 esr_low = __mpam_read_reg(msc, MPAMF_ESR);
	if (!esr_low)
		return;

	/*
	 * Clearing the high/low bits of MPAMF_ESR can not be atomic.
	 * Clear the top half first, so that the pending error bits in the
	 * lower half prevent hardware from updating either half of the
	 * register.
	 */
	if (msc->has_extd_esr)
		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
	__mpam_write_reg(msc, MPAMF_ESR, 0);
-----%<-----

We should probably go bother the architects to find out if we can modify
the high bits like this safely.



>> +}

>> @@ -895,6 +920,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>>  	}
>>  }
>>  
>> +static void _enable_percpu_irq(void *_irq)
>> +{
>> +	int *irq = _irq;
>> +
>> +	enable_percpu_irq(*irq, IRQ_TYPE_NONE);

> Can the type vary?  (Maybe this makes no sense on GIC-based systems --
> IRQ_TYPE_NONE (or "0") seems overwhelmingly common.)
> 
> (Just my lack of familiarity takling, here.)

PPI can be edge or level - but the irqchip doesn't need to know at this point, and
specifying NONE tells it to use what it already knows. The irqchip already got told
what firmware table said when the interrupt was registered. I believe the GIC knows
its a PPI from the intid, they live in a magic range.


> [...]
> 
>> +static int __setup_ppi(struct mpam_msc *msc)
>> +{
>> +	int cpu;
>> +	struct device *dev = &msc->pdev->dev;
>> +
>> +	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
>> +	if (!msc->error_dev_id)
>> +		return -ENOMEM;
>> +
>> +	for_each_cpu(cpu, &msc->accessibility) {
>> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
>> +
>> +		if (empty) {
>> +			dev_err_once(dev, "MSC shares PPI with %s!\n",
>> +				     dev_name(&empty->pdev->dev));
>> +			return -EBUSY;
>> +		}
>> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
>> +	}

> How are PPIs supposed to work?

You take the interrupt on the corresponding CPU, and the irqchip gives you the
corresponding percpu pointer. That is what is being set up here.
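
(For illustration only, a sketch of the handler side - the handler and devname
here are made-up names, but the per-CPU dev_id plumbing is the generic
request_percpu_irq() behaviour:)

	static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
	{
		/* dev_id is this CPU's slot of the registered per-CPU pointer */
		struct mpam_msc *msc = *(struct mpam_msc **)dev_id;

		/* ... read and clear MPAMF_ESR via msc ... */
		return IRQ_HANDLED;
	}

	/* requested with something like: */
	err = request_percpu_irq(irq, mpam_ppi_handler, "mpam:msc:error",
				 msc->error_dev_id);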


> An individual MSC that is affine to multiple CPUs has no way to
> distinguish which CPU an error relates to, and no CPU-specific (or even
> RIS-specific) ESR.

For deeper caches this is certainly a problem. But for things close in to the CPU, they
may well know which CPU this transaction came from. The MSC may even be part of the
CPU.


> So, won't such an interrupt pointlessly be taken on all
> CPUs, which would all fight over the reported event?

It ought to be only one, but nothing requires this.


> Have you encountered any platforms wired up this way?  The spec
> recommends not to do this, but does not provide any rationale...
> 
> The spec only mentions PPIs in the context of being affine to a single
> CPU (PE).

> It's not clear to me that any other use of PPIs makes
> sense (?)

The PPI problem also exists the other way round. If your MSC is not globally accessible,
you can't wire its interrupts up to an SPI, otherwise linux can route the interrupt to
CPUs that can't access the MSC.

The only tool you have to get out of this is to use a PPI.

This came up multiple times with RAS when the caches signalled errors and it really needed
to go to the local CPUs, not a random CPU in a remote package.

It's my assumption that folk will build platforms the same shape, and reach for PPI again.


> If we really have to cope with this, maybe it would make sense to pick
> a single CPU in the affinity set (though we might have to move it
> around if the unlucky CPU is offlined).

Unfortunately PPIs are a totally separate set of wiring with a reserved intid range (or
two!) and their own registers in the GICR. It isn't possible to pretend they're regular
interrupts taken on a particular CPU.

A choice that can be made here is to assume no-one will ever(?) use these, and drop
support. I don't know of a platform that is using PPI - but I can't say no-one is.


> 
> [...]
> 
>> +static char *mpam_errcode_names[16] = {
>> +	[0] = "No error",
>> +	[1] = "PARTID_SEL_Range",
>> +	[2] = "Req_PARTID_Range",
>> +	[3] = "MSMONCFG_ID_RANGE",
>> +	[4] = "Req_PMG_Range",
>> +	[5] = "Monitor_Range",
>> +	[6] = "intPARTID_Range",
>> +	[7] = "Unexpected_INTERNAL",
>> +	[8] = "Undefined_RIS_PART_SEL",
>> +	[9] = "RIS_No_Control",
>> +	[10] = "Undefined_RIS_MON_SEL",
>> +	[11] = "RIS_No_Monitor",
>> +	[12 ... 15] = "Reserved"
>> +};
>> +
>> +static int mpam_enable_msc_ecr(void *_msc)
>> +{
>> +	struct mpam_msc *msc = _msc;
>> +
>> +	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
>> +
>> +	return 0;
>> +}
> 
> This could also be a switch () { case 0: return "foo";
> case 1: return "bar"; ... }, without the explicit table.  This would
> avoid having to think about the ERRCODE field growing.  (There are some
> RES0 bits looming over it.)
> 
> (This also tends to avoid the extra pointer table in .rodata, which
> might be of interest if this were a hot path.)

Given they added new fields to the end of the list, I'm not worried about it becoming sparse.
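
(For the archive, the switch() form being suggested is roughly the below - a sketch
covering the same codes as the array, with the ERRCODE field extraction left to the
caller:)

	static const char *mpam_errcode_name(u16 errcode)
	{
		switch (errcode) {
		case 0:  return "No error";
		case 1:  return "PARTID_SEL_Range";
		case 2:  return "Req_PARTID_Range";
		case 3:  return "MSMONCFG_ID_RANGE";
		case 4:  return "Req_PMG_Range";
		case 5:  return "Monitor_Range";
		case 6:  return "intPARTID_Range";
		case 7:  return "Unexpected_INTERNAL";
		case 8:  return "Undefined_RIS_PART_SEL";
		case 9:  return "RIS_No_Control";
		case 10: return "Undefined_RIS_MON_SEL";
		case 11: return "RIS_No_Monitor";
		default: return "Reserved";
		}
	}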


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 18/29] arm_mpam: Register and enable IRQs
  2025-09-25  6:33   ` Fenghua Yu
@ 2025-10-03 18:03     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:03 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Xin Hao, peternewman, dfustini,
	amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 25/09/2025 07:33, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> Register and enable error IRQs. All the MPAM error interrupts indicate a
>> software bug, e.g. out of range partid. If the error interrupt is ever
>> signalled, attempt to disable MPAM.
>>
>> Only the irq handler accesses the ESR register, so no locking is needed.
>> The work to disable MPAM after an error needs to happen at process
>> context as it takes mutex. It also unregisters the interrupts, meaning
>> it can't be done from the threaded part of a threaded interrupt.
>> Instead, mpam_disable() gets scheduled.
>>
>> Enabling the IRQs in the MSC may involve cross calling to a CPU that
>> can access the MSC.
>>
>> Once the IRQ is requested, the mpam_disable() path can be called
>> asynchronously, which will walk structures sized by max_partid. Ensure
>> this size is fixed before the interrupt is requested.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index a9d3c4b09976..e7e4afc1ea95 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1318,11 +1405,172 @@ static void mpam_enable_merge_features(struct list_head

>> +static void mpam_unregister_irqs(void)
>> +{
>> +    int irq, idx;
>> +    struct mpam_msc *msc;
>> +
>> +    cpus_read_lock();
>> +    /* take the lock as free_irq() can sleep */
>> +    idx = srcu_read_lock(&mpam_srcu);

> guard(srcu)(&mpam_srcu);

Yes - Jonathan already suggested this.


>> +    list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
>> +                 srcu_read_lock_held(&mpam_srcu)) {
>> +        irq = platform_get_irq_byname_optional(msc->pdev, "error");
>> +        if (irq <= 0)
>> +            continue;
>> +
>> +        if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
>> +            mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
>> +
>> +        if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
>> +            if (irq_is_percpu(irq)) {
>> +                msc->reenable_error_ppi = 0;
>> +                free_percpu_irq(irq, msc->error_dev_id);
>> +            } else {
>> +                devm_free_irq(&msc->pdev->dev, irq, msc);
>> +            }
>> +        }
>> +    }
>> +    srcu_read_unlock(&mpam_srcu, idx);
>> +    cpus_read_unlock();
>> +}
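
With the guard applied I'd expect it to end up something like this (sketch, untested).
The read-side is then dropped at function exit, after cpus_read_unlock(), which is
harmless as both are read-side 'locks':

	static void mpam_unregister_irqs(void)
	{
		struct mpam_msc *msc;
		int irq;

		cpus_read_lock();
		/* take the lock as free_irq() can sleep */
		guard(srcu)(&mpam_srcu);

		list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			irq = platform_get_irq_byname_optional(msc->pdev, "error");
			if (irq <= 0)
				continue;

			if (test_and_clear_bit(MPAM_ERROR_IRQ_HW_ENABLED, &msc->error_irq_flags))
				mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);

			if (test_and_clear_bit(MPAM_ERROR_IRQ_REQUESTED, &msc->error_irq_flags)) {
				if (irq_is_percpu(irq)) {
					msc->reenable_error_ppi = 0;
					free_percpu_irq(irq, msc->error_dev_id);
				} else {
					devm_free_irq(&msc->pdev->dev, irq, msc);
				}
			}
		}
		cpus_read_unlock();
	}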


James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-12 12:13   ` Jonathan Cameron
@ 2025-10-03 18:03     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:03 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 13:13, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:59 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Once all the MSC have been probed, the system wide usable number of
>> PARTID is known and the configuration arrays can be allocated.
>>
>> After this point, checking all the MSC have been probed is pointless,
>> and the cpuhp callbacks should restore the configuration, instead of
>> just resetting the MSC.
>>
>> Add a static key to enable this behaviour. This will also allow MPAM
>> to be disabled in repsonse to an error, and the architecture code to
>> enable/disable the context switch of the MPAM system registers.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Seems fine to me (other than the TODO move to arch code
> that should probably be resolved.

That is just to make it clear this isn't where it should live - it's a quirk of moving
the arch code later to reduce the number of trees this interacts with.
I've dropped the word 'TODO'...


> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-12 14:42   ` Ben Horgan
@ 2025-10-03 18:03     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:03 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 15:42, Ben Horgan wrote:
> On 9/10/25 21:42, James Morse wrote:
>> Once all the MSC have been probed, the system wide usable number of
>> PARTID is known and the configuration arrays can be allocated.
>>
>> After this point, checking all the MSC have been probed is pointless,
>> and the cpuhp callbacks should restore the configuration, instead of
>> just resetting the MSC.
>>
>> Add a static key to enable this behaviour. This will also allow MPAM
>> to be disabled in repsonse to an error, and the architecture code to
>> enable/disable the context switch of the MPAM system registers.

> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-09-26  2:31   ` Fenghua Yu
@ 2025-10-03 18:04     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:04 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 26/09/2025 03:31, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> Once all the MSC have been probed, the system wide usable number of
>> PARTID is known and the configuration arrays can be allocated.
>>
>> After this point, checking all the MSC have been probed is pointless,
>> and the cpuhp callbacks should restore the configuration, instead of
>> just resetting the MSC.
>>
>> Add a static key to enable this behaviour. This will also allow MPAM
>> to be disabled in repsonse to an error, and the architecture code to

> nit...s/repsonse/response/

Oops,


>> enable/disable the context switch of the MPAM system registers.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-25  6:53   ` Fenghua Yu
@ 2025-10-03 18:04     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:04 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 25/09/2025 07:53, Fenghua Yu wrote:
> On 9/10/25 13:43, James Morse wrote:
>> When CPUs come online the MSC's original configuration should be restored.
>>
>> Add struct mpam_config to hold the configuration. This has a bitmap of
>> features that were modified. Once the maximum partid is known, allocate
>> a configuration array for each component, and reprogram each RIS
>> configuration from this.


>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index ec1db5f8b05c..7fd149109c75 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>> @@ -922,6 +991,40 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>>       }
>>   }
>>   +static void mpam_reprogram_msc(struct mpam_msc *msc)
>> +{
>> +    u16 partid;
>> +    bool reset;
>> +    struct mpam_config *cfg;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    /*
>> +     * No lock for mpam_partid_max as partid_max_published has been
>> +     * set by mpam_enabled(), so the values can no longer change.
>> +     */
>> +    mpam_assert_partid_sizes_fixed();
>> +
>> +    guard(srcu)(&mpam_srcu);

> mpam_srcu is locked in caller mpam_cpu_online(). It's unnecessary to call guard(srcu)
> (&mpam_srcu) here again for simpler logic and less overhead.

It's not simpler - the locking requirements of the innards of mpam_reprogram_msc() would
have to spill out to the callers.
I also contest that it's less overhead - this is only used on the cpuhp path, and I suspect
the 'cost' can't even be measured.

The good news is at the end of the day this thing only has that one caller, so I'll change
this one ...
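
(SRCU read-side sections nest, so the 'extra' guard inside mpam_reprogram_msc() is just a
second read-lock/unlock pair on top of the caller's. Illustration only:)

	/* Illustration: SRCU read-side critical sections nest freely */
	static void mpam_srcu_nesting_example(void)
	{
		int outer = srcu_read_lock(&mpam_srcu);
		int inner = srcu_read_lock(&mpam_srcu);	/* just bumps a per-CPU counter */

		srcu_read_unlock(&mpam_srcu, inner);
		srcu_read_unlock(&mpam_srcu, outer);
	}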


Thanks,

James




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features
  2025-09-12 13:07   ` Jonathan Cameron
@ 2025-10-03 18:05     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-03 18:05 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Zeng Heng

Hi Jonathan,

On 12/09/2025 14:07, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:01 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> MPAM supports more features than are going to be exposed to resctrl.
>> For partid other than 0, the reset values of these controls isn't
>> known.
>>
>> Discover the rest of the features so they can be reset to avoid any
>> side effects when resctrl is in use.
>>
>> PARTID narrowing allows MSC/RIS to support less configuration space than
>> is usable. If this feature is found on a class of device we are likely
>> to use, then reduce the partid_max to make it usable. This allows us
>> to map a PARTID to itself.

> A few trivial things inline.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> @@ -667,10 +676,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>>  	struct mpam_msc *msc = ris->vmsc->msc;
>>  	struct device *dev = &msc->pdev->dev;
>>  	struct mpam_props *props = &ris->props;
>> +	struct mpam_class *class = ris->vmsc->comp->class;
>>  
>>  	lockdep_assert_held(&msc->probe_lock);
>>  	lockdep_assert_held(&msc->part_sel_lock);
>>  
>> +	/* Cache Capacity Partitioning */
>> +	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
>> +		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
>> +
>> +		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
>> +		if (props->cmax_wd &&
>> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
>> +			mpam_set_feature(mpam_feat_cmax_softlim, props);
>> +
>> +		if (props->cmax_wd &&
>> +		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
>> +			mpam_set_feature(mpam_feat_cmax_cmax, props);
>> +
>> +		if (props->cmax_wd &&
>> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
>> +			mpam_set_feature(mpam_feat_cmax_cmin, props);
>> +
>> +		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
>> +

> Trivial but blank line here feels inconsistent with local style. I'd drop it.

Sure,


>> +		if (props->cassoc_wd &&
>> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
>> +			mpam_set_feature(mpam_feat_cmax_cassoc, props);
>> +	}
>> +
> 
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 17570d9aae9b..326ba9114d70 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -136,25 +136,34 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>>   * When we compact the supported features, we don't care what they are.
>>   * Storing them as a bitmap makes life easy.
>>   */
>> -typedef u16 mpam_features_t;
>> +typedef u32 mpam_features_t;

> This is strengthening my view that this should just be a DECLARE_BITMAP(MPAM_FEATURE_LAST)
> in the appropriate places.

I don't think this list is going to grow much. But sure.

Most of it can be churned out mechanically, but this makes an utter mess of exposing the
value to debugfs. I do need that exposed, as "why didn't resctrl do what I wanted?" is a
very common question - and being able to see what the hardware supports really helps.

Looks like exposing the first 'ulong' is the simplest - it kicks the can this creates
down the road. I experimented with u32_array - but it's no better.
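
Roughly the shape I think Jonathan is after - a sketch only, with the feature list
abridged to the names visible in this thread, and the debugfs part showing the
'first ulong' compromise:

	/* Sketch: bitmap-based feature storage (abridged) */
	enum mpam_device_features {
		mpam_feat_msmon_csu,
		mpam_feat_msmon_mbwu,
		MPAM_FEATURE_LAST,
	};

	struct mpam_props {
		DECLARE_BITMAP(features, MPAM_FEATURE_LAST);

		u16	cmax_wd;	/* other members and types illustrative */
	};

	static inline bool mpam_has_feature(enum mpam_device_features feat,
					    struct mpam_props *props)
	{
		return test_bit(feat, props->features);
	}

	static inline void mpam_set_feature(enum mpam_device_features feat,
					    struct mpam_props *props)
	{
		set_bit(feat, props->features);
	}

	/* debugfs: expose the first unsigned long of the bitmap */
	static void mpam_expose_features(struct dentry *parent, struct mpam_props *props)
	{
		debugfs_create_ulong("features", 0400, parent, &props->features[0]);
	}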


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
  2025-09-11 15:18   ` Jonathan Cameron
  2025-09-12 11:11   ` Ben Horgan
@ 2025-10-03 18:58   ` Fenghua Yu
  2 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-03 18:58 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
>
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
>
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
>
> Signed-off-by: James Morse <james.morse@arm.com>
[SNIP]
> @@ -113,6 +123,72 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>   
>   #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
>   
> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);

if reg + 4 == msc->mapped_hwpage_sz, the register is out of boundary as 
well.

So the validation should be:

WARN_ON_ONCE(reg + sizeof(u32) >= msc->mapped_hwpage_sz);

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
  2025-09-11 15:24   ` Jonathan Cameron
  2025-09-11 15:31   ` Ben Horgan
@ 2025-10-05  0:09   ` Fenghua Yu
  2 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-05  0:09 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi, James,

On 9/10/25 13:42, James Morse wrote:
> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accesible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> mailboxes it must be able to sleep, meaning a mutex must be used. The
> SCMI platforms can't support an overflow interrupt.
>
> Clearly these two can't exist for one MSC at the same time.
>
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> callers to fail without having to explicitly check the interface type of
> each MSC.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Change since v1:
>   * Made accesses to outer_lock_held READ_ONCE() for torn values in the failure
>     case.
> ---
>   drivers/resctrl/mpam_devices.c  |  3 +--
>   drivers/resctrl/mpam_internal.h | 37 +++++++++++++++++++++++++++++----
>   2 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 24dc81c15ec8..a26b012452e2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -748,8 +748,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>   
>   		mutex_init(&msc->probe_lock);
>   		mutex_init(&msc->part_sel_lock);
> -		mutex_init(&msc->outer_mon_sel_lock);
> -		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		mpam_mon_sel_lock_init(msc);
>   		msc->id = pdev->id;
>   		msc->pdev = pdev;
>   		INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 828ce93c95d5..4cc44d4e21c4 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -70,12 +70,17 @@ struct mpam_msc {
>   
>   	/*
>   	 * mon_sel_lock protects access to the MSC hardware registers that are
> -	 * affected by MPAMCFG_MON_SEL.
> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> +	 * Access to mon_sel is needed from both process and interrupt contexts,
> +	 * but is complicated by firmware-backed platforms that can't make any
> +	 * access unless they can sleep.
> +	 * Always use the mpam_mon_sel_lock() helpers.
> +	 * Accessed to mon_sel need to be able to fail if they occur in the wrong
> +	 * context.
>   	 * If needed, take msc->probe_lock first.
>   	 */
> -	struct mutex		outer_mon_sel_lock;
> -	raw_spinlock_t		inner_mon_sel_lock;
> -	unsigned long		inner_mon_sel_flags;
> +	raw_spinlock_t		_mon_sel_lock;
> +	unsigned long		_mon_sel_flags;
>   
>   	void __iomem		*mapped_hwpage;
>   	size_t			mapped_hwpage_sz;
> @@ -83,6 +88,30 @@ struct mpam_msc {
>   	struct mpam_garbage	garbage;
>   };
>   
> +/* Returning false here means accesses to mon_sel must fail and report an error. */
> +static inline bool __must_check mpam_mon_sel_lock(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(msc->iface != MPAM_IFACE_MMIO);
> +
> +	raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
> +	return true;
> +}
> +
> +static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
> +{
> +	raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
> +}
> +
> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> +{
> +	lockdep_assert_held_once(&msc->_mon_sel_lock);
> +}
> +
> +static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
> +{
> +	raw_spin_lock_init(&msc->_mon_sel_lock);
> +}
> +
>   struct mpam_class {
>   	/* mpam_components in this class */
>   	struct list_head	components;

The inner and outer locks were defined and used in patch #7; but they 
are replaced by _mon_sel_lock in this patch.

I'm wondering if this patch should be merged into patch #7. On its own, this patch
seems redundant.

Thanks.

-Fenghua




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
  2025-09-11 15:29   ` Jonathan Cameron
  2025-09-11 15:37   ` Ben Horgan
@ 2025-10-05  0:53   ` Fenghua Yu
  2 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-05  0:53 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich


On 9/10/25 13:42, James Morse wrote:
> Expand the probing support with the control and monitor types
> we can use with resctrl.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

A couple of minor comments below.

> ---
> Changes since v1:
>   * added an underscore to a variable name.
>
> Changes since RFC:
>   * Made mpam_ris_hw_probe_hw_nrdy() more in C.
>   * Added static assert on features bitmap size.
> ---
>   drivers/resctrl/mpam_devices.c  | 151 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  53 +++++++++++
>   2 files changed, 204 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index a26b012452e2..ba8e8839cdc4 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
[SNIP]
> @@ -592,6 +736,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   	mutex_lock(&msc->part_sel_lock);
>   	idr = mpam_msc_read_idr(msc);
>   	mutex_unlock(&msc->part_sel_lock);
> +

Adding this blank line is irrelevant to this patch. It's better to move 
this blank line to the original patch #11.

>   	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>   
>   	/* Use these values so partid/pmg always starts with a valid value */
> @@ -614,6 +759,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   		mutex_unlock(&mpam_list_lock);
>   		if (IS_ERR(ris))
>   			return PTR_ERR(ris);
> +		ris->idr = idr;
> +
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		mpam_ris_hw_probe(ris);
> +		mutex_unlock(&msc->part_sel_lock);
>   	}
>   
>   	spin_lock(&partid_max_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4cc44d4e21c4..5ae5d4eee8ec 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -112,6 +112,55 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>   	raw_spin_lock_init(&msc->_mon_sel_lock);
>   }
>   
> +/*
> + * When we compact the supported features, we don't care what they are.
> + * Storing them as a bitmap makes life easy.
> + */
> +typedef u16 mpam_features_t;
> +

mpam_features_t is changed to u32 later in patch #21. It's better to 
directly define it as u32 here and remove the type change in patch #21.

[SNIP]

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-09-10 20:42 ` [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
  2025-09-12 11:49   ` Jonathan Cameron
@ 2025-10-05  1:28   ` Fenghua Yu
  1 sibling, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-05  1:28 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan


On 9/10/25 13:42, James Morse wrote:
> To make a decision about whether to expose an mpam class as
> a resctrl resource we need to know its overall supported
> features and properties.
>
> Once we've probed all the resources, we can walk the tree
> and produce overall values by merging the bitmaps. This
> eliminates features that are only supported by some MSC
> that make up a component or class.
>
> If bitmap properties are mismatched within a component we
> cannot support the mismatched feature.
>
> Care has to be taken as vMSC may hold mismatched RIS.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-09-10 20:42 ` [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
  2025-09-12 11:57   ` Jonathan Cameron
@ 2025-10-05 21:08   ` Fenghua Yu
  1 sibling, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-10-05 21:08 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan


On 9/10/25 13:42, James Morse wrote:
> Resetting RIS entries from the cpuhp callback is easy as the
> callback occurs on the correct CPU. This won't be true for any other
> caller that wants to reset or configure an MSC.
>
> Add a helper that schedules the provided function if necessary.
>
> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
> changing the MSC state.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua

[SNIP]


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors
  2025-09-12 13:11   ` Jonathan Cameron
@ 2025-10-06 14:57     ` James Morse
  2025-10-06 15:56     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-06 14:57 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 14:11, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:02 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> MPAM's MSC support a number of monitors, each of which supports
>> bandwidth counters, or cache-storage-utilisation counters. To use
>> a counter, a monitor needs to be configured. Add helpers to allocate
>> and free CSU or MBWU monitors.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> 
> One minor requested change inline that will probably otherwise get picked
> up by someone's cleanup script
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 326ba9114d70..81c4c2bfea3d 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h

>> +static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
>> +{
>> +	struct mpam_props *cprops = &class->props;
>> +
>> +	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
>> +		return -EOPNOTSUPP;
>> +
>> +	return ida_alloc_range(&class->ida_mbwu_mon, 0,
>> +			       cprops->num_mbwu_mon - 1, GFP_KERNEL);
> 
> ida_alloc_max() - which is just a wrapper that sets the minimum to 0
> but none the less perhaps conveys things slightly better.

Sure - I didn't spot that when I did this.
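
i.e. the helper becomes (sketch):

	static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
	{
		struct mpam_props *cprops = &class->props;

		if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
			return -EOPNOTSUPP;

		return ida_alloc_max(&class->ida_mbwu_mon,
				     cprops->num_mbwu_mon - 1, GFP_KERNEL);
	}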


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors
  2025-09-12 13:11   ` Jonathan Cameron
  2025-10-06 14:57     ` James Morse
@ 2025-10-06 15:56     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-06 15:56 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 14:11, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:02 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> MPAM's MSC support a number of monitors, each of which supports
>> bandwidth counters, or cache-storage-utilisation counters. To use
>> a counter, a monitor needs to be configured. Add helpers to allocate
>> and free CSU or MBWU monitors.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> 
> One minor requested change inline that will probably otherwise get picked
> up by someone's cleanup script
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 326ba9114d70..81c4c2bfea3d 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h

>> +static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
>> +{
>> +	struct mpam_props *cprops = &class->props;
>> +
>> +	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
>> +		return -EOPNOTSUPP;
>> +
>> +	return ida_alloc_range(&class->ida_mbwu_mon, 0,
>> +			       cprops->num_mbwu_mon - 1, GFP_KERNEL);
> 
> ida_alloc_max() - which is just a wrapper that sets the minimum to 0
> but none the less perhaps conveys things slightly better.

Sure - I didn't spot that when I did this.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-11 15:46   ` Ben Horgan
  2025-09-12 15:08     ` Ben Horgan
@ 2025-10-06 15:59     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-06 15:59 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 11/09/2025 16:46, Ben Horgan wrote:
> On 9/10/25 21:43, James Morse wrote:
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSC may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>> not ready, then wait the full timeout value before trying again.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index cf190f896de1..1543c33c5d6a 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -898,6 +898,232 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)

>> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>> +{
>> +	int err, idx;
>> +	struct mpam_msc *msc;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +
>> +	idx = srcu_read_lock(&mpam_srcu);
>> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> 
> This can be list_for_each_entry_srcu(). (I thought I'd already commented
> but turns out that was on another patch.)




> 
>> +		msc = vmsc->msc;
>> +
>> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> 
> Also here.


Yup - thanks, I missed these. Fixed.
I bet I went cross-eyed between here and mpam_apply_config() as they are structurally similar.
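
i.e. both walks end up looking like this (sketch, loop bodies unchanged):

	idx = srcu_read_lock(&mpam_srcu);
	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		msc = vmsc->msc;

		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			/* per-RIS read, as before */
		}
	}
	srcu_read_unlock(&mpam_srcu, idx);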



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-12 15:08     ` Ben Horgan
@ 2025-10-06 16:00       ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-06 16:00 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 16:08, Ben Horgan wrote:
> On 9/11/25 16:46, Ben Horgan wrote:
>> On 9/10/25 21:43, James Morse wrote:
>>> Reading a monitor involves configuring what you want to monitor, and
>>> reading the value. Components made up of multiple MSC may need values
>>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>>> The maximum 'not ready' time should have been provided by firmware.
>>>
>>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>>> not ready, then wait the full timeout value before trying again.

>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>> index cf190f896de1..1543c33c5d6a 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> +
>>> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>>> +				   u32 *flt_val)
>>> +{
>>> +	struct mon_cfg *ctx = m->ctx;
>>> +
>>> +	/*
>>> +	 * For CSU counters its implementation-defined what happens when not
>>> +	 * filtering by partid.
>>> +	 */
>>> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>>> +
>>> +	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
>>> +	if (m->ctx->match_pmg) {
>>> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>>> +		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
>>> +	}
>>> +
>>> +	switch (m->type) {
>>> +	case mpam_feat_msmon_csu:
>>> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
>>> +
>>> +		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
>>> +			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
>>> +					       ctx->csu_exclude_clean);
>>> +
>>> +		break;
>>> +	case mpam_feat_msmon_mbwu:
>>> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> 
> As you mentioned offline, this zeroes the other bits in *ctl_val.

Yes - a bug introduced during a late night, I've fixed this up.
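
(Presumably just OR-ing in the type so the MATCH_* bits set before the switch survive, i.e.:)

	-		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
	+		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;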


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
                     ` (2 preceding siblings ...)
  2025-10-03 16:54   ` Fenghua Yu
@ 2025-10-06 23:13   ` Gavin Shan
  2025-10-17 18:51     ` James Morse
  3 siblings, 1 reply; 200+ messages in thread
From: Gavin Shan @ 2025-10-06 23:13 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Hi James,

On 9/11/25 6:42 AM, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Fixed a comp/vmsc typo.
>   * Removed duplicate description from the commit message.
>   * Moved parenthesis in the add_to_garbage() macro.
>   * Check for out of range ris_idx when creating ris.
>   * Removed GFP as probe_lock is no longer a spin lock.
>   * Removed alloc flag as ended up searching the lists itself.
>   * Added a comment about affinity masks not overlapping.
> 
> Changes since RFC:
>   * removed a pr_err() debug message that crept in.
> ---
>   drivers/resctrl/mpam_devices.c  | 406 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  90 +++++++
>   include/linux/arm_mpam.h        |   8 +-
>   3 files changed, 493 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index efc4738e3b4d..c7f4981b3545 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -18,7 +18,6 @@
>   #include <linux/printk.h>
>   #include <linux/slab.h>
>   #include <linux/spinlock.h>
> -#include <linux/srcu.h>
>   #include <linux/types.h>
>   
>   #include "mpam_internal.h"
> @@ -31,7 +30,7 @@
>   static DEFINE_MUTEX(mpam_list_lock);
>   static LIST_HEAD(mpam_all_msc);
>   
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
>   
>   /*
>    * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> @@ -39,6 +38,402 @@ static struct srcu_struct mpam_srcu;
>    */
>   static atomic_t mpam_num_msc;
>   
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
> +	if (!vmsc)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
> +				       struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	return mpam_vmsc_alloc(comp, msc);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(comp);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */
> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}
> +
> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	return mpam_component_alloc(class, id);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	class = kzalloc(sizeof(*class), GFP_KERNEL);
> +	if (!class)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(class);
> +
> +	INIT_LIST_HEAD_RCU(&class->components);
> +	/* affinity is updated when ris are added */
> +	class->level = level_idx;
> +	class->type = type;
> +	INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> +	list_add_rcu(&class->classes_list, &mpam_classes);
> +
> +	return class;
> +}
> +
> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
> +{
> +	bool found = false;
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (found)
> +		return class;
> +
> +	return mpam_class_alloc(level_idx, type);
> +}
> +

The variable @found can be avoided if the found class can be returned immediately.

	list_for_each_entry(class, &mpam_classes, classes_list) {
		if (class->type == type && class->level == level_idx)
			return class;
	}

	return mpam_class_alloc(level_idx, type);

> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = (x);				\
> +	_x->garbage.to_free = _x;			\
> +	llist_add(&_x->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&class->classes_list);
> +	add_to_garbage(class);
> +}
> +
> +static void mpam_comp_destroy(struct mpam_component *comp)
> +{
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&comp->class_list);
> +	add_to_garbage(comp);
> +
> +	if (list_empty(&class->components))
> +		mpam_class_destroy(class);
> +}
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	/*
> +	 * It is assumed affinities don't overlap. If they do the class becomes
> +	 * unusable immediately.
> +	 */
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
> +
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_msc_ris *ris, *tmp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> +		mpam_ris_destroy(ris);
> +
> +	list_del_rcu(&msc->all_msc_list);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	add_to_garbage(msc);
> +	msc->garbage.pdev = pdev;
> +}
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +}
> +
> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (node_id == cpu_to_node(cpu))
> +			cpumask_set_cpu(cpu, affinity);
> +	}
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> +				 enum mpam_class_types type,
> +				 struct mpam_class *class,
> +				 struct mpam_component *comp)
> +{
> +	int err;
> +
> +	switch (type) {
> +	case MPAM_CLASS_CACHE:
> +		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> +						     affinity);
> +		if (err)
> +			return err;
> +
> +		if (cpumask_empty(affinity))
> +			pr_warn_once("%s no CPUs associated with cache node",
> +				     dev_name(&msc->pdev->dev));
> +
> +		break;

"\n" missed in the error message and dev_warn_once() can be used:

		if (cpumask_empty(affinity))
			dev_warn_once(&msc->pdev->dev, "No CPUs associated with cache node\n");

> +	case MPAM_CLASS_MEMORY:
> +		get_cpumask_from_node_id(comp->comp_id, affinity);
> +		/* affinity may be empty for CPU-less memory nodes */
> +		break;
> +	case MPAM_CLASS_UNKNOWN:
> +		return 0;
> +	}
> +
> +	cpumask_and(affinity, affinity, &msc->accessibility);
> +
> +	return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
> +		return -EINVAL;
> +
> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(ris);
> +
> +	class = mpam_class_get(class_id, type);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_get(class, component_id);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);
> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_get(comp, msc);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_comp_destroy(comp);
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))
> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
> +	return 0;
> +}
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id);
> +	mutex_unlock(&mpam_list_lock);
> +	if (err)
> +		mpam_free_garbage();
> +
> +	return err;
> +}
> +
>   /*
>    * An MSC can control traffic from a set of CPUs, but may only be accessible
>    * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>   		return;
>   
>   	mutex_lock(&mpam_list_lock);
> -	platform_set_drvdata(pdev, NULL);
> -	list_del_rcu(&msc->all_msc_list);
> -	synchronize_srcu(&mpam_srcu);
> +	mpam_msc_destroy(msc);
>   	mutex_unlock(&mpam_list_lock);
> +
> +	mpam_free_garbage();
>   }
>   
>   static int mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -95,6 +490,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>   			err = -ENOMEM;
>   			break;
>   		}
> +		init_garbage(msc);
>   
>   		mutex_init(&msc->probe_lock);
>   		mutex_init(&msc->part_sel_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 7c63d590fc98..02e9576ece6b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,10 +7,29 @@
>   #include <linux/arm_mpam.h>
>   #include <linux/cpumask.h>
>   #include <linux/io.h>
> +#include <linux/llist.h>
>   #include <linux/mailbox_client.h>
>   #include <linux/mutex.h>
>   #include <linux/resctrl.h>
>   #include <linux/sizes.h>
> +#include <linux/srcu.h>
> +
> +#define MPAM_MSC_MAX_NUM_RIS	16
> +
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be gargbage
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};
>   
>   struct mpam_msc {
>   	/* member of mpam_all_msc */
> @@ -57,8 +76,79 @@ struct mpam_msc {
>   
>   	void __iomem		*mapped_hwpage;
>   	size_t			mapped_hwpage_sz;
> +
> +	struct mpam_garbage	garbage;
>   };
>   
> +struct mpam_class {
> +	/* mpam_components in this class */
> +	struct list_head	components;
> +
> +	cpumask_t		affinity;
> +
> +	u8			level;
> +	enum mpam_class_types	type;
> +
> +	/* member of mpam_classes */
> +	struct list_head	classes_list;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_component {
> +	u32			comp_id;
> +
> +	/* mpam_vmsc in this component */
> +	struct list_head	vmsc;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_class:components */
> +	struct list_head	class_list;
> +
> +	/* parent: */
> +	struct mpam_class	*class;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_vmsc {
> +	/* member of mpam_component:vmsc_list */
> +	struct list_head	comp_list;
> +
> +	/* mpam_msc_ris in this vmsc */
> +	struct list_head	ris;
> +
> +	/* All RIS in this vMSC are members of this MSC */
> +	struct mpam_msc		*msc;
> +
> +	/* parent: */
> +	struct mpam_component	*comp;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_msc_ris {
> +	u8			ris_idx;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_vmsc:ris */
> +	struct list_head	vmsc_list;
> +
> +	/* member of mpam_msc:ris */
> +	struct list_head	msc_list;
> +
> +	/* parent: */
> +	struct mpam_vmsc	*vmsc;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +/* List of all classes - protected by srcu*/
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 3d6c39c667c3..3206f5ddc147 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -38,11 +38,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>   static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>   #endif
>   
> -static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> -				  enum mpam_class_types type, u8 class_id,
> -				  int component_id)
> -{
> -	return -EINVAL;
> -}
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id);
>   
>   #endif /* __LINUX_ARM_MPAM_H */

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
  2025-09-11 15:00   ` Jonathan Cameron
  2025-09-12  7:33   ` Markus Elfring
@ 2025-10-06 23:25   ` Gavin Shan
  2 siblings, 0 replies; 200+ messages in thread
From: Gavin Shan @ 2025-10-06 23:25 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Hi James,

On 9/11/25 6:42 AM, James Morse wrote:
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v1:
>   * Whitespace.
>   * Added constants for CASSOC and XCL.
>   * Merged FLT/CTL defines.
>   * Fixed MSMON_CFG_CSU_CTL_TYPE_CSU definition.
> 
> Changes since RFC:
>   * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
>   * Whitespace churn.
>   * Cite a more recent document.
>   * Removed some stale feature, fixed some names etc.
> ---
>   drivers/resctrl/mpam_internal.h | 267 ++++++++++++++++++++++++++++++++
>   1 file changed, 267 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 02e9576ece6b..109f03df46c2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -152,4 +152,271 @@ extern struct list_head mpam_classes;
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1    0x10
> +
> +/* Memory mapped control pages: */

":" seems unnecessary.

> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR		0x0000  /* features id register */
> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
> +#define MPAMF_IIDR		0x0018  /* implementer id register */
> +#define MPAMF_AIDR		0x0020  /* architectural id register */
> +
> +/* Configuration and Status Register offsets in the memory mapped page */
> +#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */

":" seems unnecessary.

> +#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
> +#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
> +#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
> +#define MPAMCFG_CASSOC		0x0118  /* cache-associativity config */
> +#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
> +#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
> +#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
> +#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
> +#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
> +#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
> +#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
> +

[...]

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
                     ` (2 preceding siblings ...)
  2025-10-03 17:56   ` Fenghua Yu
@ 2025-10-06 23:42   ` Gavin Shan
  3 siblings, 0 replies; 200+ messages in thread
From: Gavin Shan @ 2025-10-06 23:42 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen

On 9/11/25 6:42 AM, James Morse wrote:
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed as each CPU's online call is made.
> 
> This adds the low-level MSC register accessors.
> 
> Once all MSCs reported by the firmware have been probed from a CPU in
> their respective cpu-affinity set, the probe-time cpuhp callbacks are
> replaced.  The replacement callbacks will ultimately need to handle
> save/restore of the runtime MSC state across power transitions, but for
> now there is nothing to do in them: so do nothing.
> 
> The architecture's context switch code will be enabled by a static-key,
> this can be set by mpam_enable(), but must be done from process context,
> not a cpuhp callback because both take the cpuhp lock.
> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
> to test if all the MSCs have been probed. If probing fails, mpam_disable()
> is scheduled to unregister the cpuhp callbacks and free memory.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Removed register bounds check. If the firmware tables are wrong the
>     resulting translation fault should be enough to debug this.
>   * Removed '&' in front of a function pointer.
>   * Pulled mpam_disable() into this patch.
>   * Disable mpam when probing fails to avoid extra work on broken platforms.
>   * Added mpam_disbale_reason as there are now two non-debug reasons for this
>     to happen.
> ---
>   drivers/resctrl/mpam_devices.c  | 173 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |   5 +
>   2 files changed, 177 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-09-12 12:22   ` Jonathan Cameron
@ 2025-10-07 11:11     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-07 11:11 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 13:22, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:00 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> When CPUs come online the MSC's original configuration should be restored.
>>
>> Add struct mpam_config to hold the configuration. This has a bitmap of
>> features that were modified. Once the maximum partid is known, allocate
>> a configuration array for each component, and reprogram each RIS
>> configuration from this.

> Trivial comments
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
>> +{
>> +	memset(reset_cfg, 0, sizeof(*reset_cfg));
> 
> Might as well do the following and skip the memset.
> 
> 	*reset_cfg = (struct mpam_config) {
> 		.features = ~0,
> 		.cpbm = ~0,
> 		.mbw_pbm = ~0,
> 		.mbw_max = MPAM...
> 		.reset_cpbm = true,
> 		.reset_mbw_pbm = true,
> 	};

Sure,


>> +	reset_cfg->features = ~0;
>> +	reset_cfg->cpbm = ~0;
>> +	reset_cfg->mbw_pbm = ~0;
>> +	reset_cfg->mbw_max = MPAMCFG_MBW_MAX_MAX;
>> +
>> +	reset_cfg->reset_cpbm = true;
>> +	reset_cfg->reset_mbw_pbm = true;
>> +}
> 
>> +static int mpam_allocate_config(void)
>> +{
>> +	int err = 0;
> 
> Always set before use. Maybe push down so it is in tighter scope and
> can declare and initialize to final value in one line.

Sure,


>> +	struct mpam_class *class;
>> +	struct mpam_component *comp;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	list_for_each_entry(class, &mpam_classes, classes_list) {
>> +		list_for_each_entry(comp, &class->components, class_list) {
>> +			err = __allocate_component_cfg(comp);
>> +			if (err)
>> +				return err;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
> 
> 
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index b69fa9199cb4..17570d9aae9b 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -169,11 +169,7 @@ struct mpam_props {
>>  	u16			num_mbwu_mon;
>>  };
>>  
>> -static inline bool mpam_has_feature(enum mpam_device_features feat,
>> -				    struct mpam_props *props)
>> -{
>> -	return (1 << feat) & props->features;
>> -}
>> +#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
> 
> If this is worth doing push it back to original introduction.
> I'm not sure it is necessary.

It's the change in this patch that makes it necessary: maybe_update_config() ends up calling
mpam_has_feature() on a configuration instead of a class/msc/ris props structure. I could
have made that a separate helper to get the type right - but making it a macro was simpler.

I'll push it back earlier.
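
For reference, the 'separate helper' alternative would have meant one copy per type -
roughly the below, which is why the macro won. (Untested sketch, going by the 'features'
bitmap both structures carry):

----------%<----------
static inline bool mpam_props_has_feature(enum mpam_device_features feat,
					  struct mpam_props *props)
{
	return (1 << feat) & props->features;
}

static inline bool mpam_cfg_has_feature(enum mpam_device_features feat,
					struct mpam_config *cfg)
{
	return (1 << feat) & cfg->features;
}
----------%<----------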


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-09-12 13:27   ` Jonathan Cameron
@ 2025-10-09 17:48     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-09 17:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 14:27, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:05 +0000
> James Morse <james.morse@arm.com> wrote:
>> From: Rohit Mathew <rohit.mathew@arm.com>
>>
>> mpam v0.1 and versions above v1.0 support optional long counter for
>> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register have fields
>> indicating support for long counters. As of now, a 44 bit counter
>> represented by HAS_LONG field (bit 30) and a 63 bit counter represented
>> by LWD (bit 29) can be optionally integrated. Probe for these counters
>> and set corresponding feature bits if any of these counters are present.

> I'd like a little more justification of the 'front facing' use for the first
> feature bit.  To me that seems confusing but I may well be missing why
> we can't have 3 exclusive features.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index eeb62ed94520..bae9fa9441dc 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -795,7 +795,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>>  				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
>>  		}
>>  		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
>> -			bool hw_managed;
>> +			bool has_long, hw_managed;
>>  			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
>>  
>>  			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
>> @@ -805,6 +805,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>>  			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
>>  				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
>>  
>> +			/*
>> +			 * Treat long counter and its extension, lwd as mutually
>> +			 * exclusive feature bits. Though these are dependent
>> +			 * fields at the implementation level, there would never
>> +			 * be a need for mpam_feat_msmon_mbwu_44counter (long
>> +			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
>> +			 * bits to be set together.
>> +			 *
>> +			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
>> +			 * bit as this feature bit would be used as the "front
>> +			 * facing feature bit" for any checks related to mbwu
>> +			 * monitors.

> Why do we need such a 'front facing' bit?  Why isn't it sufficient just to
> add a little helper or macro to find out if mbwu is turned on?

(I read Rohit's front-facing as top-level).

I think Rohit thought it would be simpler - there is one feature enum that gets passed in
from the resctrl glue code saying "I want to read a bandwidth counter", because there
is only one, and it doesn't care what size. I think Rohit didn't want to touch that code!

As that is really a separate concept, I think it's worth handling explicitly:
mpam_feat_msmon_mbwu means there are counters, and mpam_feat_msmon_mbwu_{31,44,63}counter
say which ones are supported.

The helper you suggest can then pick which one is best.
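
Roughly this sort of shape (sketch only - the helper name is made up, and the 31/44/63
feature names may end up spelt differently):

----------%<----------
static enum mpam_device_features mbwu_counter_type(struct mpam_props *props)
{
	/* Prefer the widest counter that was probed */
	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, props))
		return mpam_feat_msmon_mbwu_63counter;
	if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, props))
		return mpam_feat_msmon_mbwu_44counter;

	return mpam_feat_msmon_mbwu_31counter;
}
----------%<----------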


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-12 13:21   ` Jonathan Cameron
@ 2025-10-09 17:48     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-09 17:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 14:21, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:03 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSC may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>> not ready, then wait the full timeout value before trying again.
>>
>> CC: Shanker Donthineni <sdonthineni@nvidia.com>
>> Signed-off-by: James Morse <james.morse@arm.com>

>> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>> +				     u32 flt_val)
>> +{
>> +	struct mpam_msc *msc = m->ris->vmsc->msc;
>> +
>> +	/*
>> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
>> +	 * then enable counter.
>> +	 */
>> +	switch (m->type) {
>> +	case mpam_feat_msmon_csu:
>> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
>> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
>> +		mpam_write_monsel_reg(msc, CSU, 0);
>> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +		break;
>> +	case mpam_feat_msmon_mbwu:
>> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>> +		mpam_write_monsel_reg(msc, MBWU, 0);
>> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +		break;
> 
> Given nothing to do later, I'd just return at end of each case.
> Entirely up to you though as this is just a personal style preference.

Out of habit I try to avoid functions returning from different places, as it makes it
harder to add locking later. Maybe the fancy cleanup c++ thing changes that.


>> +	default:
>> +		return;

But I'm clearly inconsistent!
I've changed this as you suggest.


>> +	}
>> +}
> 
>> +
>> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>> +{
>> +	int err, idx;
>> +	struct mpam_msc *msc;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +
>> +	idx = srcu_read_lock(&mpam_srcu);
> 
> 	guard(srcu)(&mpam_srcu);
> 
> Then you can do direct returns on errors which looks like it will simplify
> things somewhat by letting you just return on err.
> 
> 
>> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
>> +		msc = vmsc->msc;
> I'd bring the declaration down here as well.
> 		struct mpam_msc *msc = vmsc->msc;
> Could bring ris down here as well.
> 
>> +
>> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
>> +			arg->ris = ris;
>> +
>> +			err = smp_call_function_any(&msc->accessibility,
>> +						    __ris_msmon_read, arg,
>> +						    true);
>> +			if (!err && arg->err)
>> +				err = arg->err;
>> +			if (err)
>> +				break;
>> +		}
>> +		if (err)
>> +			break;
> 
> This won't be needed if you returned on error above.
> 
>> +	}
>> +	srcu_read_unlock(&mpam_srcu, idx);
>> +
>> +	return err;

> And you only reach here with above changes if err == 0 so return 0; appropriate.

I've done all of the above.
(I'd even already worked it out from your earlier feedback)



>> +}
>> +
>> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>> +		    enum mpam_device_features type, u64 *val)
>> +{
>> +	int err;
>> +	struct mon_read arg;
>> +	u64 wait_jiffies = 0;
>> +	struct mpam_props *cprops = &comp->class->props;
>> +
>> +	might_sleep();
>> +
>> +	if (!mpam_is_enabled())
>> +		return -EIO;
>> +
>> +	if (!mpam_has_feature(type, cprops))
>> +		return -EOPNOTSUPP;
>> +
>> +	memset(&arg, 0, sizeof(arg));

> Either use = { }; at declaration or maybe
> 	arg = (struct mon_read) {
> 		.ctx = ctx,
> 		.type = type,
> 		.val = val,
> 	};
> 
> rather than bothering with separate memset.

The memset is because arg gets re-used, but there are fields not touched here that get
modified, like 'err'. But this struct assignment works too...


>> +	arg.ctx = ctx;
>> +	arg.type = type;
>> +	arg.val = val;
>> +	*val = 0;
>> +
>> +	err = _msmon_read(comp, &arg);
>> +	if (err == -EBUSY && comp->class->nrdy_usec)
>> +		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
>> +
>> +	while (wait_jiffies)
>> +		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
>> +
>> +	if (err == -EBUSY) {
>> +		memset(&arg, 0, sizeof(arg));

> Same as above. 

Yup - it's like this so that they look the same.


>> +		arg.ctx = ctx;
>> +		arg.type = type;
>> +		arg.val = val;
>> +		*val = 0;
>> +
>> +		err = _msmon_read(comp, &arg);
>> +	}
>> +
>> +	return err;
>> +}
> 


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-09-25  2:30   ` Fenghua Yu
@ 2025-10-09 17:48     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-09 17:48 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 25/09/2025 03:30, Fenghua Yu wrote:
> On 9/10/25 13:43, James Morse wrote:
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSC may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>> not ready, then wait the full timeout value before trying again.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index cf190f896de1..1543c33c5d6a 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -898,6 +898,232 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)

>> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>> +{
>> +    int err, idx;

> Can err be initialized to some error code e.g. -ENODEV?

Feedback from Jonathan has led to numerous changes here.


>> +    struct mpam_msc *msc;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    idx = srcu_read_lock(&mpam_srcu);
>> +    list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
>> +        msc = vmsc->msc;
>> +
>> +        list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
>> +            arg->ris = ris;
>> +
>> +            err = smp_call_function_any(&msc->accessibility,
>> +                            __ris_msmon_read, arg,
>> +                            true);
>> +            if (!err && arg->err)
>> +                err = arg->err;
>> +            if (err)
>> +                break;
>> +        }
>> +        if (err)
>> +            break;
>> +    }
> 
> comp->vmsc or vmsc->ris usually are not empty. But in some condition, they can be empty.

Really? There is no call to create a VMSC - only to create a RIS with a set of ids. If you
provide an ID for a VMSC that doesn't exist yet - it gets created for the RIS to be added to.

There was some defensive programming earlier (and a comment that said it wasn't
possible). That was some left-over debug.


> In that case, uninitialized err value may cause unexpected behavior for the callers.
> 
> So it's better to initialize err to avoid any complexity.

Not complexity - the risk is returning an uninitialised value.

After Jonathan's feedback, and my guess as to what Zheng is seeing, it looks like this:
----------%<----------
static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
{
	int err, any_err = 0;
	struct mpam_vmsc *vmsc;

	guard(srcu)(&mpam_srcu);
	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		struct mpam_msc *msc = vmsc->msc;
		struct mpam_msc_ris *ris;

		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			arg->ris = ris;

			err = smp_call_function_any(&msc->accessibility,
						    __ris_msmon_read, arg,
						    true);
			if (!err && arg->err)
				err = arg->err;

			/*
			 * Save one error to be returned to the caller, but
			 * keep reading counters so that they get reprogrammed. On
			 * platforms with NRDY this lets us wait once.
			 */
			if (err)
				any_err = err;
		}
	}

	return any_err;
}
----------%<----------


>> +    srcu_read_unlock(&mpam_srcu, idx);
>> +
>> +    return err;
>> +}


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-09-12 13:24   ` Jonathan Cameron
@ 2025-10-09 17:48     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-09 17:48 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 14:24, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:04 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Bandwidth counters need to run continuously to correctly reflect the
>> bandwidth.
>>
>> The value read may be lower than the previous value read in the case
>> of overflow and when the hardware is reset due to CPU hotplug.
>>
>> Add struct mbwu_state to track the bandwidth counter to allow overflow
>> and power management to be handled.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>

> Trivial comment inline.  I haven't spent enough time thinking about this
> to give a proper review so no tags yet.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 1543c33c5d6a..eeb62ed94520 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -918,6 +918,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>>  	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>>  
>>  	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
>> +

> Unrelated change.  If it makes sense figure out where to push it back to.

Done. This is my favourite mistake to make with a merge conflict!


Thanks,

James


>>  	if (m->ctx->match_pmg) {
>>  		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>>  		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported
  2025-09-12 13:29   ` Jonathan Cameron
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 12/09/2025 14:29, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:06 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> From: Rohit Mathew <rohit.mathew@arm.com>
>>
>> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
>> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
>> counter.
>>
>> Only 32bit accesses to the MSC are required to be supported by the
>> spec, but these registers are 64bits. The lower half may overflow
>> into the higher half between two 32bit reads. To avoid this, use
>> a helper that reads the top half multiple times to check for overflow.
>>
>> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
>> [morse: merged multiple patches from Rohit]
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!

Your push-back on the 'front facing' thing in the previous patch made some knock-on
changes here, but I think they're minor.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-12 13:33   ` Jonathan Cameron
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 14:33, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:07 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> resctrl expects to reset the bandwidth counters when the filesystem
>> is mounted.
>>
>> To allow this, add a helper that clears the saved mbwu state. Instead
>> of cross calling to each CPU that can access the component MSC to
>> write to the counter, set a flag that causes it to be zero'd on the
>> the next read. This is easily done by forcing a configuration update.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>

> Minor comments inline.


>> @@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>>  	return err;
>>  }
>>  
>> +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
>> +{
>> +	int idx;
>> +	struct mpam_msc *msc;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +
>> +	if (!mpam_is_enabled())
>> +		return;
>> +
>> +	idx = srcu_read_lock(&mpam_srcu);
> 
> Maybe guard() though it doesn't add that much here.

'Fixed' already based on your other feedback.


>> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> 
> Reason not to use _srcu variants?

Typo - I'd switched it all to srcu because of the pcc thing's need to sleep, but didn't
fix all these properly.
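
With the same treatment as the _msmon_read() rework, this one ends up looking something
like the below (untested sketch, reusing the v2 field names):

----------%<----------
void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
{
	struct mpam_vmsc *vmsc;

	if (!mpam_is_enabled())
		return;

	guard(srcu)(&mpam_srcu);
	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
				 srcu_read_lock_held(&mpam_srcu)) {
		struct mpam_msc *msc = vmsc->msc;
		struct mpam_msc_ris *ris;

		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
			continue;

		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
					 srcu_read_lock_held(&mpam_srcu)) {
			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
				continue;

			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
				continue;

			ris->mbwu_state[ctx->mon].correction = 0;
			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
			mpam_mon_sel_unlock(msc);
		}
	}
}
----------%<----------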



>> +		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
>> +			continue;
>> +
>> +		msc = vmsc->msc;
>> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
>> +			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
>> +				continue;
>> +
>> +			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
>> +				continue;
>> +
>> +			ris->mbwu_state[ctx->mon].correction = 0;
>> +			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
>> +			mpam_mon_sel_unlock(msc);
>> +		}
>> +	}
>> +	srcu_read_unlock(&mpam_srcu, idx);
>> +}
>> +
>>  static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>>  {
>>  	u32 num_words, msb;
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index c190826dfbda..7cbcafe8294a 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -223,10 +223,12 @@ struct mon_cfg {
>>  
>>  /*
>>   * Changes to enabled and cfg are protected by the msc->lock.
>> - * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
>> + * Changes to reset_on_next_read, prev_val and correction are protected by the
>> + * msc's mon_sel_lock.

> Getting close to the point where a list of one per line would reduce churn.
> If you anticipate adding more to this in future I'd definitely consider it.
> e.g.
>  * msc's mon_sel_lcok protects:
>  * - reset_on_next_read
>  * - prev_val
>  * - correction
>  */

It doesn't get expanded further; this is the last patch of the driver. But this is easier
to read, so I'll do that.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-18  2:35   ` Shaopeng Tan (Fujitsu)
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Shaopeng,

On 18/09/2025 03:35, Shaopeng Tan (Fujitsu) wrote:
>> resctrl expects to reset the bandwidth counters when the filesystem is
>> mounted.
>>
>> To allow this, add a helper that clears the saved mbwu state. Instead of cross
>> calling to each CPU that can access the component MSC to write to the counter,
>> set a flag that causes it to be zero'd on the the next read. This is easily done by
>> forcing a configuration update.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 3080a81f0845..8254d6190ca2 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1112,7 +1122,10 @@ static void __ris_msmon_read(void *arg)
>>  	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
>>  	clean_msmon_ctl_val(&cur_ctl);
>>  	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>> -	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
>> +	config_mismatch = cur_flt != flt_val ||
>> +			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>> +
>> +	if (config_mismatch || reset_on_next_read)
>>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);

I don't have a platform that implements any of the bandwidth counters, so I may need a hand
to debug this ...


> mbm_handle_overflow() calls __ris_msmon_read() every second. 
> If there are multiple monitor groups, the config_mismatch will "true" every second. 

It shouldn't be - I think you've forced it into a pathological case that the resctrl glue
code tries very hard to avoid.

The pattern of allocating a monitor, detecting a mismatch and reconfiguring it is needed
for CSU. That stuff is re-usable for MBWU, but you never want it to happen outside
control/monitor group creation because it means you're losing data.

For those reading along at home:
resctrl expects there to be as many hardware monitors as PARTID*PMG - because every
control and monitor group has 'mbm_total_bytes' or equivalent files. User-space can read
these at any time, and the deal is they start at 0 from boot, and reset when the control
or monitor group is created.

This means the MPAM driver needs to have enough, and it needs to pre-configure them on
startup.

The resctrl glue code calls this 'free running'. It means when you call
resctrl_arch_mon_ctx_alloc() for a bandwidth monitor - it doesn't allocate a context, but
returns a magic out of range value 'USE_RMID_IDX' so that subsequent calls use the
pre-allocated monitor.


If you don't have PARTID*PMG's worth of monitors - you can't have resctrl's
mbm_total_bytes interface. People regularly complain about this - but the alternative is
counters that randomly reset, meaning you could never trust the value.
I have no intention of supporting that mode (it's already available in /dev/urandom!)


If you're seeing this mismatch happen from the overflow thread - I think you've forced
the mbwu counters on when you don't have enough monitors.
Even if the resctrl overflow 'thread' used the same mon_ctx - USE_RMID_IDX means it will
access a different hardware monitor each time.
Another option is that clean_msmon_ctl_val() is missing a bit that is set by hardware,
causing the values to mismatch when they shouldn't.


Could you check mon_ctx is USE_RMID_IDX, and check which bits are mismatching?
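
Something throwaway on top of the v2 hunk should be enough to see which it is - e.g.
(debug hack only, reusing the local variable names from __ris_msmon_read()):

----------%<----------
	if (config_mismatch || reset_on_next_read)
		pr_info("msmon reprogram: ctl %#x -> %#x, flt %#x -> %#x, reset %d\n",
			cur_ctl, ctl_val | MSMON_CFG_x_CTL_EN,
			cur_flt, flt_val, reset_on_next_read);
----------%<----------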


> Then "mbwu_state->prev_val = 0;" in function write_msmon_ctl_flt_vals() will be always run.
> This means that for multiple monitoring groups, the MemoryBandwidth monitoring value is cleared every second.

Yes - this should never happen because the overflow thread should never cause a mismatch,
and the monitors should only be reconfigured when control/monitor groups are allocated.



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-09-26  4:11   ` Fenghua Yu
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 26/09/2025 05:11, Fenghua Yu wrote:
> On 9/10/25 13:43, James Morse wrote:
>> resctrl expects to reset the bandwidth counters when the filesystem
>> is mounted.
>>
>> To allow this, add a helper that clears the saved mbwu state. Instead
>> of cross calling to each CPU that can access the component MSC to
>> write to the counter, set a flag that causes it to be zero'd on the
>> the next read. This is easily done by forcing a configuration update.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Other than the following minor change,
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvdia.com>

Thanks!


>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 3080a81f0845..8254d6190ca2 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>>   @@ -1245,6 +1257,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg
>> *ctx,
>>       return err;
>>   }
>>   +void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
>> +{
>> +    int idx;
>> +    struct mpam_msc *msc;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    if (!mpam_is_enabled())
>> +        return;
>> +
>> +    idx = srcu_read_lock(&mpam_srcu);

> guard(srcu)(&mpam_srcu);

Yeah, Jonathan had already suggested it.


>> +    list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
>> +        if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
>> +            continue;
>> +
>> +        msc = vmsc->msc;
>> +        list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
>> +            if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
>> +                continue;
>> +
>> +            if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
>> +                continue;
>> +
>> +            ris->mbwu_state[ctx->mon].correction = 0;
>> +            ris->mbwu_state[ctx->mon].reset_on_next_read = true;
>> +            mpam_mon_sel_unlock(msc);
>> +        }
>> +    }
>> +    srcu_read_unlock(&mpam_srcu, idx);
>> +}


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-12 13:37   ` Jonathan Cameron
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 14:37, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:08 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> The bitmap reset code has been a source of bugs. Add a unit test.
>>
>> This currently has to be built in, as the rest of the driver is
>> builtin.
>>
>> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>


> Few trivial comments inline.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!


>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> index c30532a3a3a4..ef59b3057d5d 100644
>> --- a/drivers/resctrl/Kconfig
>> +++ b/drivers/resctrl/Kconfig
>> @@ -5,10 +5,20 @@ menuconfig ARM64_MPAM_DRIVER
>>  	  MPAM driver for System IP, e,g. caches and memory controllers.
>>  
>>  if ARM64_MPAM_DRIVER
>> +
>>  config ARM64_MPAM_DRIVER_DEBUG
>>  	bool "Enable debug messages from the MPAM driver"
>>  	depends on ARM64_MPAM_DRIVER
> 
> Doing this under an if for the same isn't useful. So if you want to do this
> style I'd do it before adding this earlier config option.

Yup, you pointed at this same shape of bug earlier in the series.


>>  	help
>>  	  Say yes here to enable debug messages from the MPAM driver.
>>  
>> +config MPAM_KUNIT_TEST
>> +	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
>> +	depends on KUNIT=y
>> +	default KUNIT_ALL_TESTS
>> +	help
>> +	  Enable this option to run tests in the MPAM driver.
>> +
>> +	  If unsure, say N.
>> +
>>  endif
> 
>> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
>> new file mode 100644
>> index 000000000000..3e7058f7601c
>> --- /dev/null
>> +++ b/drivers/resctrl/test_mpam_devices.c
>> @@ -0,0 +1,68 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2024 Arm Ltd.
>> +/* This file is intended to be included into mpam_devices.c */
>> +
>> +#include <kunit/test.h>
>> +
>> +static void test_mpam_reset_msc_bitmap(struct kunit *test)
>> +{
>> +	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
>> +	struct mpam_msc fake_msc = {0};
> 
> = { }; is sufficient and what newer c specs have adopted to mean
> fill everything including holes in structures with 0.  There are some
> tests that ensure that behavior applies with older compilers + the options
> we use for building the kernel.

Muscle memory is difficult to overcome ... I've fixed this one, and will keep an eye out
for more.


>> +	u32 *test_result;
>> +
>> +	if (!buf)
>> +		return;
>> +
>> +	fake_msc.mapped_hwpage = buf;
>> +	fake_msc.mapped_hwpage_sz = SZ_16K;
>> +	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
>> +
>> +	mutex_init(&fake_msc.part_sel_lock);
>> +	mutex_lock(&fake_msc.part_sel_lock);

> Perhaps add a comment to say this is to satisfy lock markings?
> Otherwise someone might wonder why mutex_init() immediately followed
> by taking the lock makes sense.

Makes sense, Done.

>> +
>> +	test_result = (u32 *)(buf + MPAMCFG_CPBM);


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-26  2:35   ` Fenghua Yu
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 26/09/2025 03:35, Fenghua Yu wrote:
> On 9/10/25 13:43, James Morse wrote:
>> The bitmap reset code has been a source of bugs. Add a unit test.
>>
>> This currently has to be built in, as the rest of the driver is
>> builtin.
>>
>> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-09-12 16:06   ` Ben Horgan
@ 2025-10-10 16:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:53 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 17:06, Ben Horgan wrote:
> On 9/10/25 21:43, James Morse wrote:
>> The bitmap reset code has been a source of bugs. Add a unit test.
>>
>> This currently has to be built in, as the rest of the driver is
>> builtin.

>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> index c30532a3a3a4..ef59b3057d5d 100644
>> --- a/drivers/resctrl/Kconfig
>> +++ b/drivers/resctrl/Kconfig
>> @@ -5,10 +5,20 @@ menuconfig ARM64_MPAM_DRIVER
>>  	  MPAM driver for System IP, e,g. caches and memory controllers.
>>  
>>  if ARM64_MPAM_DRIVER
>> +
> 
> Nit: add the empty line in an earlier patch

Fixed,

> Reviewed-by: Ben Horgan <ben.horgan@arm.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-12 13:41   ` Jonathan Cameron
@ 2025-10-10 16:54     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:54 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 12/09/2025 14:41, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:43:09 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> When features are mismatched between MSC the way features are combined
>> to the class determines whether resctrl can support this SoC.
>>
>> Add some tests to illustrate the sort of thing that is expected to
>> work, and those that must be removed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>

> Nice in general though I didn't go through the test expected results etc.

> A few comments inline.
> 
> Thanks and looking forward to seeing this go in.


>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 7cbcafe8294a..6119e4573187 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -189,7 +195,7 @@ struct mpam_props {
>>  	u16			dspri_wd;
>>  	u16			num_csu_mon;
>>  	u16			num_mbwu_mon;
>> -};
>> +} PACKED_FOR_KUNIT;
> 
> Add a comment on 'why'.

| * Kunit tests use memset() to set up feature combinations that should be
| * removed, and will false-positive if the compiler introduces padding that
| * isn't cleared during sanitisation.


>> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
>> index 3e7058f7601c..4eca8590c691 100644
>> --- a/drivers/resctrl/test_mpam_devices.c
>> +++ b/drivers/resctrl/test_mpam_devices.c
>> @@ -4,6 +4,325 @@
>>  
>>  #include <kunit/test.h>
>>  
>> +/*
>> + * This test catches fields that aren't being sanitised - but can't tell you
>> + * which one...
>> + */
>> +static void test__props_mismatch(struct kunit *test)
>> +{
>> +	struct mpam_props parent = { 0 };
>> +	struct mpam_props child;
>> +
>> +	memset(&child, 0xff, sizeof(child));
>> +	__props_mismatch(&parent, &child, false);
>> +
>> +	memset(&child, 0, sizeof(child));
>> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
>> +
>> +	memset(&child, 0xff, sizeof(child));
>> +	__props_mismatch(&parent, &child, true);
>> +
>> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
>> +}
>> +
>> +static struct list_head fake_classes_list;
>> +static struct mpam_class fake_class = { 0 };
>> +static struct mpam_component fake_comp1 = { 0 };
>> +static struct mpam_component fake_comp2 = { 0 };
>> +static struct mpam_vmsc fake_vmsc1 = { 0 };
>> +static struct mpam_vmsc fake_vmsc2 = { 0 };
>> +static struct mpam_msc fake_msc1 = { 0 };
>> +static struct mpam_msc fake_msc2 = { 0 };
>> +static struct mpam_msc_ris fake_ris1 = { 0 };
>> +static struct mpam_msc_ris fake_ris2 = { 0 };
>> +static struct platform_device fake_pdev = { 0 };
>> +
>> +static void test_mpam_enable_merge_features(struct kunit *test)
>> +{
>> +#define RESET_FAKE_HIEARCHY()	do {				\
>> +	INIT_LIST_HEAD(&fake_classes_list);			\
>> +								\
>> +	memset(&fake_class, 0, sizeof(fake_class));		\

> Maybe just use a function?  Seems to be changing stuff that is
> global mostly anyway so seems like it won't need large numbers
> of parameters or anything like that.

Sure - it only became global in v3.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-12 16:01   ` Ben Horgan
@ 2025-10-10 16:54     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:54 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 17:01, Ben Horgan wrote:
> On 9/10/25 21:43, James Morse wrote:
>> When features are mismatched between MSC the way features are combined
>> to the class determines whether resctrl can support this SoC.
>>
>> Add some tests to illustrate the sort of thing that is expected to
>> work, and those that must be removed.


> Looks good to me, I checked the tests for v1. I agree with Jonathan that
> you could make RESET_FAKE_HIEARCHY() a function now that you've changed
> to use globals.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-09-26  2:36   ` Fenghua Yu
@ 2025-10-10 16:54     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:54 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 26/09/2025 03:36, Fenghua Yu wrote:
> On 9/10/25 13:43, James Morse wrote:
>> When features are mismatched between MSC the way features are combined
>> to the class determines whether resctrl can support this SoC.
>>
>> Add some tests to illustrate the sort of thing that is expected to
>> work, and those that must be removed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-25  9:32   ` Stanimir Varbanov
@ 2025-10-10 16:54     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:54 UTC (permalink / raw)
  To: Stanimir Varbanov, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Stan,

On 25/09/2025 10:32, Stanimir Varbanov wrote:
> On 9/10/25 11:42 PM, James Morse wrote:
>> The ACPI MPAM table uses the UID of a processor container specified in
>> the PPTT to indicate the subset of CPUs and cache topology that can
>> access each MPAM System Component (MSC).
>>
>> This information is not directly useful to the kernel. The equivalent
>> cpumask is needed instead.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.

>> +
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + *                                       processor container
>> + * @acpi_cpu_id:	The UID of the processor container.
>> + * @cpus:		The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container, they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> 
> Leftover, drop this.

Good spot - thanks,


James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-10-02  3:35   ` Fenghua Yu
@ 2025-10-10 16:54     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:54 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 04:35, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> The ACPI MPAM table uses the UID of a processor container specified in
>> the PPTT to indicate the subset of CPUs and cache topology that can
>> access each MPAM System Component (MSC).
>>
>> This information is not directly useful to the kernel. The equivalent
>> cpumask is needed instead.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
>>
>> CC: Dave Martin <dave.martin@arm.com>
>> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-10-03  0:15   ` Gavin Shan
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Gavin Shan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Gavin,

On 03/10/2025 01:15, Gavin Shan wrote:
> On 9/11/25 6:42 AM, James Morse wrote:
>> The ACPI MPAM table uses the UID of a processor container specified in
>> the PPTT to indicate the subset of CPUs and cache topology that can
>> access each MPAM System Component (MSC).
>>
>> This information is not directly useful to the kernel. The equivalent
>> cpumask is needed instead.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.

> With the description for the return value of acpi_pptt_get_cpus_from_container()
> is dropped since that function doesn't have a return value, as mentioned by
> Stanimir Varbanov.
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-10-02  3:55   ` Fenghua Yu
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 04:55, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> In acpi_count_levels(), the initial value of *levels passed by the
>> caller is really an implementation detail of acpi_count_levels(), so it
>> is unreasonable to expect the callers of this function to know what to
>> pass in for this parameter.  The only sensible initial value is 0,
>> which is what the only upstream caller (acpi_get_cache_info()) passes.
>>
>> Use a local variable for the starting cache level in acpi_count_levels(),
>> and pass the result back to the caller via the function return value.
>>
>> Gid rid of the levels parameter, which has no remaining purpose.
>>
>> Fix acpi_get_cache_info() to match.
>>
>> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>


Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-10-02  4:30   ` Fenghua Yu
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 05:30, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Other than minor comment issues as follows,
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks,


>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 7af7d62597df..c5f2a51d280b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -904,3 +904,65 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t
>> *cpus)
>>                        entry->length);
>>       }
>>   }
>> +
>> +/*
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later,

> s/,/./

Yup, already fixed. (I need to clean my glasses more often!)


>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
> 
> This comment is not clear. Maybe it's better to say:
> 
> + * If one CPU's L2 is shared with another CPU as L3, this function will return

Sure,


>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
>> + * the cache cannot be found.
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-10-02  5:03   ` Fenghua Yu
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 06:03, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>
>> The driver needs to know which CPUs are associated with the cache.
>> The CPUs may not all be online, so cacheinfo does not have the
>> information.
>>
>> Add a helper to pull this information out of the PPTT.

>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index c5f2a51d280b..c379a9952b00 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -966,3 +966,62 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>> +/**
>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>> + *                       specified cache
>> + * @cache_id: The id field of the unified cache
>> + * @cpus: Where to build the cpumask
>> + *
>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>> + * to be found even if the CPUs are offline.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>> + */
>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>> +{
>> +    u32 acpi_cpu_id;
>> +    int level, cpu, num_levels;
>> +    struct acpi_pptt_cache *cache;
>> +    struct acpi_pptt_cache_v1 *cache_v1;
>> +    struct acpi_pptt_processor *cpu_node;
>> +    struct acpi_table_header *table __free(acpi_table) =
>> acpi_get_table_ret(ACPI_SIG_PPTT, 0);
>> +
>> +    cpumask_clear(cpus);
>> +
>> +    if (IS_ERR(table))
>> +        return -ENOENT;
>> +
>> +    if (table->revision < 3)
>> +        return -ENOENT;
>> +
>> +    for_each_possible_cpu(cpu) {
>> +        acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +        cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +        if (WARN_ON_ONCE(!cpu_node))
>> +            continue;
>> +        num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +        /* Start at 1 for L1 */
>> +        for (level = 1; level <= num_levels; level++) {
>> +            cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +                             ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +                             level, &cpu_node);
>> +            if (!cache)
>> +                continue;
>> +
>> +            cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +                        cache,
>> +                        sizeof(struct acpi_pptt_cache));
>> +
>> +            if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +                cache_v1->cache_id == cache_id)
>> +                cpumask_set_cpu(cpu, cpus);
> 
> This function is almost identical to find_acpi_cache_level_from_id() defined in patch #3.

Yes - there is already a lot of repetition in this file.

I'd previously suggested to Jeremy L to have a walker with callbacks, but he felt that
made it harder to read.
Jonathan suggested a for_each_acpi_pptt_entry() helper:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=pptt/for_each_pptt_entry/v0&id=353ceeba3d39c6b6a10eeb1a59c49649cdf719d8

I'm avoiding including that here as it's ~30 patches already!


> To reduce code size and complexity, it's better to define a common helper to server both
> of the two functions.
> 
> e.g. define a helper acpi_pptt_get_level_cpumask_from_cache_id(u32 cache_id, int *lvl,
> cpu_mask_t *cpus)
> 
> This helper has the same code body to traverse the cache levels for all CPUs as
> find_acpi_cache_level_from_id() and acpi_pptt_get_cpumask_from_cache_id(). The major
> difference is here:
> 
> +            if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +                cache_v1->cache_id == cache_id) {
> +                if (cpus)
> +                    cpumask_set_cpu(cpu, cpus);
> +                if ((level) {
> +                    *lvl = level;
> +                    return 0;
> +                }
> 
> Then simplify the two callers as follows:
> int find_acpi_cache_level_from_id(u32 cache_id)
> {
>     int level;
>     int err = acpi_pptt_get_level_cpumask_from_cache_id(cache_id, &level, NULL);
>     if (err)
>         return err;
> 
>     return level;
> }
> 
> int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> {
>     return acpi_pptt_get_level_cpumask_from_cache_id(cache_id, NULL, cpus);
> }
> 

You've combined two functions that both walk the table (there are quite a few more in this
file) - but they look for very different things. Your common helper is going to be much
more complex than either of these standalone functions.

I think Jonathan's for-each helper is the best path forward, as that reduces the boilerplate
and leaves only the relevant differences - something like the sketch below.
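
To make the shape concrete, here is a rough, untested sketch of what such a walker
could look like - the name and the details are illustrative only, not the actual
helper from the branch linked above:

	#define for_each_pptt_subtable(table, entry)				   \
		for ((entry) = ACPI_ADD_PTR(struct acpi_subtable_header, (table), \
					    sizeof(struct acpi_table_header));	   \
		     (char *)(entry) < (char *)(table) + (table)->length;	   \
		     (entry) = ACPI_ADD_PTR(struct acpi_subtable_header, (entry), \
					    (entry)->length))

	/* callers then keep only the per-entry logic, e.g.: */
	struct acpi_subtable_header *entry;

	for_each_pptt_subtable(table, entry) {
		if (entry->type != ACPI_PPTT_TYPE_CACHE)
			continue;
		/* ... per-cache handling ... */
	}

(A real version would also need to guard against entry->length being zero, or an
entry overlapping the end of the table.)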



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-10-02  5:06   ` Fenghua Yu
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 06:06, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> The bulk of the MPAM driver lives outside the arch code because it
>> largely manages MMIO devices that generate interrupts. The driver
>> needs a Kconfig symbol to enable it. As MPAM is only found on arm64
>> platforms, the arm64 tree is the most natural home for the Kconfig
>> option.
>>
>> This Kconfig option will later be used by the arch code to enable
>> or disable the MPAM context-switch code, and to register properties
>> of CPUs with the MPAM driver.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> CC: Dave Martin <dave.martin@arm.com>
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-10-03  0:32   ` Gavin Shan
@ 2025-10-10 16:55     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-10 16:55 UTC (permalink / raw)
  To: Gavin Shan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Gavin,

On 03/10/2025 01:32, Gavin Shan wrote:
> On 9/11/25 6:42 AM, James Morse wrote:
>> The bulk of the MPAM driver lives outside the arch code because it
>> largely manages MMIO devices that generate interrupts. The driver
>> needs a Kconfig symbol to enable it. As MPAM is only found on arm64
>> platforms, the arm64 tree is the most natural home for the Kconfig
>> option.
>>
>> This Kconfig option will later be used by the arch code to enable
>> or disable the MPAM context-switch code, and to register properties
>> of CPUs with the MPAM driver.

> Reviewed-by: Gavin Shan <gshan@redhat.com>

Thanks!

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-09-12 15:55   ` Ben Horgan
@ 2025-10-13 16:29     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-13 16:29 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Ben,

On 12/09/2025 16:55, Ben Horgan wrote:
> On 9/10/25 21:43, James Morse wrote:
>> Bandwidth counters need to run continuously to correctly reflect the
>> bandwidth.
>>
>> The value read may be lower than the previous value read in the case
>> of overflow and when the hardware is reset due to CPU hotplug.
>>
>> Add struct mbwu_state to track the bandwidth counter to allow overflow
>> and power management to be handled.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 1543c33c5d6a..eeb62ed94520 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -990,20 +992,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>>  		mpam_write_monsel_reg(msc, MBWU, 0);
>>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +
>> +		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
>> +		if (mbwu_state)
>> +			mbwu_state->prev_val = 0;

> What's the if condition doing here?

Yes, that looks like cruft....
It took the address of an array element - how could it be null?!


> The below could make more sense but I don't think you can get here if
> the allocation fails.

Heh ... only because __allocate_component_cfg() has lost the error value.
Without the outer/inner locking stuff, it's feasible for __allocate_component_cfg() to
return the error value directly.

With that fixed, and ignoring a bogus ctx->mon value - I agree you can't get a case where
this needs checking.


I think this was originally testing if the array had been allocated, and it's been folded
wrongly at some point in the past. I assume I kept those bogus tests around as I saw it
blow up with nonsense num_mbwu_mon - which is something I'll retest.


>> +
>>  		break;
>>  	default:
>>  		return;
>>  	}
>>  }

>> @@ -2106,6 +2227,35 @@ static int __allocate_component_cfg(struct mpam_component *comp)
>>  		return -ENOMEM;
>>  	init_garbage(comp->cfg);
>>  
>> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
>> +		if (!vmsc->props.num_mbwu_mon)
>> +			continue;
>> +
>> +		msc = vmsc->msc;
>> +		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
>> +			if (!ris->props.num_mbwu_mon)
>> +				continue;
>> +
>> +			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
>> +					     sizeof(*ris->mbwu_state),
>> +					     GFP_KERNEL);
>> +			if (!mbwu_state) {
>> +				__destroy_component_cfg(comp);
>> +				err = -ENOMEM;
>> +				break;
>> +			}
>> +
>> +			if (mpam_mon_sel_lock(msc)) {
>> +				init_garbage(mbwu_state);
>> +				ris->mbwu_state = mbwu_state;
>> +				mpam_mon_sel_unlock(msc);
>> +			}
> 
> The if statement is confusing now that mpam_mon_sel_lock()
> unconditionally returns true.

Sure, but this and the __must_check mean all the paths that use this must be able to
return an error.

This is a churn-or-not trade-off for the inclusion of the firmware-backed support.
I'd prefer it to be hard to add code-paths that are going to create a lot of work when
that comes - especially as folk are promising platforms that need this in the coming months.



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks
       [not found]   ` <1f084a23-7211-4291-99b6-7f5192fb9096@nvidia.com>
@ 2025-10-17 18:50     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:50 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 05/10/2025 21:43, Fenghua Yu wrote:
> Hi, James,
> 
> On 9/10/25 13:42, James Morse wrote:
>> When a CPU comes online, it may bring a newly accessible MSC with
>> it. Only the default partid has its value reset by hardware, and
>> even then the MSC might not have been reset since its config was
>> previously dirtyied. e.g. Kexec.
> 
> typo: s/dirtyied/dirtied/
> 
> 
>>
>> Any in-use partid must have its configuration restored, or reset.
>> In-use partids may be held in caches and evicted later.
>>
>> MSC are also reset when CPUs are taken offline to cover cases where
>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>> where there is no firmware involvement.
>>
>> If the configuration for a RIS has not been touched since it was
>> brought online, it does not need resetting again.
>>
>> To reset, write the maximum values for all discovered controls.
>>
>> CC: Rohit Mathew<Rohit.Mathew@arm.com>
>> Signed-off-by: James Morse<james.morse@arm.com>
> 
> Other than the minor comments,
> 
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks,


> [SNIP]
>> +static void mpam_reset_ris(struct mpam_msc_ris *ris)
>> +{
>> +    u16 partid, partid_max;
>> +
>> +    mpam_assert_srcu_read_lock_held();
>> +
>> +    if (ris->in_reset_state)
>> +        return;
>> +
>> +    spin_lock(&partid_max_lock);
>> +    partid_max = mpam_partid_max;
>> +    spin_unlock(&partid_max_lock);
>> +    for (partid = 0; partid < partid_max; partid++)
> 
>  * Should partid_max be inclusive? So it's "partid < partid_max + 1" here?
> 
> MPAM spec says max PARTID is inclusive: "The range of valid PARTIDs is 0 to the maximum
> PARTID, inclusive. The maximum values of a PARTID implemented by different MSCs need not
> be the same".

Hmmm, if partid_max is 0 then this loop should run only once, so yes.
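
i.e. presumably the inclusive form, something like this (sketch only - the loop
body is a placeholder name for whatever the patch does per partid):

	/* partid_max is inclusive, so use <=; partid_max == 0 still resets PARTID 0 */
	for (partid = 0; partid <= partid_max; partid++)
		reset_ris_partid(ris, partid);	/* placeholder */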


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-09-26 14:48       ` Jonathan Cameron
@ 2025-10-17 18:50         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:50 UTC (permalink / raw)
  To: Jonathan Cameron, Lorenzo Pieralisi, Hanjun Guo
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Jonathan,

On 26/09/2025 15:48, Jonathan Cameron wrote:
>>>> +	char uid[16];
>>>> +	u32 acpi_id;
>>>> +
>>>> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
>>>> +		return 0;
>>>> +
>>>> +	if (table->revision < 1)
>>>> +		return 0;
>>>> +
>>>> +	table_end = (char *)table + table->length;
>>>> +
>>>> +	while (table_offset < table_end) {
>>>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>>>> +		table_offset += tbl_msc->length;
>>>> +
>>>> +		if (table_offset > table_end) {
>>>> +			pr_debug("MSC entry overlaps end of ACPI table\n");
>>>> +			break;  
>>
>>> That this isn't considered an error is a bit subtle and made me wonder
>>> if there was a use of uninitialized pdev (there isn't because err == 0)  
>>
>> Its somewhat a philosophical arguement. I don't expect the kernel to have to validate
>> these tables, they're not provided by the user and there quickly becomes a point where
>> you have to trust them, and they have to be correct.

> Potential buffer overrun is to me always worth error out screaming, but I get your
> broader point.   Maybe just make it a pr_err()

Sure, done.


>> At the other extreme is the asusmption the table is line-noise and we should check
>> everything to avoid out of bounds accesses. Dave wanted the diagnostic messages on these.
>>
>> As this is called from an initcall, the best you get is an inexplicable print message.
>> (what should we say - update your firmware?)
> 
> Depends on whether you can lean hard on the firmware team. Much easier
> for me if I can tell them the board doesn't boot because they got it wrong.
> 
> That would have been safer if we had this upstream in advance of hardware, but indeed
> a little high risk today as who knows what borked tables are out there.
> 
> Personal preference though is to error out on things like this and handle the papering
> over at the top level.  Don't put extra effort into checking tables are invalid
> but if we happen to notice as part of code safety stuff like sizes then good to scream
> about it.
> 
>>
>>
>> Silently failing in this code is always safe as the driver has a count of the number of
>> MSC, and doesn't start accessing the hardware until its found them all.
>> (this is because to find the system wide minimum value - and its not worth starting if
>>  its not possible to finish).
>>
>>
>>> Why not return here?  
>>
>> Just because there was no other return in the loop, and I hate surprise returns.
>>
>> I'll change it if it avoids thinking about how that platform_device_put() gets skipped!
>>
>>
>>>   
>>>> +		}
>>>> +
>>>> +		/*
>>>> +		 * If any of the reserved fields are set, make no attempt to
>>>> +		 * parse the MSC structure. This MSC will still be counted,
>>>> +		 * meaning the MPAM driver can't probe against all MSC, and
>>>> +		 * will never be enabled. There is no way to enable it safely,
>>>> +		 * because we cannot determine safe system-wide partid and pmg
>>>> +		 * ranges in this situation.
>>>> +		 */  
>>
>>> This is decidedly paranoid. I'd normally expect the architecture to be based
>>> on assumption that is fine for old software to ignore new fields.  ACPI itself
>>> has fairly firm rules on this (though it goes wrong sometimes :)  
>>
>> Yeah - the MPAM table isn't properly structured as subtables. I don't see how they are
>> going to extend it if they need to.
>>
>> The paranoia is that anything set in these reserved fields probably indicates something
>> the driver needs to know about: a case in point is the way PCC was added.
>>
>> I'd much prefer we skip creation of MSC devices that have properties we don't understand.
>> acpi_mpam_count_msc() still counts them - which means the driver never finds all the MSC,
>> and never touches the hardware.
>>
>> MPAM isn't a critical feature, its better that it be disabled than make things worse.
>> (the same attitude holds with the response to the MPAM error interrupt - reset everything
>>  and pack up shop. This is bettern than accidentally combining important/unimportant
>>  tasks)
>>
>>
>>> I'm guessing there is something out there that made this necessary though so
>>> keep it if you actually need it.  
>>
>> It's a paranoid/violent reaction to the way PCC was added - without something like this,
>> that would have led to the OS trying to map the 0 page and poking around in it - never
>> likely to go well.
>>
>> Doing this does let them pull another PCC without stable kernels going wrong.
>> Ultimately I think they'll need to replace the table with one that is properly structured.
>> For now - this is working with what we have.

> Fair enough. I'm too lazy / behind with reviews to go scream via our channels about
> problems here.  Paranoia it is.  Maybe we'll end up backporting some 'fixes' that
> ignore nicely added fields with appropriate control bits to turn them on.
> So be it if that happens.

Yup - I'm expecting to at least backport "ignore this feature" patches when the table gets
changed. It's not how it's supposed to work, but the missing subtable header wasn't caught
until it was too late.
MPAM isn't a critical feature, so it shouldn't matter too much if old kernels on new
hardware can't use it.


>>>> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
>>>> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
>>>> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
>>>> +			continue;
>>>> +		}
>>>> +
>>>> +		if (!tbl_msc->mmio_size) {
>>>> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
>>>> +			continue;
>>>> +		}
>>>> +
>>>> +		if (decode_interface_type(tbl_msc, &iface)) {
>>>> +			pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
>>>> +			continue;
>>>> +		}
>>>> +
>>>> +		next_res = 0;
>>>> +		next_prop = 0;
>>>> +		memset(res, 0, sizeof(res));
>>>> +		memset(props, 0, sizeof(props));
>>>> +
>>>> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);  
>>>
>>> https://lore.kernel.org/all/20241009124120.1124-13-shiju.jose@huawei.com/
>>> was a proposal to add a DEFINE_FREE() to clean these up.  Might be worth a revisit.
>>> Then Greg was against the use it was put to and asking for an example of where
>>> it helped.  Maybe this is that example.
>>>
>>> If you do want to do that, I'd factor out a bunch of the stuff here as a helper
>>> so we can have the clean ownership pass of a return_ptr().  
>>> Similar to what Shiju did here (this is the usecase for platform device that
>>> Greg didn't like).
>>> https://lore.kernel.org/all/20241009124120.1124-14-shiju.jose@huawei.com/
>>>
>>> Even without that I think factoring some of this out and hence being able to
>>> do returns on errors and put the if (err) into the loop would be a nice
>>> improvement to readability.  
>>
>> If you think its more readable I'll structure it like that.
> 
> The refactor yes. I'd keep clear of the the DEFINE_FREE() unless you have
> some spare time ;)

spare what?



Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-10-02  3:21   ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table Fenghua Yu
@ 2025-10-17 18:50     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:50 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich

Hi Fenghua,

On 02/10/2025 04:21, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data.


>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..fd9cfa143676
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c

>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +                    struct acpi_mpam_resource_node *res)
>> +{
>> +    int level, nid;
>> +    u32 cache_id;
>> +
>> +    switch (res->locator_type) {
>> +    case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +        cache_id = res->locator.cache_locator.cache_reference;
>> +        level = find_acpi_cache_level_from_id(cache_id);
>> +        if (level <= 0) {
>> +            pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> 
> Since level could be negative value here, printing it as %u converts it to positive value
> and will cause debug difficulty. For example, -ENOENT returned by
> find_acpi_cache_level_from_id() will be printed as 4294967294(instead of -2) which is hard
> to know the error code.
> 
> Suggest to change this to %d:

Sure, it's never meant to be seen!


>             pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
> 
>> +            return -EINVAL;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +                       level, cache_id);
>> +    case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +        nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +        if (nid == NUMA_NO_NODE)
>> +            nid = 0;
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +                       255, nid);
>> +    default:
>> +        /* These get discovered later and treated as unknown */
>> +        return 0;
>> +    }
>> +}

>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +                     struct platform_device *pdev,
>> +                     u32 *acpi_id)
>> +{
>> +    char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
>> +    bool acpi_id_valid = false;
>> +    struct acpi_device *buddy;
>> +    char uid[11];
>> +    int err;
>> +
>> +    memset(&hid, 0, sizeof(hid));
>> +    memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +           sizeof(tbl_msc->hardware_id_linked_device));
>> +
>> +    if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +        *acpi_id = tbl_msc->instance_id_linked_device;
>> +        acpi_id_valid = true;
>> +    }
>> +
>> +    err = snprintf(uid, sizeof(uid), "%u",
>> +               tbl_msc->instance_id_linked_device);
>> +    if (err >= sizeof(uid)) {
> 
> err could be negative error code.

Not here it can't, from lib/vsprintf.c's documentation of snprintf()
| * The return value is the number of characters which would be
| * generated for the given input, excluding the trailing null,
| * as per ISO C99.  If the return is greater than or equal to
| * @size, the resulting string is truncated.
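
For anyone who wants to convince themselves, a quick user-space demo of the same
C99 "would-be length" semantics (buffer sized to match the uid[11] above):

	#include <stdio.h>

	int main(void)
	{
		char uid[11];	/* 10 digits + NUL */
		int ret;

		/* largest u32 is 10 digits: fits exactly, not truncated */
		ret = snprintf(uid, sizeof(uid), "%u", 4294967295u);
		printf("ret=%d truncated=%d\n", ret, ret >= (int)sizeof(uid));

		/* too long: ret is the untruncated length, never negative here */
		ret = snprintf(uid, sizeof(uid), "cpu-%u", 4294967295u);
		printf("ret=%d truncated=%d\n", ret, ret >= (int)sizeof(uid));

		return 0;
	}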

> This error validation only checks size but not error code.
> 
> Better to change it to
> 
>         if (err < 0 || err >= sizeof(uid))
> 


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table
  2025-10-03  0:58   ` Gavin Shan
@ 2025-10-17 18:51     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Gavin Shan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Gavin,

On 03/10/2025 01:58, Gavin Shan wrote:
> On 9/11/25 6:42 AM, James Morse wrote:
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data.

>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..fd9cfa143676
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c

>> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
>> +                   u32 flags, int *irq,
>> +                   u32 processor_container_uid)
>> +{
>> +    int sense;
>> +
>> +    if (!intid)
>> +        return false;
>> +
>> +    if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
>> +        ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
>> +        return false;
>> +
>> +    sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
>> +
>> +    if (16 <= intid && intid < 32 && processor_container_uid != GLOBAL_AFFINITY) {
>> +        pr_err_once("Partitioned interrupts not supported\n");
>> +        return false;
>> +    }
>> +
>> +    *irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
>> +    if (*irq <= 0) {
>> +        pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
>> +                intid);
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
> 
> 0 is allowed by acpi_register_gsi().
> 
>     if (*irq < 0) {
>         pr_err_once(...);
>         return false;
>     }

Really? I thought irq-zero was nonsense.
acpi_register_gsi() does this:
|        irq = irq_create_fwspec_mapping(&fwspec);
|        if (!irq)
|                return -EINVAL;
|
|        return irq;


>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +                    struct acpi_mpam_resource_node *res)
>> +{
>> +    int level, nid;
>> +    u32 cache_id;
>> +
>> +    switch (res->locator_type) {
>> +    case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +        cache_id = res->locator.cache_locator.cache_reference;
>> +        level = find_acpi_cache_level_from_id(cache_id);
>> +        if (level <= 0) {
>> +            pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
>> +            return -EINVAL;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +                       level, cache_id);
>> +    case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +        nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +        if (nid == NUMA_NO_NODE)
>> +            nid = 0;
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +                       255, nid);
> 
> It's perhaps worthy a warning message when @nid is explicitly set to zero due to
> the bad proximity domain, something like below.
> 
>         if (nid == NUMA_NO_NODE) {
>             nid = 0;
>             if (num_possible_nodes() > 1) {
>                 pr_warn("Bad proximity domain %d, mapped to node 0\n",
>                     res->locator.memory_locator.proximity_domain);
>             }
>         }

This was to catch the case where you build the kernel without NUMA support - which
wouldn't be an error. But that returns 0 instead of NUMA_NO_NODE, so NUMA_NO_NODE only
occurs when there is a bug. I'll add this - but it'll be a pr_debug(), as the message is
only of use to about 4 people!
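
Something like this, presumably (untested sketch - only the pr_debug() is new
compared to the hunk quoted above):

	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
		if (nid == NUMA_NO_NODE) {
			/* Only reachable on a bug: !CONFIG_NUMA maps everything to node 0 */
			pr_debug("Bad proximity domain, using node 0\n");
			nid = 0;
		}
		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
				       255, nid);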


>> +    default:
>> +        /* These get discovered later and treated as unknown */
>> +        return 0;
>> +    }
>> +}

>> +int acpi_mpam_count_msc(void)
>> +{
>> +    struct acpi_table_header *table __free(acpi_table) =
>> acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +    char *table_end, *table_offset = (char *)(table + 1);
>> +    struct acpi_mpam_msc_node *tbl_msc;
>> +    int count = 0;
>> +
>> +    if (IS_ERR(table))
>> +        return 0;
>> +
>> +    if (table->revision < 1)
>> +        return 0;
>> +
>> +    table_end = (char *)table + table->length;
>> +
>> +    while (table_offset < table_end) {
>> +        tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +        if (!tbl_msc->mmio_size)
>> +            continue;
>> +
>> +        if (tbl_msc->length < sizeof(*tbl_msc))
>> +            return -EINVAL;
>> +        if (tbl_msc->length > table_end - table_offset)
>> +            return -EINVAL;
>> +        table_offset += tbl_msc->length;
>> +
>> +        count++;
>> +    }
>> +
>> +    return count;
>> +}

> acpi_mpam_count_msc() iterates the existing MSC node, which is part of acpi_mpam_parse().
> So the question is why we can't drop acpi_mpam_count_msc() and maintain a variable to
> count the existing MSC nodes in acpi_mpam_parse() ?

Once the platform device has been created, the driver's probe function can be called, and
that needs to know how many MSC there are going to be. Doing it like this means we don't
depend on the driver probe function not being called until we get to the end of the list.
(the comments about the initcall dependencies are already annoying - I prefer not to add
any that are specific to MPAM).

This also lets us catch non-backward-compatible ACPI table changes, as has already
happened with PCC. acpi_mpam_parse() skips MSC with reserved fields set,
acpi_mpam_count_msc() does not - this means we are guaranteed the values will mismatch
if any of those reserved fields are set, and the driver will never try to touch the hardware.


Thanks,

James



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-26 14:55       ` Jonathan Cameron
@ 2025-10-17 18:51         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich

Hi Jonathan,

On 26/09/2025 15:55, Jonathan Cameron wrote:
>>>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>>>> new file mode 100644
>>>> index 000000000000..7c63d590fc98
>>>> --- /dev/null
>>>> +++ b/drivers/resctrl/mpam_internal.h
>>>> @@ -0,0 +1,65 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +// Copyright (C) 2025 Arm Ltd.
>>>> +
>>>> +#ifndef MPAM_INTERNAL_H
>>>> +#define MPAM_INTERNAL_H
>>>> +
>>>> +#include <linux/arm_mpam.h>
>>>> +#include <linux/cpumask.h>
>>>> +#include <linux/io.h>
>>>> +#include <linux/mailbox_client.h>
>>>> +#include <linux/mutex.h>  
>>>
>>> spinlock.h  
>>
>> Fixed,
>>
>>
>>>> +#include <linux/resctrl.h>  
>>>
>>> Not spotting anything rsctl yet.  So maybe this belongs later.  
>>
>> There shouldn't be anything that depends on resctrl in this series - looks like
>> this is a 2018 era bug in the way I carved this up!
>>
>>
>>>> +#include <linux/sizes.h>
>>>> +
>>>> +struct mpam_msc {
>>>> +	/* member of mpam_all_msc */
>>>> +	struct list_head        all_msc_list;
>>>> +
>>>> +	int			id;  
>>>
>>> I'd follow (approx) include what you use principles to make later header
>>> shuffling easier. So a forward def for this.  
>>
>> -ENOPARSE
>>
>> I'm sure I'll work this out from your later comments.
> 
> I missed on the comment (I think). Would have made more sense a line later.
> Add a forwards def
> 
> struct platform_device;
> 
> as no reason to include the appropriate header
> (and you didn't anwyay).
> 

Right - gotcha.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-03  3:53   ` Gavin Shan
@ 2025-10-17 18:51     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Gavin Shan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich

Hi Gavin,

On 03/10/2025 04:53, Gavin Shan wrote:
> Hi James,
> 
> On 9/11/25 6:42 AM, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..efc4738e3b4d
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c

>> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * corresponding cache may also be powered off. By making accesses from
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
>> +static int update_msc_accessibility(struct mpam_msc *msc)
>> +{
>> +    u32 affinity_id;
>> +    int err;
>> +
>> +    err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
>> +                       &affinity_id);
>> +    if (err)
>> +        cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +    else
>> +        acpi_pptt_get_cpus_from_container(affinity_id,
>> +                          &msc->accessibility);
>> +
>> +    return 0;
>> +
>> +    return err;
>> +}
>> +
> 
> Double return here and different values have been returned. I think here we
> need "return err". In this case, we needn't copy @cpu_possible_mask on error
> because the caller mpam_msc_drv_probe() will release the MSC instance.

This was the botched removal of the DT support. I'm surprised the compiler is so
forgiving. (already pointed out and already fixed)
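
(A sketch of what I mean by that - not necessarily the exact shape of the fix,
just the quoted helper with the unreachable tail dropped:)

	static int update_msc_accessibility(struct mpam_msc *msc)
	{
		u32 affinity_id;
		int err;

		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
					       &affinity_id);
		if (err)
			cpumask_copy(&msc->accessibility, cpu_possible_mask);
		else
			acpi_pptt_get_cpus_from_container(affinity_id,
							  &msc->accessibility);

		return 0;
	}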


>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +    int err;
>> +    struct mpam_msc *msc;
>> +    struct resource *msc_res;
>> +    struct device *dev = &pdev->dev;
>> +    void *plat_data = pdev->dev.platform_data;
>> +
>> +    mutex_lock(&mpam_list_lock);
>> +    do {
>> +        msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +        if (!msc) {
>> +            err = -ENOMEM;
>> +            break;
>> +        }
>> +
>> +        mutex_init(&msc->probe_lock);
>> +        mutex_init(&msc->part_sel_lock);
>> +        mutex_init(&msc->outer_mon_sel_lock);
>> +        raw_spin_lock_init(&msc->inner_mon_sel_lock);
>> +        msc->id = pdev->id;
>> +        msc->pdev = pdev;
>> +        INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +        INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +        err = update_msc_accessibility(msc);
>> +        if (err)
>> +            break;
>> +        if (cpumask_empty(&msc->accessibility)) {
>> +            dev_err_once(dev, "MSC is not accessible from any CPU!");
>> +            err = -EINVAL;
>> +            break;
>> +        }
>> +
> 
> This check (cpumask_empty()) would be part of update_msc_accessibility() since
> msc->accessibility is sorted out in that function where it should be validated.


Could be - but isn't. This is because with the DT support in update_msc_accessibility()
that function is more complex, and it's simpler to get the caller to check things like
this.

Even if no-one ever gets DT support upstream, I don't think this matters.


>> +        if (device_property_read_u32(&pdev->dev, "pcc-channel",
>> +                         &msc->pcc_subspace_id))
>> +            msc->iface = MPAM_IFACE_MMIO;
>> +        else
>> +            msc->iface = MPAM_IFACE_PCC;
>> +
>> +        if (msc->iface == MPAM_IFACE_MMIO) {
>> +            void __iomem *io;
>> +
>> +            io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +                                    &msc_res);
>> +            if (IS_ERR(io)) {
>> +                dev_err_once(dev, "Failed to map MSC base address\n");
>> +                err = PTR_ERR(io);
>> +                break;
>> +            }
>> +            msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +            msc->mapped_hwpage = io;
>> +        }
>> +
>> +        list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
>> +        platform_set_drvdata(pdev, msc);
>> +    } while (0);
>> +    mutex_unlock(&mpam_list_lock);
>> +
>> +    if (!err) {
>> +        /* Create RIS entries described by firmware */
>> +        err = acpi_mpam_parse_resources(msc, plat_data);
>> +    }
>> +
>> +    if (err && msc)
>> +        mpam_msc_drv_remove(pdev);
>> +
>> +    if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
>> +        pr_info("Discovered all MSC\n");
>> +
>> +    return err;
>> +}


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-09-26 18:15       ` Markus Elfring
@ 2025-10-17 18:51         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Markus Elfring, linux-acpi, linux-arm-kernel
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang, Ben Horgan,
	Carl Worth, Catalin Marinas, D Scott Phillips, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, Fenghua Yu,
	Greg Kroah-Hartman, Hanjun Guo, Jamie Iles, Jonathan Cameron,
	Koba Ko, Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rob Herring, Rohit Mathew, Shanker Donthineni,
	Sudeep Holla, Shaopeng Tan, Wang ShaoBo, Will Deacon, Xin Hao

Hi Markus,

On 26/09/2025 19:15, Markus Elfring wrote:
>>> …
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>> …
>>>>> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>>> +		    enum mpam_class_types type, u8 class_id, int component_id)
>>>> +{
>>>> +	int err;
>>>> +
>>>> +	mutex_lock(&mpam_list_lock);
>>>> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
>>>> +				     component_id);
>>>> +	mutex_unlock(&mpam_list_lock);
>>> …
>>>
>>> Under which circumstances would you become interested to apply a statement
>>> like “guard(mutex)(&mpam_list_lock);”?
>>> https://elixir.bootlin.com/linux/v6.17-rc5/source/include/linux/mutex.h#L228
>>
>> None! The bit of this you cut out is a call to mpam_free_garbage() which calls
>> synchronize_srcu(). That may sleep for a while. The whole point of the deferred free-ing
>> is it does not happen under the lock. The 'guard' magic means the compiler gets to choose
>> when to call unlock.
> 
> How does this feedback fit to the proposed addition of a mutex_lock()/mutex_unlock()
> call combination (which might be achievable also with another programming interface)?

Right - I've muddled the horde of "must use guard srcu" with the horde of "must use guard
mutex". In this case I'd still prefer we don't spuriously hold the write-side lock while
doing the deferred freeing.
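
To put it in code terms, this is the shape being discussed (sketch - the
mpam_ris_create() quoted above, with the mpam_free_garbage() call that was
snipped from the quote put back):

	int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
			    enum mpam_class_types type, u8 class_id,
			    int component_id)
	{
		int err;

		mutex_lock(&mpam_list_lock);
		err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
					     component_id);
		mutex_unlock(&mpam_list_lock);

		/*
		 * Deliberately outside the lock: this may call synchronize_srcu()
		 * and sleep. With guard(mutex)(), the unlock would only happen at
		 * the end of the scope, i.e. after this call.
		 */
		mpam_free_garbage();

		return err;
	}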


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-10-03 16:54   ` Fenghua Yu
@ 2025-10-17 18:51     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Hi Fenghua,

On 03/10/2025 17:54, Fenghua Yu wrote:
> On 9/10/25 13:42, James Morse wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index efc4738e3b4d..c7f4981b3545 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>> +#define add_to_garbage(x)                \
>> +do {                            \
>> +    __typeof__(x) _x = (x);                \
>> +    _x->garbage.to_free = _x;            \
>> +    llist_add(&_x->garbage.llist, &mpam_garbage);    \
>> +} while (0)
>> +static void mpam_free_garbage(void)
>> +{
>> +    struct mpam_garbage *iter, *tmp;
>> +    struct llist_node *to_free = llist_del_all(&mpam_garbage);
>> +
> 
> Should this be protected by mpam_list_lock and check if the lock is held?
> 
> +    lockdep_assert_held(&mpam_list_lock);
> 
> Multiple threads may add and free garbage in parallel. Please see later free_garbage() is
> not protected by any lock.

Indeed - because it's using llist instead. That is safe for concurrent use as you can only
consume the whole list in one go with a cmpxchg(val, NULL). In the event of a race, one
gets to own the llist and walk through it - the other sees an empty list.
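
(For anyone not familiar with llist, a stripped-down sketch of the pattern - the
names here are invented for the example:)

	#include <linux/llist.h>
	#include <linux/slab.h>

	struct widget {
		struct llist_node llist;
		/* ... the rest of the object ... */
	};

	static LLIST_HEAD(zombie_list);

	/* Producers: lock-free, callable from any context */
	static void widget_defer_free(struct widget *w)
	{
		llist_add(&w->llist, &zombie_list);
	}

	/*
	 * Consumer: llist_del_all() takes the whole list in one atomic exchange.
	 * If two consumers race, one walks the list, the other sees it empty.
	 */
	static void widget_free_deferred(void)
	{
		struct widget *iter, *tmp;
		struct llist_node *list = llist_del_all(&zombie_list);

		if (!list)
			return;

		/* wait for readers first, e.g. synchronize_srcu(), then: */
		llist_for_each_entry_safe(iter, tmp, list, llist)
			kfree(iter);
	}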


> 
>> +    if (!to_free)
>> +        return;
>> +
>> +    synchronize_srcu(&mpam_srcu);
>> +
>> +    llist_for_each_entry_safe(iter, tmp, to_free, llist) {
>> +        if (iter->pdev)
>> +            devm_kfree(&iter->pdev->dev, iter->to_free);
>> +        else
>> +            kfree(iter->to_free);
>> +    }
>> +}

>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> +                  enum mpam_class_types type, u8 class_id,
>> +                  int component_id)
>> +{
>> +    int err;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_class *class;
>> +    struct mpam_component *comp;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
>> +        return -EINVAL;
>> +
>> +    if (test_and_set_bit(ris_idx, &msc->ris_idxs))
>> +        return -EBUSY;
>> +
> 
> Should setting msc->ris_idxs bit be moved to the end of this function after all error
> handling paths? The reason is this bit is better to be 0 (or recovered) if any error
> happens. It's hard to recover it to 0 for each error handling. The easiest way is to set
> it at the end of the function.

This is an up-front test for firmware tables that describe one RIS twice.
No error recovery is needed as this is all this bitfield is used for.


>> +    ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
>> +    if (!ris)
>> +        return -ENOMEM;
>> +    init_garbage(ris);
>> +
>> +    class = mpam_class_get(class_id, type);
>> +    if (IS_ERR(class))
>> +        return PTR_ERR(class);
>> +
>> +    comp = mpam_component_get(class, component_id);
>> +    if (IS_ERR(comp)) {
>> +        if (list_empty(&class->components))
>> +            mpam_class_destroy(class);
>> +        return PTR_ERR(comp);
>> +    }
>> +
>> +    vmsc = mpam_vmsc_get(comp, msc);
>> +    if (IS_ERR(vmsc)) {
>> +        if (list_empty(&comp->vmsc))
>> +            mpam_comp_destroy(comp);
>> +        return PTR_ERR(vmsc);
>> +    }
>> +
>> +    err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
>> +    if (err) {
>> +        if (list_empty(&vmsc->ris))
>> +            mpam_vmsc_destroy(vmsc);
>> +        return err;
>> +    }
>> +
>> +    ris->ris_idx = ris_idx;
>> +    INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> 
> vmsc_list will be used but not initialized. Missing INIT_LIST_HEAD_RCU(&ris->msc_list) here?

Fixed,


>> +    ris->vmsc = vmsc;
>> +
>> +    cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>> +    cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>> +    list_add_rcu(&ris->vmsc_list, &vmsc->ris);
>> +

> Setting the msc->ris_idxs here is better to avoid to clear it in each error handling path.

But misses the error it is supposed to catch...


>> +    return 0;
>> +}
>> +
>> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>>           return;
>>         mutex_lock(&mpam_list_lock);
>> -    platform_set_drvdata(pdev, NULL);
>> -    list_del_rcu(&msc->all_msc_list);
>> -    synchronize_srcu(&mpam_srcu);
>> +    mpam_msc_destroy(msc);
>>       mutex_unlock(&mpam_list_lock);
>> +
>> +    mpam_free_garbage();
> 
> Should mpam_free_garbage() be protected by mpam_list_lock? It may race with adding
> garbage. I can see other adding and freeing garbage are protected by mpam_list_lock but
> not this one.

No - it uses llist and is part of the deferred freeing; it should not need any locks.


Thanks,

James


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-10-06 23:13   ` Gavin Shan
@ 2025-10-17 18:51     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:51 UTC (permalink / raw)
  To: Gavin Shan, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan

Hi Gavin,

On 07/10/2025 00:13, Gavin Shan wrote:
> On 9/11/25 6:42 AM, James Morse wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping: a class is a set of components, which
>> are visible to user-space, as there are likely to be multiple instances
>> of the L2 cache (e.g. one per cluster or package).
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index efc4738e3b4d..c7f4981b3545 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c

>> +static struct mpam_class *
>> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
>> +{
>> +    bool found = false;
>> +    struct mpam_class *class;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_for_each_entry(class, &mpam_classes, classes_list) {
>> +        if (class->type == type && class->level == level_idx) {
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (found)
>> +        return class;
>> +
>> +    return mpam_class_alloc(level_idx, type);
>> +}
>> +
> 
> The variable @found can be avoided if the found class can be returned immediately.
> 
>     list_for_each_entry(class, &mpam_classes, classes_list) {
>         if (class->type == type && class->level == level_idx)
>             return class;
>     }
> 
>     return mpam_class_alloc(level_idx, type);

Yes, feedback like this already came from Jonathan.


>> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
>> +                 enum mpam_class_types type,
>> +                 struct mpam_class *class,
>> +                 struct mpam_component *comp)
>> +{
>> +    int err;
>> +
>> +    switch (type) {
>> +    case MPAM_CLASS_CACHE:
>> +        err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
>> +                             affinity);
>> +        if (err)
>> +            return err;
>> +
>> +        if (cpumask_empty(affinity))
>> +            pr_warn_once("%s no CPUs associated with cache node",
>> +                     dev_name(&msc->pdev->dev));
>> +
>> +        break;
> 
> "\n" missed in the error message and dev_warn_once() can be used:
> 
>         if (cpumask_empty(affinity))
>             dev_warn_once(&msc->pdev->dev, "No CPUs associated with cache node\n");

Yup, I'd mopped up most of these but missed this one.


>> +    case MPAM_CLASS_MEMORY:
>> +        get_cpumask_from_node_id(comp->comp_id, affinity);
>> +        /* affinity may be empty for CPU-less memory nodes */
>> +        break;
>> +    case MPAM_CLASS_UNKNOWN:
>> +        return 0;
>> +    }
>> +
>> +    cpumask_and(affinity, affinity, &msc->accessibility);
>> +
>> +    return 0;
>> +}


Thanks,

James



* Re: [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-11 15:00   ` Jonathan Cameron
@ 2025-10-17 18:53     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-10-17 18:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan

Hi Jonathan,

On 11/09/2025 16:00, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:49 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
>> (MSCs) with an identity/configuration page.
>>
>> Add the definitions for these registers as offset within the page(s).

> I'm not sure why some things ended up in this patch and others didn't.
> MPAMCFG_EN for example isn't here.

Things were added once I'd already written this, and I only updated it with 'new' features
where they were actually useful for feature parity with resctrl/Intel-RDT.


> If doing a separate 'register defines' patch I'd do the lot as of
> the current spec.

I've not done this because it's a time sink for no benefit. The kernel doesn't use any of
the 'missing' features. While I agree it would be nice if the list were up to date - it
will become stale pretty quickly, so it's not an achievable goal...


> 
>>
>> Link: https://developer.arm.com/documentation/ihi0099/latest/
> 
> Maybe link a specific version? I'm not sure if what I'm looking at is the same one
> as you were looking at when you wrote this. That will become worse over time. I'm definitely
> seeing extra bits in a number of registers.
> 
> I'm lazy enough not to go see if the cover letter calls out a version.
> 
> Anyhow, various small things on ordering that would have made this easier to review
> against the spec.



>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 02e9576ece6b..109f03df46c2 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -152,4 +152,271 @@ extern struct list_head mpam_classes;
>>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>>  				   cpumask_t *affinity);
>>  
>> +/*
>> + * MPAM MSCs have the following register layout. See:
>> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
>> + * Component Specification.
>> + * https://developer.arm.com/documentation/ihi0099/latest/
> 
> Maybe be friendly and give some section number references.

Heh, linking to the 'latest' means those will change...


> 
>> + */
>> +#define MPAM_ARCHITECTURE_V1    0x10
>> +
>> +/* Memory mapped control pages: */
>> +/* ID Register offsets in the memory mapped page */
>> +#define MPAMF_IDR		0x0000  /* features id register */
>> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
> 
> Any reason this one is out of order with respect to the addresses?

No - I must have been going mad!


>> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
>> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
>> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
>> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
>> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
>> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
>> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
>> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
>> +#define MPAMF_IIDR		0x0018  /* implementer id register */
>> +#define MPAMF_AIDR		0x0020  /* architectural id register */
> 
> These 3 as well. I'm not sure what the ordering is conveying, but it's probably easier
> to just put them in address order.
> 
> There are some other cases of this below.

... I reckon the ones in funny places were the ones that the original FVP supported,
i.e. only the mandatory ones, which wasn't particularly useful.



>> +/* MPAMF_IIDR - MPAM implementation ID register */
>> +#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
>> +#define MPAMF_IIDR_PRODUCTID_SHIFT	20
>> +#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
>> +#define MPAMF_IIDR_VARIANT_SHIFT	16
>> +#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
>> +#define MPAMF_IIDR_REVISON_SHIFT	12
>> +#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
>> +#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0

> I'd expect to see FIELD_GET()/FIELD_PREP() rather than the use of shifts. Can we drop the shift defines?

Sure,
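
For example, something like this with the GENMASK()s above (the helper itself is just
for illustration):

#include <linux/bitfield.h>
#include <linux/printk.h>
#include <linux/types.h>

/* Decode MPAMF_IIDR with FIELD_GET(); the *_SHIFT defines become redundant. */
static void mpam_dump_iidr(u32 iidr)
{
	u16 implementer = FIELD_GET(MPAMF_IIDR_IMPLEMENTER, iidr);
	u8 revision = FIELD_GET(MPAMF_IIDR_REVISON, iidr);
	u8 variant = FIELD_GET(MPAMF_IIDR_VARIANT, iidr);
	u16 product = FIELD_GET(MPAMF_IIDR_PRODUCTID, iidr);

	pr_debug("IIDR: implementer %#x, product %#x, variant %u, revision %u\n",
		 implementer, product, variant, revision);
}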

> Pick an order for reg field definitions. Until here they've been low to high.

I think I've got that more consistent now...


>> +/* Error conditions in accessing memory mapped registers */
>> +#define MPAM_ERRCODE_NONE			0
>> +#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
>> +#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
>> +#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
>> +#define MPAM_ERRCODE_REQ_PMG_RANGE		4
>> +#define MPAM_ERRCODE_MONITOR_RANGE		5
>> +#define MPAM_ERRCODE_INTPARTID_RANGE		6
>> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
> 
> Seems there are more in the latest spec.

Yup, it's the frequent game of spot-the-difference.
I've updated that as part of your other feedback.


>> +
>> +/*
>> + * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
>> + *                    usage monitor control register
>> + * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
>> + *                     bandwidth usage monitor control register
>> + */
>> +#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
>> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
>> +#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
>> +#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
>> +#define MSMON_CFG_x_CTL_SCLEN			BIT(19)

> In the spec I'm looking at, this is reserved in CSU_CTL.

It's only defined for the MBWU monitor ("Value scaling enable"). I'll move it after the
MSMON_CFG_MBWU_CTL_TYPE_MBWU define below.


>> +#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
>> +#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
>> +#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
>> +#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
>> +#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
>> +#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
>> +#define MSMON_CFG_x_CTL_EN			BIT(31)

> I guess this combining of definitions will show some advantage in common code
> later, but right now it seems confusing given that not all bits are present in both.

When I started, these were the same!

It is dealt with in common code; I don't think any of the bits that differ are
used by the driver.
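
(As an illustration of the 'common code' use - not the driver's actual helper - the shared
x_CTL fields are enough to build most of a CSU or MBWU control value:)

#include <linux/bitfield.h>
#include <linux/types.h>

/*
 * Sketch: everything here is common to MSMON_CFG_CSU_CTL and MSMON_CFG_MBWU_CTL;
 * the caller supplies the per-monitor TYPE value, and ORs in MSMON_CFG_x_CTL_EN
 * once the rest of the monitor is programmed.
 */
static u32 msmon_cfg_ctl_common(u8 type, bool match_partid, bool match_pmg)
{
	u32 ctl = FIELD_PREP(MSMON_CFG_x_CTL_TYPE, type);

	if (match_partid)
		ctl |= MSMON_CFG_x_CTL_MATCH_PARTID;
	if (match_pmg)
		ctl |= MSMON_CFG_x_CTL_MATCH_PMG;

	return ctl;
}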



Thanks,

James




Thread overview: 200+ messages
2025-09-10 20:42 [PATCH v2 00/29] arm_mpam: Add basic mpam driver James Morse
2025-09-10 20:42 ` [PATCH v2 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
2025-09-11 10:43   ` Jonathan Cameron
2025-09-11 10:48     ` Jonathan Cameron
2025-09-19 16:10     ` James Morse
2025-09-25  9:32   ` Stanimir Varbanov
2025-10-10 16:54     ` James Morse
2025-10-02  3:35   ` Fenghua Yu
2025-10-10 16:54     ` James Morse
2025-10-03  0:15   ` Gavin Shan
2025-10-10 16:55     ` James Morse
2025-09-10 20:42 ` [PATCH v2 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
2025-09-11 10:46   ` Jonathan Cameron
2025-09-19 16:10     ` James Morse
2025-09-11 14:08   ` Ben Horgan
2025-09-19 16:10     ` James Morse
2025-10-02  3:55   ` Fenghua Yu
2025-10-10 16:55     ` James Morse
2025-10-03  0:17   ` Gavin Shan
2025-09-10 20:42 ` [PATCH v2 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
2025-09-11 10:59   ` Jonathan Cameron
2025-09-19 16:10     ` James Morse
2025-09-11 15:27   ` Lorenzo Pieralisi
2025-09-19 16:10     ` James Morse
2025-10-02  4:30   ` Fenghua Yu
2025-10-10 16:55     ` James Morse
2025-10-03  0:23   ` Gavin Shan
2025-09-10 20:42 ` [PATCH v2 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
2025-09-11 11:06   ` Jonathan Cameron
2025-09-19 16:10     ` James Morse
2025-10-02  5:03   ` Fenghua Yu
2025-10-10 16:55     ` James Morse
2025-09-10 20:42 ` [PATCH v2 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
2025-09-12 10:14   ` Ben Horgan
2025-10-02  5:06   ` Fenghua Yu
2025-10-10 16:55     ` James Morse
2025-10-03  0:32   ` Gavin Shan
2025-10-10 16:55     ` James Morse
2025-09-10 20:42 ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table James Morse
2025-09-11 13:17   ` Jonathan Cameron
2025-09-19 16:11     ` James Morse
2025-09-26 14:48       ` Jonathan Cameron
2025-10-17 18:50         ` James Morse
2025-09-11 14:56   ` Lorenzo Pieralisi
2025-09-19 16:11     ` James Morse
2025-09-16 13:17   ` [PATCH] arm_mpam: Try reading again if MPAM instance returns not ready Zeng Heng
2025-09-19 16:11     ` James Morse
2025-09-20 10:14       ` Zeng Heng
2025-10-02  3:21   ` [PATCH v2 06/29] ACPI / MPAM: Parse the MPAM table Fenghua Yu
2025-10-17 18:50     ` James Morse
2025-10-03  0:58   ` Gavin Shan
2025-10-17 18:51     ` James Morse
2025-09-10 20:42 ` [PATCH v2 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-09-11 13:35   ` Jonathan Cameron
2025-09-23 16:41     ` James Morse
2025-09-26 14:55       ` Jonathan Cameron
2025-10-17 18:51         ` James Morse
2025-09-17 11:03   ` Ben Horgan
2025-09-29 17:44     ` James Morse
2025-10-03  3:53   ` Gavin Shan
2025-10-17 18:51     ` James Morse
2025-09-10 20:42 ` [PATCH v2 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
2025-09-11 14:22   ` Jonathan Cameron
2025-09-26 17:52     ` James Morse
2025-09-11 16:30   ` Markus Elfring
2025-09-26 17:52     ` James Morse
2025-09-26 18:15       ` Markus Elfring
2025-10-17 18:51         ` James Morse
2025-10-03 16:54   ` Fenghua Yu
2025-10-17 18:51     ` James Morse
2025-10-06 23:13   ` Gavin Shan
2025-10-17 18:51     ` James Morse
2025-09-10 20:42 ` [PATCH v2 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
2025-09-11 15:00   ` Jonathan Cameron
2025-10-17 18:53     ` James Morse
2025-09-12  7:33   ` Markus Elfring
2025-10-06 23:25   ` Gavin Shan
2025-09-10 20:42 ` [PATCH v2 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
2025-09-11 15:07   ` Jonathan Cameron
2025-09-29 17:44     ` James Morse
2025-09-12 10:42   ` Ben Horgan
2025-09-29 17:44     ` James Morse
2025-10-03 17:56   ` Fenghua Yu
2025-10-06 23:42   ` Gavin Shan
2025-09-10 20:42 ` [PATCH v2 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
2025-09-11 15:18   ` Jonathan Cameron
2025-09-29 17:44     ` James Morse
2025-09-12 11:11   ` Ben Horgan
2025-09-29 17:44     ` James Morse
2025-10-03 18:58   ` Fenghua Yu
2025-09-10 20:42 ` [PATCH v2 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
2025-09-11 15:24   ` Jonathan Cameron
2025-09-29 17:44     ` James Morse
2025-09-11 15:31   ` Ben Horgan
2025-09-29 17:44     ` James Morse
2025-10-05  0:09   ` Fenghua Yu
2025-09-10 20:42 ` [PATCH v2 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
2025-09-11 15:29   ` Jonathan Cameron
2025-09-29 17:45     ` James Morse
2025-09-11 15:37   ` Ben Horgan
2025-09-29 17:45     ` James Morse
2025-09-30 13:32       ` Ben Horgan
2025-10-05  0:53   ` Fenghua Yu
2025-09-10 20:42 ` [PATCH v2 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
2025-09-12 11:49   ` Jonathan Cameron
2025-09-29 17:45     ` James Morse
2025-10-05  1:28   ` Fenghua Yu
2025-09-10 20:42 ` [PATCH v2 15/29] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
2025-09-12 11:25   ` Ben Horgan
2025-09-12 14:52     ` Ben Horgan
2025-09-30 17:06       ` James Morse
2025-09-30 17:06     ` James Morse
2025-09-12 11:55   ` Jonathan Cameron
2025-09-30 17:06     ` James Morse
2025-09-30  2:51   ` Shaopeng Tan (Fujitsu)
2025-10-01  9:51     ` James Morse
     [not found]   ` <1f084a23-7211-4291-99b6-7f5192fb9096@nvidia.com>
2025-10-17 18:50     ` James Morse
2025-09-10 20:42 ` [PATCH v2 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
2025-09-12 11:57   ` Jonathan Cameron
2025-10-01  9:50     ` James Morse
2025-10-05 21:08   ` Fenghua Yu
2025-09-10 20:42 ` [PATCH v2 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
2025-09-12 11:42   ` Ben Horgan
2025-10-02 18:02     ` James Morse
2025-09-12 12:02   ` Jonathan Cameron
2025-09-30 17:06     ` James Morse
2025-09-25  7:16   ` Fenghua Yu
2025-10-02 18:02     ` James Morse
2025-09-10 20:42 ` [PATCH v2 18/29] arm_mpam: Register and enable IRQs James Morse
2025-09-12 12:12   ` Jonathan Cameron
2025-10-02 18:02     ` James Morse
2025-09-12 14:40   ` Ben Horgan
2025-10-02 18:03     ` James Morse
2025-09-12 15:22   ` Dave Martin
2025-10-03 18:02     ` James Morse
2025-09-25  6:33   ` Fenghua Yu
2025-10-03 18:03     ` James Morse
2025-09-10 20:42 ` [PATCH v2 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
2025-09-12 12:13   ` Jonathan Cameron
2025-10-03 18:03     ` James Morse
2025-09-12 14:42   ` Ben Horgan
2025-10-03 18:03     ` James Morse
2025-09-26  2:31   ` Fenghua Yu
2025-10-03 18:04     ` James Morse
2025-09-10 20:43 ` [PATCH v2 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
2025-09-12 12:22   ` Jonathan Cameron
2025-10-07 11:11     ` James Morse
2025-09-12 15:00   ` Ben Horgan
2025-09-25  6:53   ` Fenghua Yu
2025-10-03 18:04     ` James Morse
2025-09-10 20:43 ` [PATCH v2 21/29] arm_mpam: Probe and reset the rest of the features James Morse
2025-09-12 13:07   ` Jonathan Cameron
2025-10-03 18:05     ` James Morse
2025-09-10 20:43 ` [PATCH v2 22/29] arm_mpam: Add helpers to allocate monitors James Morse
2025-09-12 13:11   ` Jonathan Cameron
2025-10-06 14:57     ` James Morse
2025-10-06 15:56     ` James Morse
2025-09-10 20:43 ` [PATCH v2 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
2025-09-11 15:46   ` Ben Horgan
2025-09-12 15:08     ` Ben Horgan
2025-10-06 16:00       ` James Morse
2025-10-06 15:59     ` James Morse
2025-09-12 13:21   ` Jonathan Cameron
2025-10-09 17:48     ` James Morse
2025-09-25  2:30   ` Fenghua Yu
2025-10-09 17:48     ` James Morse
2025-09-10 20:43 ` [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
2025-09-12 13:24   ` Jonathan Cameron
2025-10-09 17:48     ` James Morse
2025-09-12 15:55   ` Ben Horgan
2025-10-13 16:29     ` James Morse
2025-09-10 20:43 ` [PATCH v2 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
2025-09-12 13:27   ` Jonathan Cameron
2025-10-09 17:48     ` James Morse
2025-09-10 20:43 ` [PATCH v2 26/29] arm_mpam: Use long MBWU counters if supported James Morse
2025-09-12 13:29   ` Jonathan Cameron
2025-10-10 16:53     ` James Morse
2025-09-26  4:51   ` Fenghua Yu
2025-09-10 20:43 ` [PATCH v2 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
2025-09-12 13:33   ` Jonathan Cameron
2025-10-10 16:53     ` James Morse
2025-09-18  2:35   ` Shaopeng Tan (Fujitsu)
2025-10-10 16:53     ` James Morse
2025-09-26  4:11   ` Fenghua Yu
2025-10-10 16:53     ` James Morse
2025-09-10 20:43 ` [PATCH v2 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
2025-09-12 13:37   ` Jonathan Cameron
2025-10-10 16:53     ` James Morse
2025-09-12 16:06   ` Ben Horgan
2025-10-10 16:53     ` James Morse
2025-09-26  2:35   ` Fenghua Yu
2025-10-10 16:53     ` James Morse
2025-09-10 20:43 ` [PATCH v2 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
2025-09-12 13:41   ` Jonathan Cameron
2025-10-10 16:54     ` James Morse
2025-09-12 16:01   ` Ben Horgan
2025-10-10 16:54     ` James Morse
2025-09-26  2:36   ` Fenghua Yu
2025-10-10 16:54     ` James Morse
2025-09-25  7:18 ` [PATCH v2 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
