linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/29] arm_mpam: Add basic mpam driver
@ 2025-10-17 18:56 James Morse
  2025-10-17 18:56 ` [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
                   ` (30 more replies)
  0 siblings, 31 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hello,

A slew of minor changes, nothing really sticks out.
Changes are noted on each patch.

~

This is just enough MPAM driver for ACPI. DT support got ripped out. If you
need DT support - please share your DTS so the DT folk know the binding is
what is needed.
This doesn't contain any of the resctrl code, meaning you can't actually drive
it from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.

This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.

The locking got simplified, but is still strange - this is because of the 'mpam-fb'
firmware interface specification that is still alpha. That thing needs to wait for
an interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.

I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.

The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
 This document has the best overview)

The expectation is this will go via the arm64 tree.


This series is based on v6.18-rc4, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v3

The rest of the driver can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1

What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/

This series was previously posted here:
[v2] lore.kernel.org/r/20250910204309.20751-1-james.morse@arm.com
[v1] lore.kernel.org/r/20250822153048.2287-1-james.morse@arm.com
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com


James Morse (27):
  ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
    levels
  ACPI / PPTT: Find cache level by cache-id
  ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  arm64: kconfig: Add Kconfig entry for MPAM
  ACPI / MPAM: Parse the MPAM table
  arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  arm_mpam: Add the class and component structures for firmware
    described ris
  arm_mpam: Add MPAM MSC register layout definitions
  arm_mpam: Add cpuhp callbacks to probe MSC hardware
  arm_mpam: Probe hardware to find the supported partid/pmg values
  arm_mpam: Add helpers for managing the locking around the mon_sel
    registers
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Merge supported features during mpam_enable() into
    mpam_class
  arm_mpam: Reset MSC controls from cpuhp callbacks
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Register and enable IRQs
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Allow configuration to be applied and restored during cpu
    online
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Track bandwidth counter state for overflow and power
    management
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add kunit tests for props_mismatch()

Rohit Mathew (2):
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Use long MBWU counters if supported

 arch/arm64/Kconfig                  |   25 +
 drivers/Kconfig                     |    2 +
 drivers/Makefile                    |    1 +
 drivers/acpi/arm64/Kconfig          |    3 +
 drivers/acpi/arm64/Makefile         |    1 +
 drivers/acpi/arm64/mpam.c           |  384 ++++
 drivers/acpi/pptt.c                 |  248 ++-
 drivers/acpi/tables.c               |    2 +-
 drivers/resctrl/Kconfig             |   22 +
 drivers/resctrl/Makefile            |    4 +
 drivers/resctrl/mpam_devices.c      | 2701 +++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h     |  661 +++++++
 drivers/resctrl/test_mpam_devices.c |  389 ++++
 include/linux/acpi.h                |   26 +
 include/linux/arm_mpam.h            |   58 +
 include/linux/platform_device.h     |    1 +
 16 files changed, 4519 insertions(+), 9 deletions(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
 create mode 100644 drivers/resctrl/test_mpam_devices.c
 create mode 100644 include/linux/arm_mpam.h

-- 
2.39.5


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 11:26   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

The ACPI MPAM table uses the UID of a processor container specified in
the PPTT to indicate the subset of CPUs and cache topology that can
access each MPAM System Component (MSC).

This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.

Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.

CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Grouped two nested if clauses differently to reduce scope of cpu_node.
 * Removed stale comment referring to the return value.

Changes since v1:
 * Replaced commit message with wording from Dave.
 * Fixed a stray plural.
 * Moved further down in the file to make use of get_pptt() helper.
 * Added a break to exit the loop early.

Changes since RFC:
 * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
 * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
 * Added missing : in kernel-doc
 * Made helper return void as this never actually returns an error.
---
 drivers/acpi/pptt.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  3 ++
 2 files changed, 85 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..58cfa3916a13 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -817,3 +817,85 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr:		A reference to the PPTT table.
+ * @parent_node:	A pointer to the processor node in the @table_hdr.
+ * @cpus:		A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+				     struct acpi_pptt_processor *parent_node,
+				     cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_id;
+	int cpu;
+
+	cpumask_clear(cpus);
+
+	for_each_possible_cpu(cpu) {
+		acpi_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+		while (cpu_node) {
+			if (cpu_node == parent_node) {
+				cpumask_set_cpu(cpu, cpus);
+				break;
+			}
+			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+		}
+	}
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ *                                       processor container
+ * @acpi_cpu_id:	The UID of the processor container.
+ * @cpus:		The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not all 'Processor' entries in the PPTT are a CPU or a Processor
+ * Container; some exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf node that
+ * has the 'ACPI Processor ID valid' flag set.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+	struct acpi_table_header *table_hdr;
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 proc_sz;
+
+	cpumask_clear(cpus);
+
+	table_hdr = acpi_get_pptt();
+	if (!table_hdr)
+		return;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+	proc_sz = sizeof(struct acpi_pptt_processor);
+	while ((unsigned long)entry + proc_sz <= table_end) {
+
+		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
+			struct acpi_pptt_processor *cpu_node;
+
+			cpu_node = (struct acpi_pptt_processor *)entry;
+			if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
+			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
+			    cpu_node->acpi_processor_id == acpi_cpu_id) {
+				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+				break;
+			}
+		}
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 5ff5d99f6ead..4752ebd48132 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 {
 	return -EINVAL;
 }
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+						     cpumask_t *cpus) { }
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



* [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
  2025-10-17 18:56 ` [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 11:29   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter.  The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.

Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.

Get rid of the levels parameter, which has no remaining purpose.

Fix acpi_get_cache_info() to match.

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Typo in commit message.

Changes since v1:
 * Rewritten commit message from Dave.
 * Minor changes to kernel doc comment.
 * Keep the much loved typo.

Changes since RFC:
 * Made acpi_count_levels() return the levels value.
---
 drivers/acpi/pptt.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 58cfa3916a13..63c3a344c075 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -177,14 +177,14 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
 }
 
 /**
- * acpi_count_levels() - Given a PPTT table, and a CPU node, count the cache
- * levels and split cache levels (data/instruction).
+ * acpi_count_levels() - Given a PPTT table, and a CPU node, count the
+ * total number of levels and split cache levels (data/instruction).
  * @table_hdr: Pointer to the head of the PPTT table
  * @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
  * @split_levels:	Number of split cache levels (data/instruction) if
  *			success. Can by NULL.
  *
+ * Return: number of levels.
  * Given a processor node containing a processing unit, walk into it and count
  * how many levels exist solely for it, and then walk up each level until we hit
  * the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * split cache levels (data/instruction) that exist at each level on the way
  * up.
  */
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
-			      struct acpi_pptt_processor *cpu_node,
-			      unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node,
+			     unsigned int *split_levels)
 {
+	int starting_level = 0;
+
 	do {
-		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
 		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
 	} while (cpu_node);
+
+	return starting_level;
 }
 
 /**
@@ -645,7 +649,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 	if (!cpu_node)
 		return -ENOENT;
 
-	acpi_count_levels(table, cpu_node, levels, split_levels);
+	*levels = acpi_count_levels(table, cpu_node, split_levels);
 
 	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
 		 *levels, split_levels ? *split_levels : -1);
-- 
2.39.5



* [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
  2025-10-17 18:56 ` [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
  2025-10-17 18:56 ` [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 10:34   ` Ben Horgan
  2025-10-24 14:15   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
                   ` (27 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.

Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.

Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Search all caches, not just unified caches. This removes the need to count
   the caches first, but means a failure to find the cache walks the table
   three times for different cache types.
 * Fixed return value of the no-acpi stub.
 * Punctuation typo in a comment.
 * Keep trying to parse the table even if a bogus CPU is encountered.
 * Specified CPUs share caches with other CPUs.

Changes since v1:
 * Dropped the cleanup based table freeing, use acpi_get_pptt() instead.
 * Removed a confusing comment.
 * Clarified the kernel doc.

Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  5 +++
 2 files changed, 87 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 63c3a344c075..50c8f2a3c927 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -350,6 +350,27 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
 	return found;
 }
 
+static struct acpi_pptt_cache *
+acpi_find_any_type_cache_node(struct acpi_table_header *table_hdr,
+			      u32 acpi_cpu_id, unsigned int level,
+			      struct acpi_pptt_processor **node)
+{
+	struct acpi_pptt_cache *cache;
+
+	cache = acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_UNIFIED,
+				     level, node);
+	if (cache)
+		return cache;
+
+	cache = acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_DATA,
+				     level, node);
+	if (cache)
+		return cache;
+
+	return acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_INST,
+				    level, node);
+}
+
 /**
  * update_cache_properties() - Update cacheinfo for the given processor
  * @this_leaf: Kernel cache info structure being updated
@@ -903,3 +924,64 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
 				     entry->length);
 	}
 }
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the cache
+ *
+ * Determine the level relative to any CPU for the cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another CPU as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
+ * the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	int level, cpu;
+	u32 acpi_cpu_id;
+	struct acpi_pptt_cache *cache;
+	struct acpi_table_header *table;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+
+	table = acpi_get_pptt();
+	if (!table)
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			continue;
+
+		/* Start at 1 for L1 */
+		level = 1;
+		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
+						      &cpu_node);
+		while (cache) {
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache, sizeof(*cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				return level;
+
+			level++;
+			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
+							      level, &cpu_node);
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4752ebd48132..be074bdfd4d1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1542,6 +1542,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -ENOENT;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



* [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (2 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 10:45   ` Ben Horgan
  2025-10-22 12:58   ` Jeremy Linton
  2025-10-17 18:56 ` [PATCH v3 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
                   ` (26 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Rohit Mathew

MPAM identifies CPUs by the cache_id in the PPTT cache structure.

The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.

Add a helper to pull this information out of the PPTT.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Removed stray cleanup usage in preference for acpi_get_pptt().
 * Removed WARN_ON_ONCE() for symmetry with other helpers.
 * Dropped restriction on unified caches.

Changes since v1:
 * Added punctuation to the commit message.
 * Removed a comment about an alternative implementation.
 * Made the loop continue with a warning if a CPU is missing from the PPTT.

Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  6 +++++
 2 files changed, 70 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 50c8f2a3c927..2f86f58699a6 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -985,3 +985,67 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 
 	return -ENOENT;
 }
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ *					   specified cache
+ * @cache_id: The id field of the cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+	int level, cpu;
+	u32 acpi_cpu_id;
+	struct acpi_pptt_cache *cache;
+	struct acpi_table_header *table;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+
+	cpumask_clear(cpus);
+
+	table = acpi_get_pptt();
+	if (!table)
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			continue;
+
+		/* Start at 1 for L1 */
+		level = 1;
+		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
+						      &cpu_node);
+		while (cache) {
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache, sizeof(*cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				cpumask_set_cpu(cpu, cpus);
+
+			level++;
+			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
+							      level, &cpu_node);
+		}
+	}
+
+	return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index be074bdfd4d1..a9dbacabdf89 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
 {
 	return -ENOENT;
 }
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+						      cpumask_t *cpus)
+{
+	return -ENOENT;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.39.5



* [PATCH v3 05/29] arm64: kconfig: Add Kconfig entry for MPAM
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (3 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 18:56 ` [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table James Morse
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.

This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
CC: Dave Martin <dave.martin@arm.com>
---
Changes since v1:
 * Help text rewritten by Dave.
---
 arch/arm64/Kconfig | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6663ffd23f25..67015d51f7b5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2023,6 +2023,29 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 
+config ARM64_MPAM
+	bool "Enable support for MPAM"
+	help
+	  Memory System Resource Partitioning and Monitoring (MPAM) is an
+	  optional extension to the Arm architecture that allows each
+	  transaction issued to the memory system to be labelled with a
+	  Partition identifier (PARTID) and Performance Monitoring Group
+	  identifier (PMG).
+
+	  Memory system components, such as the caches, can be configured with
+	  policies to control how much of various physical resources (such as
+	  memory bandwidth or cache memory) the transactions labelled with each
+	  PARTID can consume.  Depending on the capabilities of the hardware,
+	  the PARTID and PMG can also be used as filtering criteria to measure
+	  the memory system resource consumption of different parts of a
+	  workload.
+
+	  Use of this extension requires CPU support, support in the
+	  Memory System Components (MSC), and a description from firmware
+	  of where the MSCs are in the address space.
+
+	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (4 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 12:29   ` Ben Horgan
  2025-10-24 16:13   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                   ` (24 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.

This happens in two stages. Platform devices are created first for the
MSC devices. Once the driver probes, it calls acpi_mpam_parse_resources()
to discover the RIS entries the MSC contains.

For now the MPAM hook mpam_ris_create() is stubbed out; it will later
update the MPAM driver with the optional data discovered about the RIS
entries.

CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v2:
 * Expanded commit message.
 * Moved explicit memset() to array initialisation.
 * Added comments on the sizing of arrays.
 * Moved MSC table entry parsing to a helper to allow use of a platform-device
   cleanup rune, resulting in more returns and fewer breaks.
 * Changed pre-processor macros for table bits.
 * Discover unsupported PPI partitions purely from the table to make gicv5
   easier, which also simplifies acpi_mpam_parse_irqs()
 * Gave interface type numbers pre-processor names.
 * Clarified some comments.
 * Fixed the WARN_ON comparison in acpi_mpam_parse_msc().
 * Made buffer over-run noisier.
 * Print an error condition as %d not %u.
 * Print a debug message when bad NUMA nodes are found.

Changes since v1:
 * Whitespace.
 * Gave GLOBAL_AFFINITY a pre-processor'd name.
 * Fixed assumption that there are zero functional dependencies.
 * Bounds check walking of the MSC RIS.
 * More bounds checking in the main table walk.
 * Check for nonsense numbers of function dependencies.
 * Smattering of pr_debug() to help folk feeding line-noise to the parser.
 * Changed the comment flavour on the SPDX string.
 * Removed additional table check.
 * More comment wrangling.

Changes since RFC:
 * Used DEFINE_RES_IRQ_NAMED() and friends macros.
 * Additional error handling.
 * Check for zero sized MSC.
 * Allow table revisions greater than 1. (no spec for revision 0!)
 * Use cleanup helpers to retrieve ACPI tables, which allows some functions
   to be folded together.
---
 arch/arm64/Kconfig              |   1 +
 drivers/acpi/arm64/Kconfig      |   3 +
 drivers/acpi/arm64/Makefile     |   1 +
 drivers/acpi/arm64/mpam.c       | 377 ++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c           |   2 +-
 include/linux/acpi.h            |  12 +
 include/linux/arm_mpam.h        |  48 ++++
 include/linux/platform_device.h |   1 +
 8 files changed, 444 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 include/linux/arm_mpam.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 67015d51f7b5..c5e66d5d72cd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
 	  optional extension to the Arm architecture that allows each
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
 
 config ACPI_APMT
 	bool
+
+config ACPI_MPAM
+	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
 obj-$(CONFIG_ACPI_FFH)		+= ffh.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
 obj-y				+= dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..59712397025d
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
+
+/*
+ * Encodings for the MSC node body interface type field.
+ * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
+#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
+
+static bool _is_ppi_partition(u32 flags)
+{
+	u32 aff_type, is_ppi;
+	bool ret;
+
+	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
+	if (!is_ppi)
+		return false;
+
+	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
+	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
+	if (ret)
+		pr_err_once("Partitioned interrupts not supported\n");
+
+	return ret;
+}
+
+static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
+				   u32 flags, int *irq)
+{
+	u32 int_type;
+	int sense;
+
+	if (!intid)
+		return false;
+
+	if (_is_ppi_partition(flags))
+		return false;
+
+	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
+	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
+	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+		return false;
+
+	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+	if (*irq <= 0) {
+		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+			    intid);
+		return false;
+	}
+
+	return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+				 struct acpi_mpam_msc_node *tbl_msc,
+				 struct resource *res, int *res_idx)
+{
+	u32 flags, intid;
+	int irq;
+
+	intid = tbl_msc->overflow_interrupt;
+	flags = tbl_msc->overflow_interrupt_flags;
+	if (acpi_mpam_register_irq(pdev, intid, flags, &irq))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+	intid = tbl_msc->error_interrupt;
+	flags = tbl_msc->error_interrupt_flags;
+	if (acpi_mpam_register_irq(pdev, intid, flags, &irq))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+				    struct acpi_mpam_resource_node *res)
+{
+	int level, nid;
+	u32 cache_id;
+
+	switch (res->locator_type) {
+	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+		cache_id = res->locator.cache_locator.cache_reference;
+		level = find_acpi_cache_level_from_id(cache_id);
+		if (level <= 0) {
+			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
+			return -EINVAL;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+				       level, cache_id);
+	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+		if (nid == NUMA_NO_NODE) {
+			pr_debug("Bad proximity domain %llu, using node 0 instead\n",
+				 res->locator.memory_locator.proximity_domain);
+			nid = 0;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+				       255, nid);
+	default:
+		/* These get discovered later and are treated as unknown */
+		return 0;
+	}
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc)
+{
+	int i, err;
+	char *ptr, *table_end;
+	struct acpi_mpam_resource_node *resource;
+
+	ptr = (char *)(tbl_msc + 1);
+	table_end = (char *)tbl_msc + tbl_msc->length;
+	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+		u64 max_deps, remaining_table;
+
+		if (ptr + sizeof(*resource) > table_end)
+			return -EINVAL;
+
+		resource = (struct acpi_mpam_resource_node *)ptr;
+
+		remaining_table = table_end - ptr;
+		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
+		if (resource->num_functional_deps > max_deps) {
+			pr_debug("MSC has impossible number of functional dependencies\n");
+			return -EINVAL;
+		}
+
+		err = acpi_mpam_parse_resource(msc, resource);
+		if (err)
+			return err;
+
+		ptr += sizeof(*resource);
+		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
+	}
+
+	return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+				     struct platform_device *pdev,
+				     u32 *acpi_id)
+{
+	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
+	bool acpi_id_valid = false;
+	struct acpi_device *buddy;
+	char uid[11];
+	int err;
+
+	memcpy(hid, &tbl_msc->hardware_id_linked_device,
+	       sizeof(tbl_msc->hardware_id_linked_device));
+
+	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+		*acpi_id = tbl_msc->instance_id_linked_device;
+		acpi_id_valid = true;
+	}
+
+	err = snprintf(uid, sizeof(uid), "%u",
+		       tbl_msc->instance_id_linked_device);
+	if (err >= sizeof(uid)) {
+		pr_debug("Failed to convert uid of device for power management\n");
+		return acpi_id_valid;
+	}
+
+	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+	if (buddy)
+		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+	return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+				 enum mpam_msc_iface *iface)
+{
+	switch (tbl_msc->interface_type) {
+	case ACPI_MPAM_MSC_IFACE_MMIO:
+		*iface = MPAM_IFACE_MMIO;
+		return 0;
+	case ACPI_MPAM_MSC_IFACE_PCC:
+		*iface = MPAM_IFACE_PCC;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
+{
+	struct platform_device *pdev __free(platform_device_put) = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+	int next_res = 0, next_prop = 0, err;
+	/* pcc, nrdy, affinity and a sentinel */
+	struct property_entry props[4] = { 0 };
+	/* mmio, 2xirq, no sentinel. */
+	struct resource res[3] = { 0 };
+	struct acpi_device *companion;
+	enum mpam_msc_iface iface;
+	char uid[16];
+	u32 acpi_id;
+
+	if (!pdev)
+		return ERR_PTR(-ENOMEM);
+
+	/* Some power management is described in the namespace: */
+	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+	if (err > 0 && err < sizeof(uid)) {
+		companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+		if (companion)
+			ACPI_COMPANION_SET(&pdev->dev, companion);
+		else
+			pr_debug("MSC.%u: missing namespace entry\n",
+				 tbl_msc->identifier);
+	}
+
+	if (decode_interface_type(tbl_msc, &iface)) {
+		pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (iface == MPAM_IFACE_MMIO)
+		res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+						       tbl_msc->mmio_size,
+						       "MPAM:MSC");
+	else if (iface == MPAM_IFACE_PCC)
+		props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+							tbl_msc->base_address);
+
+	acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+
+	WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
+	err = platform_device_add_resources(pdev, res, next_res);
+	if (err)
+		return ERR_PTR(err);
+
+	props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+						tbl_msc->max_nrdy_usec);
+
+	/*
+	 * The MSC's CPU affinity is described via its linked power
+	 * management device, but only if it points at a Processor or
+	 * Processor Container.
+	 */
+	if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
+		props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
+
+	WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
+	err = device_create_managed_software_node(&pdev->dev, props, NULL);
+	if (err)
+		return ERR_PTR(err);
+
+	/*
+	 * Stash the table entry for acpi_mpam_parse_resources() to discover
+	 * what this MSC controls.
+	 */
+	err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+	if (err)
+		return ERR_PTR(err);
+
+	err = platform_device_add(pdev);
+	if (err)
+		return ERR_PTR(err);
+
+	return_ptr(pdev);
+}
+
+static int __init acpi_mpam_parse(void)
+{
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct acpi_mpam_msc_node *tbl_msc;
+	struct platform_device *pdev;
+
+	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		table_offset += tbl_msc->length;
+
+		if (table_offset > table_end) {
+			pr_err("MSC entry overlaps end of ACPI table\n");
+			return -EINVAL;
+		}
+
+		/*
+		 * If any of the reserved fields are set, make no attempt to
+		 * parse the MSC structure. This MSC will still be counted by
+		 * acpi_mpam_count_msc(), meaning the MPAM driver can't probe
+		 * against all MSC, and will never be enabled. There is no way
+		 * to enable it safely, because we cannot determine safe
+		 * system-wide partid and pmg ranges in this situation.
+		 */
+		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
+			pr_err_once("Unrecognised MSC, MPAM not usable\n");
+			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
+			continue;
+		}
+
+		if (!tbl_msc->mmio_size) {
+			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
+			continue;
+		}
+
+		pdev = acpi_mpam_parse_msc(tbl_msc);
+		if (IS_ERR(pdev))
+			return PTR_ERR(pdev);
+	}
+
+	return 0;
+}
+
+int acpi_mpam_count_msc(void)
+{
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct acpi_mpam_msc_node *tbl_msc;
+	int count = 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		if (tbl_msc->length < sizeof(*tbl_msc))
+			return -EINVAL;
+		if (tbl_msc->length > table_end - table_offset)
+			return -EINVAL;
+		table_offset += tbl_msc->length;
+
+		/* Advance first: entries with no MMIO space are disabled. */
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		count++;
+	}
+
+	return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 57fc8bc56166..4286e4af1092 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
 	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
-	ACPI_SIG_NBFT, ACPI_SIG_SWFT};
+	ACPI_SIG_NBFT, ACPI_SIG_SWFT, ACPI_SIG_MPAM};
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index a9dbacabdf89..9d66421f68ff 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	acpi_status status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..3d6c39c667c3
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+#define GLOBAL_AFFINITY		~0
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
+	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
+	MPAM_CLASS_MEMORY,      /* Main memory */
+	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+					    struct acpi_mpam_msc_node *tbl_msc)
+{
+	return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 074754c23d33..23a30ada2d4c 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
 extern int platform_device_add(struct platform_device *pdev);
 extern void platform_device_del(struct platform_device *pdev);
 extern void platform_device_put(struct platform_device *pdev);
+DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))
 
 struct platform_driver {
 	int (*probe)(struct platform_device *);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (5 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 12:43   ` Ben Horgan
                     ` (4 more replies)
  2025-10-17 18:56 ` [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
                   ` (23 subsequent siblings)
  30 siblings, 5 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.

Start with driver probe/remove and mapping the MSC.

CC: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Comment in Kconfig about why EXPERT.
 * Dropped duplicate depends.
 * Fixed duplicate return statement.
 * Restructured driver probe to have a do_ function so that breaks can
   become returns instead...
 * Removed resctrl.h include, added spinlock.h
 * Removed stray DT function prototype
 * Removed stray PCC variables in struct mpam_msc.
 * Used ccflags not cflags for debug define.
 * Moved srcu header include to internal.h
 * Moved mpam_msc_destroy() into this patch.

Changes since v1:
 * Avoid selecting driver on other architectures.
 * Removed PCC support stub.
 * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
 * Clarified a comment.
 * Stopped using mpam_num_msc as an id, and made it atomic.
 * Size of -1 returned from cache_of_calculate_id()
 * Renamed some struct members.
 * Made a bunch of pr_err() dev_err_once().
 * Used more cleanup magic.
 * Inlined a print message.
 * Fixed error propagation from mpam_dt_parse_resources().
 * Moved cache accessibility checks earlier.
 * Change cleanup macro to use IS_ERR_OR_NULL().

Changes since RFC:
 * Check for status=broken DT devices.
 * Moved all the files around.
 * Made Kconfig symbols depend on EXPERT
---
 arch/arm64/Kconfig              |   1 +
 drivers/Kconfig                 |   2 +
 drivers/Makefile                |   1 +
 drivers/acpi/arm64/mpam.c       |   7 ++
 drivers/resctrl/Kconfig         |  13 +++
 drivers/resctrl/Makefile        |   4 +
 drivers/resctrl/mpam_devices.c  | 190 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  52 +++++++++
 include/linux/acpi.h            |   2 +-
 9 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5e66d5d72cd..004d58cfbff8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
 	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
 
 source "drivers/cdx/Kconfig"
 
+source "drivers/resctrl/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 8e1ffa4358d5..20eb17596b89 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)		+= hte/
 obj-$(CONFIG_DRM_ACCEL)		+= accel/
 obj-$(CONFIG_CDX_BUS)		+= cdx/
 obj-$(CONFIG_DPLL)		+= dpll/
+obj-y				+= resctrl/
 
 obj-$(CONFIG_DIBS)		+= dibs/
 obj-$(CONFIG_S390)		+= s390/
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
index 59712397025d..51c6f5fd4a5e 100644
--- a/drivers/acpi/arm64/mpam.c
+++ b/drivers/acpi/arm64/mpam.c
@@ -337,6 +337,13 @@ static int __init acpi_mpam_parse(void)
 	return 0;
 }
 
+/**
+ * acpi_mpam_count_msc() - Count the number of MSCs described by firmware.
+ *
+ * Returns the number of MSCs: zero if the table is missing, or a negative
+ * error value if the table is malformed.
+ *
+ * This can be called before or in parallel with acpi_mpam_parse().
+ */
 int acpi_mpam_count_msc(void)
 {
 	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..58c83b5c8bfd
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,13 @@
+menuconfig ARM64_MPAM_DRIVER
+	bool "MPAM driver"
+	depends on ARM64 && ARM64_MPAM && EXPERT
+	help
+	  MPAM driver for System IP, e.g. caches and memory controllers.
+
+if ARM64_MPAM_DRIVER
+config ARM64_MPAM_DRIVER_DEBUG
+	bool "Enable debug messages from the MPAM driver"
+	help
+	  Say yes here to enable debug messages from the MPAM driver.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..898199dcf80d
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
+mpam-y						+= mpam_devices.o
+
+ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..d18eeec95f79
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/*
+ * Number of MSCs that have been probed. Once all MSCs have been probed,
+ * MPAM can be enabled.
+ */
+static atomic_t mpam_num_msc;
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+	u32 affinity_id;
+	int err;
+
+	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+				       &affinity_id);
+	if (err)
+		cpumask_copy(&msc->accessibility, cpu_possible_mask);
+	else
+		acpi_pptt_get_cpus_from_container(affinity_id,
+						  &msc->accessibility);
+	return 0;
+}
+
+static int fw_num_msc;
+
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+	struct platform_device *pdev = msc->pdev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&msc->all_msc_list);
+	platform_set_drvdata(pdev, NULL);
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+	struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+	if (!msc)
+		return;
+
+	mutex_lock(&mpam_list_lock);
+	mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+
+	synchronize_srcu(&mpam_srcu);
+}
+
+static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	u32 tmp;
+	struct mpam_msc *msc;
+	struct resource *msc_res;
+	struct device *dev = &pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+	if (!msc)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&msc->probe_lock);
+	mutex_init(&msc->part_sel_lock);
+	msc->id = pdev->id;
+	msc->pdev = pdev;
+	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
+	INIT_LIST_HEAD_RCU(&msc->ris);
+
+	err = update_msc_accessibility(msc);
+	if (err)
+		return ERR_PTR(err);
+	if (cpumask_empty(&msc->accessibility)) {
+		dev_err_once(dev, "MSC is not accessible from any CPU!\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
+		msc->iface = MPAM_IFACE_MMIO;
+	else
+		msc->iface = MPAM_IFACE_PCC;
+
+	if (msc->iface == MPAM_IFACE_MMIO) {
+		void __iomem *io;
+
+		io = devm_platform_get_and_ioremap_resource(pdev, 0,
+							    &msc_res);
+		if (IS_ERR(io)) {
+			dev_err_once(dev, "Failed to map MSC base address\n");
+			return (void *)io;
+		}
+		msc->mapped_hwpage_sz = resource_size(msc_res);
+		msc->mapped_hwpage = io;
+	}
+
+	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
+	platform_set_drvdata(pdev, msc);
+
+	return msc;
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	struct mpam_msc *msc = NULL;
+	void *plat_data = pdev->dev.platform_data;
+
+	mutex_lock(&mpam_list_lock);
+	msc = do_mpam_msc_drv_probe(pdev);
+	mutex_unlock(&mpam_list_lock);
+	if (!IS_ERR(msc)) {
+		/* Create RIS entries described by firmware */
+		err = acpi_mpam_parse_resources(msc, plat_data);
+		if (err)
+			mpam_msc_drv_remove(pdev);
+	} else {
+		err = PTR_ERR(msc);
+	}
+
+	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
+		pr_info("Discovered all MSC\n");
+
+	return err;
+}
+
+static struct platform_driver mpam_msc_driver = {
+	.driver = {
+		.name = "mpam_msc",
+	},
+	.probe = mpam_msc_drv_probe,
+	.remove = mpam_msc_drv_remove,
+};
+
+static int __init mpam_msc_driver_init(void)
+{
+	if (!system_supports_mpam())
+		return -EOPNOTSUPP;
+
+	init_srcu_struct(&mpam_srcu);
+
+	fw_num_msc = acpi_mpam_count_msc();
+
+	if (fw_num_msc <= 0) {
+		pr_err("No MSC devices found in firmware\n");
+		return -EINVAL;
+	}
+
+	return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..6ac75f3613c3
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2025 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/sizes.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+
+struct platform_device;
+
+struct mpam_msc {
+	/* member of mpam_all_msc */
+	struct list_head        all_msc_list;
+
+	int			id;
+	struct platform_device *pdev;
+
+	/* Not modified after mpam_is_enabled() becomes true */
+	enum mpam_msc_iface	iface;
+	u32			nrdy_usec;
+	cpumask_t		accessibility;
+
+	/*
+	 * probe_lock is only taken during discovery. After discovery these
+	 * properties become read-only and the lists are protected by SRCU.
+	 */
+	struct mutex		probe_lock;
+	unsigned long		ris_idxs;
+	u32			ris_max;
+
+	/* mpam_msc_ris of this component */
+	struct list_head	ris;
+
+	/*
+	 * part_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_PART_SEL (including the ID registers that vary
+	 * by RIS).
+	 * If needed, take msc->probe_lock first.
+	 */
+	struct mutex		part_sel_lock;
+
+	void __iomem		*mapped_hwpage;
+	size_t			mapped_hwpage_sz;
+};
+#endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9d66421f68ff..70f075b397ce 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -231,7 +231,7 @@ static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32
 		return ERR_PTR(-ENOENT);
 	return table;
 }
-DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))
 
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (6 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 16:47   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.

To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: classes and components. A class is a
set of components, which are visible to user-space as there are likely
to be multiple instances of the L2 cache (e.g. one per cluster or
package).

Add support for creating and destroying structures to allow a hierarchy
of resources to be created.

CC: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Renamed _get functions _find to avoid folk looking for _put.
 * Made init_garbage a static-inline.
 * Moved garbage pdev assignment closer to allocation to make that pattern
   clearer.
 * Return found MSC when 'finding' as there is no longer a lock to unlock.
 * Added initialisation of ris->msc_list.
 * Use dev_warn_once() in preference to printing the device name.

Changes since v1:
 * Fixed a comp/vmsc typo.
 * Removed duplicate description from the commit message.
 * Moved parenthesis in the add_to_garbage() macro.
 * Check for out of range ris_idx when creating ris.
 * Removed GFP as probe_lock is no longer a spin lock.
 * Removed alloc flag as ended up searching the lists itself.
 * Added a comment about affinity masks not overlapping.

Changes since RFC:
 * removed a pr_err() debug message that crept in.
---
 drivers/resctrl/mpam_devices.c  | 390 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  93 ++++++++
 include/linux/arm_mpam.h        |   8 +-
 3 files changed, 483 insertions(+), 8 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index d18eeec95f79..8685e50f08c6 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -30,7 +30,7 @@
 static DEFINE_MUTEX(mpam_list_lock);
 static LIST_HEAD(mpam_all_msc);
 
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
 
 /*
  * Number of MSCs that have been probed. Once all MSC have been probed MPAM
@@ -38,6 +38,379 @@ static struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronise_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+static inline void init_garbage(struct mpam_garbage *garbage)
+{
+	init_llist_node(&garbage->llist);
+}
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
+	if (!vmsc)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&vmsc->garbage);
+
+	INIT_LIST_HEAD_RCU(&vmsc->ris);
+	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+	vmsc->comp = comp;
+	vmsc->msc = msc;
+
+	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+	return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_find(struct mpam_component *comp,
+				        struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (vmsc->msc->id == msc->id)
+			return vmsc;
+	}
+
+	return mpam_vmsc_alloc(comp, msc);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&comp->garbage);
+
+	comp->comp_id = id;
+	INIT_LIST_HEAD_RCU(&comp->vmsc);
+	/* affinity is updated when ris are added */
+	INIT_LIST_HEAD_RCU(&comp->class_list);
+	comp->class = class;
+
+	list_add_rcu(&comp->class_list, &class->components);
+
+	return comp;
+}
+
+static struct mpam_component *
+mpam_component_find(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(comp, &class->components, class_list) {
+		if (comp->comp_id == id)
+			return comp;
+	}
+
+	return mpam_component_alloc(class, id);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	class = kzalloc(sizeof(*class), GFP_KERNEL);
+	if (!class)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&class->garbage);
+
+	INIT_LIST_HEAD_RCU(&class->components);
+	/* affinity is updated when ris are added */
+	class->level = level_idx;
+	class->type = type;
+	INIT_LIST_HEAD_RCU(&class->classes_list);
+
+	list_add_rcu(&class->classes_list, &mpam_classes);
+
+	return class;
+}
+
+static struct mpam_class *
+mpam_class_find(u8 level_idx, enum mpam_class_types type)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		if (class->type == type && class->level == level_idx)
+			return class;
+	}
+
+	return mpam_class_alloc(level_idx, type);
+}
+
+#define add_to_garbage(x)				\
+do {							\
+	__typeof__(x) _x = (x);				\
+	_x->garbage.to_free = _x;			\
+	llist_add(&_x->garbage.llist, &mpam_garbage);	\
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&class->classes_list);
+	add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&comp->class_list);
+	add_to_garbage(comp);
+
+	if (list_empty(&class->components))
+		mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+	struct mpam_component *comp = vmsc->comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&vmsc->comp_list);
+	add_to_garbage(vmsc);
+
+	if (list_empty(&comp->vmsc))
+		mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+	struct mpam_vmsc *vmsc = ris->vmsc;
+	struct mpam_msc *msc = vmsc->msc;
+	struct mpam_component *comp = vmsc->comp;
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	/*
+	 * It is assumed affinities don't overlap. If they do the class becomes
+	 * unusable immediately.
+	 */
+	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+	clear_bit(ris->ris_idx, &msc->ris_idxs);
+	list_del_rcu(&ris->vmsc_list);
+	list_del_rcu(&ris->msc_list);
+	add_to_garbage(ris);
+
+	if (list_empty(&vmsc->ris))
+		mpam_vmsc_destroy(vmsc);
+}
+
+static void mpam_free_garbage(void)
+{
+	struct mpam_garbage *iter, *tmp;
+	struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+	if (!to_free)
+		return;
+
+	synchronize_srcu(&mpam_srcu);
+
+	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+		if (iter->pdev)
+			devm_kfree(&iter->pdev->dev, iter->to_free);
+		else
+			kfree(iter->to_free);
+	}
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper uses the ACPI PPTT to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity)
+{
+	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (node_id == cpu_to_node(cpu))
+			cpumask_set_cpu(cpu, affinity);
+	}
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+				 enum mpam_class_types type,
+				 struct mpam_class *class,
+				 struct mpam_component *comp)
+{
+	int err;
+
+	switch (type) {
+	case MPAM_CLASS_CACHE:
+		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+						     affinity);
+		if (err)
+			return err;
+
+		if (cpumask_empty(affinity))
+			dev_warn_once(&msc->pdev->dev,
+				      "no CPUs associated with cache node\n");
+
+		break;
+	case MPAM_CLASS_MEMORY:
+		get_cpumask_from_node_id(comp->comp_id, affinity);
+		/* affinity may be empty for CPU-less memory nodes */
+		break;
+	case MPAM_CLASS_UNKNOWN:
+		return 0;
+	}
+
+	cpumask_and(affinity, affinity, &msc->accessibility);
+
+	return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	int err;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+	struct platform_device *pdev = msc->pdev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (ris_idx >= MPAM_MSC_MAX_NUM_RIS)
+		return -EINVAL;
+
+	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
+		return -EBUSY;
+
+	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
+	if (!ris)
+		return -ENOMEM;
+	init_garbage(&ris->garbage);
+	ris->garbage.pdev = pdev;
+
+	class = mpam_class_find(class_id, type);
+	if (IS_ERR(class))
+		return PTR_ERR(class);
+
+	comp = mpam_component_find(class, component_id);
+	if (IS_ERR(comp)) {
+		if (list_empty(&class->components))
+			mpam_class_destroy(class);
+		return PTR_ERR(comp);
+	}
+
+	vmsc = mpam_vmsc_find(comp, msc);
+	if (IS_ERR(vmsc)) {
+		if (list_empty(&comp->vmsc))
+			mpam_comp_destroy(comp);
+		return PTR_ERR(vmsc);
+	}
+
+	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+	if (err) {
+		if (list_empty(&vmsc->ris))
+			mpam_vmsc_destroy(vmsc);
+		return err;
+	}
+
+	ris->ris_idx = ris_idx;
+	INIT_LIST_HEAD_RCU(&ris->msc_list);
+	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+	ris->vmsc = vmsc;
+
+	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+	return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id)
+{
+	int err;
+
+	mutex_lock(&mpam_list_lock);
+	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+				     component_id);
+	mutex_unlock(&mpam_list_lock);
+	if (err)
+		mpam_free_garbage();
+
+	return err;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -62,14 +435,25 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 
 static int fw_num_msc;
 
+/*
+ * There are two ways of reaching a struct mpam_msc_ris: via the
+ * class->component->vmsc->ris hierarchy, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
 static void mpam_msc_destroy(struct mpam_msc *msc)
 {
 	struct platform_device *pdev = msc->pdev;
+	struct mpam_msc_ris *ris, *tmp;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+		mpam_ris_destroy(ris);
+
 	list_del_rcu(&msc->all_msc_list);
 	platform_set_drvdata(pdev, NULL);
+
+	add_to_garbage(msc);
 }
 
 static void mpam_msc_drv_remove(struct platform_device *pdev)
@@ -83,7 +467,7 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
 	mpam_msc_destroy(msc);
 	mutex_unlock(&mpam_list_lock);
 
-	synchronize_srcu(&mpam_srcu);
+	mpam_free_garbage();
 }
 
 static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
@@ -99,6 +483,8 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
 	if (!msc)
 		return ERR_PTR(-ENOMEM);
+	init_garbage(&msc->garbage);
+	msc->garbage.pdev = pdev;
 
 	mutex_init(&msc->probe_lock);
 	mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6ac75f3613c3..1a5d96660382 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,14 +7,32 @@
 #include <linux/arm_mpam.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
 #include <linux/sizes.h>
 #include <linux/spinlock.h>
 #include <linux/srcu.h>
 
+#define MPAM_MSC_MAX_NUM_RIS	16
+
 struct platform_device;
 
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+	/* member of mpam_garbage */
+	struct llist_node	llist;
+
+	void			*to_free;
+	struct platform_device	*pdev;
+};
+
 struct mpam_msc {
 	/* member of mpam_all_msc */
 	struct list_head        all_msc_list;
@@ -48,5 +66,80 @@ struct mpam_msc {
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
+
+	struct mpam_garbage	garbage;
 };
+
+struct mpam_class {
+	/* mpam_components in this class */
+	struct list_head	components;
+
+	cpumask_t		affinity;
+
+	u8			level;
+	enum mpam_class_types	type;
+
+	/* member of mpam_classes */
+	struct list_head	classes_list;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_component {
+	u32			comp_id;
+
+	/* mpam_vmsc in this component */
+	struct list_head	vmsc;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_class:components */
+	struct list_head	class_list;
+
+	/* parent: */
+	struct mpam_class	*class;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_vmsc {
+	/* member of mpam_component:vmsc_list */
+	struct list_head	comp_list;
+
+	/* mpam_msc_ris in this vmsc */
+	struct list_head	ris;
+
+	/* All RIS in this vMSC are members of this MSC */
+	struct mpam_msc		*msc;
+
+	/* parent: */
+	struct mpam_component	*comp;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_msc_ris {
+	u8			ris_idx;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_vmsc:ris */
+	struct list_head	vmsc_list;
+
+	/* member of mpam_msc:ris */
+	struct list_head	msc_list;
+
+	/* parent: */
+	struct mpam_vmsc	*vmsc;
+
+	struct mpam_garbage	garbage;
+};
+
+/* List of all classes - protected by SRCU */
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity);
+
 #endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 3d6c39c667c3..3206f5ddc147 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -38,11 +38,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
 static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 #endif
 
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
-				  enum mpam_class_types type, u8 class_id,
-				  int component_id)
-{
-	return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id);
 
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.39.5



* [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (7 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 23:03   ` Fenghua Yu
                     ` (2 more replies)
  2025-10-17 18:56 ` [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
                   ` (21 subsequent siblings)
  30 siblings, 3 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.

Add the definitions for these registers as offsets within the page(s).

Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Removed a few colons.
 * fixed a typo.
 * Moved some definitions around.

Changes since v1:
 * Whitespace.
 * Added constants for CASSOC and XCL.
 * Merged FLT/CTL defines.
 * Fixed MSMON_CFG_CSU_CTL_TYPE_CSU definition.

Changes since RFC:
 * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
 * Whitespace churn.
 * Cite a more recent document.
 * Removed some stale feature, fixed some names etc.
---
 drivers/resctrl/mpam_internal.h | 268 ++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1a5d96660382..1ef3e8e1d056 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -142,4 +142,272 @@ extern struct list_head mpam_classes;
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1    0x10
+
+/* Memory mapped control pages */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR		0x0000  /* features id register */
+#define MPAMF_IIDR		0x0018  /* implementer id register */
+#define MPAMF_AIDR		0x0020  /* architectural id register */
+#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
+#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
+#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
+#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
+#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL	0x0100  /* partid to configure */
+#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
+#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
+#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
+#define MPAMCFG_CASSOC		0x0118  /* cache-associativity config */
+#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
+#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
+#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
+#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
+#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
+#define MSMON_CSU		0x0840  /* current cache-usage */
+#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
+#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
+#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
+#define MPAMF_ESR		0x00F8  /* error status register */
+#define MPAMF_ECR		0x00F0  /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
+#define MPAMF_IDR_EXT			BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
+#define MPAMF_IDR_HAS_MSMON		BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
+#define MPAMF_IDR_HAS_RIS		BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
+#define MPAMF_IDR_HAS_ESR		BIT(39)
+#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
+#define MPAMF_MBW_IDR_WINDWR		BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX	GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
+#define MPAMF_IIDR_REVISION	GENMASK(15, 12)
+#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
+#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
+#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
+#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMCFG_CASSOC - MPAM cache maximum associativity partition configuration register */
+#define MPAMCFG_CASSOC_CASSOC		GENMASK(15, 0)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
+#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ *                     register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ *                    configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN		BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
+#define MPAMF_ESR_PMG		GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR		BIT(31)
+#define MPAMF_ESR_RIS		GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN		BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE			0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
+#define MPAM_ERRCODE_REQ_PMG_RANGE		4
+#define MPAM_ERRCODE_MONITOR_RANGE		5
+#define MPAM_ERRCODE_INTPARTID_RANGE		6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
+#define MPAM_ERRCODE_UNDEFINED_RIS_PART_SEL	8
+#define MPAM_ERRCODE_RIS_NO_CONTROL		9
+#define MPAM_ERRCODE_UNDEFINED_RIS_MON_SEL	10
+#define MPAM_ERRCODE_RIS_NO_MONITOR		11
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ *                    usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
+#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN			BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU		0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU		0x43
+
+#define MSMON_CFG_MBWU_CTL_SCLEN		BIT(19)
+
+/*
+ * MSMON_CFG_CSU_FLT -  Memory system performance monitor configure cache storage
+ *                      usage monitor filter register
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ *                      bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_x_FLT_PARTID			GENMASK(15, 0)
+#define MSMON_CFG_x_FLT_PMG			GENMASK(23, 16)
+
+#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
+#define MSMON_CFG_CSU_FLT_XCL			BIT(31)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ *            register
+ * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
+ *                     capture register
+ * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
+ *               monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ *                     capture register
+ */
+#define MSMON___VALUE		GENMASK(30, 0)
+#define MSMON___NRDY		BIT(31)
+#define MSMON___NRDY_L		BIT(63)
+#define MSMON___L_VALUE		GENMASK(43, 0)
+#define MSMON___LWD_VALUE	GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ *                  generation register
+ */
+#define MSMON_CAPT_EVNT_NOW	BIT(0)
+
 #endif /* MPAM_INTERNAL_H */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (8 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-29  7:24   ` Shaopeng Tan (Fujitsu)
  2025-10-17 18:56 ` [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Lecopzer Chen, Ben Horgan

Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.

Do this work in the cpuhp callback. Probing the hardware can only
happen before MPAM is enabled; as each CPU's online call is made, walk
all the MSCs and probe those we can reach that haven't already been probed.

This adds the low-level MSC register accessors.

Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now they have nothing to do.

The architecture's context switch code will be enabled by a static key.
This can be set by mpam_enable(), but must be done from process context,
not a cpuhp callback, because both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.

CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Simplified to if(err) break.
 * Pass name to mpam_register_cpuhp_callbacks() to allow the two callback
   types to be discerned in debugfs.

Changes since v1:
 * Removed register bounds check. If the firmware tables are wrong the
   resulting translation fault should be enough to debug this.
 * Removed '&' in front of a function pointer.
 * Pulled mpam_disable() into this patch.
 * Disable mpam when probing fails to avoid extra work on broken platforms.
 * Added mpam_disable_reason as there are now two non-debug reasons for this
   to happen.
---
 drivers/resctrl/mpam_devices.c  | 174 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   5 +
 2 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8685e50f08c6..49f874fae0a6 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
 #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
@@ -19,6 +20,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include "mpam_internal.h"
 
@@ -38,6 +40,25 @@ struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
+/* When mpam is disabled, the printed reason to aid debugging */
+static char *mpam_disable_reason;
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -81,6 +102,20 @@ static inline void init_garbage(struct mpam_garbage *garbage)
 {
 	init_llist_node(&garbage->llist);
 }
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
 static struct mpam_vmsc *
 mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
@@ -411,6 +446,86 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+	u64 idr;
+	struct device *dev = &msc->pdev->dev;
+
+	lockdep_assert_held(&msc->probe_lock);
+
+	idr = __mpam_read_reg(msc, MPAMF_AIDR);
+	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+		dev_err_once(dev, "MSC does not match MPAM architecture v1.x\n");
+		return -EIO;
+	}
+
+	msc->probed = true;
+
+	return 0;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
+{
+	return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+	int err = 0;
+	struct mpam_msc *msc;
+	bool new_device_probed = false;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			err = mpam_msc_hw_probe(msc);
+		mutex_unlock(&msc->probe_lock);
+
+		if (err)
+			break;
+		new_device_probed = true;
+	}
+
+	if (new_device_probed && !err)
+		schedule_work(&mpam_enable_work);
+	if (err) {
+		mpam_disable_reason = "error during probing";
+		schedule_work(&mpam_broken_work);
+	}
+
+	return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+	return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+					  int (*offline)(unsigned int offline),
+					  char *name)
+{
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+
+	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, name, online,
+					     offline);
+	if (mpam_cpuhp_state <= 0) {
+		pr_err("Failed to register cpuhp callbacks");
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -544,7 +659,8 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 	}
 
 	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
-		pr_info("Discovered all MSC\n");
+		mpam_register_cpuhp_callbacks(mpam_discovery_cpu_online, NULL,
+					      "mpam:drv_probe");
 
 	return err;
 }
@@ -557,6 +673,62 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+static void mpam_enable_once(void)
+{
+	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
+				      "mpam:online");
+
+	pr_info("MPAM enabled\n");
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+	struct mpam_msc *msc, *tmp;
+
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
+		mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+	mpam_free_garbage();
+
+	pr_err_once("MPAM disabled due to %s\n", mpam_disable_reason);
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+	static atomic_t once;
+	struct mpam_msc *msc;
+	bool all_devices_probed = true;
+
+	/* Have we probed all the hw devices? */
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			all_devices_probed = false;
+		mutex_unlock(&msc->probe_lock);
+
+		if (!all_devices_probed)
+			break;
+	}
+
+	if (all_devices_probed && !atomic_fetch_inc(&once))
+		mpam_enable_once();
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1ef3e8e1d056..8865a7d81dd1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -50,6 +50,7 @@ struct mpam_msc {
 	 * properties become read-only and the lists are protected by SRCU.
 	 */
 	struct mutex		probe_lock;
+	bool			probed;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -139,6 +140,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+void mpam_disable(struct work_struct *work);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (9 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 17:40   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need to be taken into account.

While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.

While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Simplified return of mpam_get_or_create_ris()
 * Used guard() in mpam_register_requestor()
 * whitespace,
 * >= rather than > in a bounds checking warning.
 * Added comment explaining why printk() is counter-intuitively used.

Changes since v1:
 * Change to lock ordering now that the list-lock mutex isn't held from
   the cpuhp call.
 * Removed irq-unmaksed assert in requestor register.
 * Changed captialisation in print message.
---
 drivers/resctrl/mpam_devices.c  | 148 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   6 ++
 include/linux/arm_mpam.h        |  14 +++
 3 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 49f874fae0a6..910bb6cd5e4f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -43,6 +44,15 @@ static atomic_t mpam_num_msc;
 static int mpam_cpuhp_state;
 static DEFINE_MUTEX(mpam_cpuhp_state_lock);
 
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
 /*
  * mpam is enabled once all devices have been probed from CPU online callbacks,
  * scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -117,6 +127,69 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
 
 #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	WARN_ON_ONCE(reg + sizeof(u32) >= msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+	u64 idr_high = 0, idr_low;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	idr_low = mpam_read_partsel_reg(msc, IDR);
+	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+	return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+	guard(spinlock)(&partid_max_lock);
+	if (!partid_max_init) {
+		mpam_partid_max = partid_max;
+		mpam_pmg_max = pmg_max;
+		partid_max_init = true;
+	} else if (!partid_max_published) {
+		mpam_partid_max = min(mpam_partid_max, partid_max);
+		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+	} else {
+		/* New requestors can't lower the values */
+		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+			return -EBUSY;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
 static struct mpam_vmsc *
 mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
 {
@@ -427,6 +500,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
 	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
 	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
 	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+	list_add_rcu(&ris->msc_list, &msc->ris);
 
 	return 0;
 }
@@ -446,9 +520,36 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+						   u8 ris_idx)
+{
+	int err;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (!test_bit(ris_idx, &msc->ris_idxs)) {
+		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+					     0, 0);
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	list_for_each_entry(ris, &msc->ris, msc_list) {
+		if (ris->ris_idx == ris_idx) {
+			return ris;
+		}
+	}
+
+	return ERR_PTR(-ENOENT);
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
+	u16 partid_max;
+	u8 ris_idx, pmg_max;
+	struct mpam_msc_ris *ris;
 	struct device *dev = &msc->pdev->dev;
 
 	lockdep_assert_held(&msc->probe_lock);
@@ -459,6 +560,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		return -EIO;
 	}
 
+	/* Grab an IDR value to find out how many RIS there are */
+	mutex_lock(&msc->part_sel_lock);
+	idr = mpam_msc_read_idr(msc);
+	mutex_unlock(&msc->part_sel_lock);
+
+	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+	/* Use these values so partid/pmg always starts with a valid value */
+	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		idr = mpam_msc_read_idr(msc);
+		mutex_unlock(&msc->part_sel_lock);
+
+		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+		msc->partid_max = min(msc->partid_max, partid_max);
+		msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+		mutex_lock(&mpam_list_lock);
+		ris = mpam_get_or_create_ris(msc, ris_idx);
+		mutex_unlock(&mpam_list_lock);
+		if (IS_ERR(ris))
+			return PTR_ERR(ris);
+	}
+
+	spin_lock(&partid_max_lock);
+	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+	spin_unlock(&partid_max_lock);
+
 	msc->probed = true;
 
 	return 0;
@@ -675,10 +810,20 @@ static struct platform_driver mpam_msc_driver = {
 
 static void mpam_enable_once(void)
 {
+	/*
+	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+	 * longer change.
+	 */
+	spin_lock(&partid_max_lock);
+	partid_max_published = true;
+	spin_unlock(&partid_max_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
-	pr_info("MPAM enabled\n");
+	/* Use printk() to avoid the pr_fmt adding the function name. */
+	printk(KERN_INFO, "MPAM enabled with %u PARTIDs and %u PMGs\n",
+	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
 void mpam_disable(struct work_struct *ignored)
@@ -745,4 +890,5 @@ static int __init mpam_msc_driver_init(void)
 
 	return platform_driver_register(&mpam_msc_driver);
 }
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 8865a7d81dd1..9c08502e9c76 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -51,6 +51,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	u16			partid_max;
+	u8			pmg_max;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -140,6 +142,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 3206f5ddc147..cb6e6cfbea0b 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -41,4 +41,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 		    enum mpam_class_types type, u8 class_id, int component_id);
 
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max:		The maximum PARTID value the requestor can generate.
+ * @pmg_max:		The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestor's features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (10 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 17:43   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

The MSC MON_SEL register needs to be accessed from hardirq for the overflow
interrupt, and when taking an IPI to access these registers on platforms
where MSC are not accessible from every CPU. This makes an irqsave
spinlock the obvious lock to protect these registers. On systems with SCMI
or PCC mailboxes accesses must be able to sleep, meaning a mutex must be used.
The SCMI or PCC platforms can't support an overflow interrupt, and
can't access the registers from hardirq context.

Clearly these two can't exist for one MSC at the same time.

Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.

In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.

Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Made accesses to outer_lock_held READ_ONCE() to avoid torn values in the
   failure case.
---
 drivers/resctrl/mpam_devices.c  |  3 ++-
 drivers/resctrl/mpam_internal.h | 38 +++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 910bb6cd5e4f..35011d3e8f1e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -738,6 +738,7 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 
 	mutex_init(&msc->probe_lock);
 	mutex_init(&msc->part_sel_lock);
+	mpam_mon_sel_lock_init(msc);
 	msc->id = pdev->id;
 	msc->pdev = pdev;
 	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
@@ -822,7 +823,7 @@ static void mpam_enable_once(void)
 				      "mpam:online");
 
 	/* Use printk() to avoid the pr_fmt adding the function name. */
-	printk(KERN_INFO, "MPAM enabled with %u PARTIDs and %u PMGs\n",
+	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9c08502e9c76..1afc52b36328 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -67,12 +67,50 @@ struct mpam_msc {
 	 */
 	struct mutex		part_sel_lock;
 
+	/*
+	 * mon_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+	 * Access to mon_sel is needed from both process and interrupt contexts,
+	 * but is complicated by firmware-backed platforms that can't make any
+	 * access unless they can sleep.
+	 * Always use the mpam_mon_sel_lock() helpers.
+	 * Accesses to mon_sel need to be able to fail if they occur in the wrong
+	 * context.
+	 * If needed, take msc->probe_lock first.
+	 */
+	raw_spinlock_t		_mon_sel_lock;
+	unsigned long		_mon_sel_flags;
+
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
 
 	struct mpam_garbage	garbage;
 };
 
+/* Returning false here means accesses to mon_sel must fail and report an error. */
+static inline bool __must_check mpam_mon_sel_lock(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(msc->iface != MPAM_IFACE_MMIO);
+
+	raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+	return true;
+}
+
+static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
+{
+	raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+	lockdep_assert_held_once(&msc->_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
+{
+	raw_spin_lock_init(&msc->_mon_sel_lock);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (11 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 17:47   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Dave Martin

Expand the probing support to cover the control and monitor types
that can be used with resctrl.

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Moved some feature enum values that resctrl doesn't support to a later
   patch.
 * Swapped mpam_has_feature() out for the macro version that is used later.

Changes since v1:
 * added an underscore to a variable name.

Changes since RFC:
 * Made mpam_ris_hw_probe_hw_nrdy() more idiomatic C.
 * Added static assert on features bitmap size.
---
 drivers/resctrl/mpam_devices.c  | 147 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  33 +++++++
 2 files changed, 180 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 35011d3e8f1e..80c27c84dccc 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -142,6 +142,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
 }
 #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
 
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	mpam_mon_sel_lock_held(msc);
+	return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	mpam_mon_sel_lock_held(msc);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
 static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 {
 	u64 idr_high = 0, idr_low;
@@ -544,6 +558,133 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
 	return ERR_PTR(-ENOENT);
 }
 
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris *ris, u32 mon_reg)
+{
+	u32 now;
+	u64 mon_sel;
+	bool can_set, can_clear;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+		return false;
+
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_set = now & MSMON___NRDY;
+
+	_mpam_write_monsel_reg(msc, mon_reg, 0);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_clear = !(now & MSMON___NRDY);
+	mpam_mon_sel_unlock(msc);
+
+	return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
+	_mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+	int err;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct device *dev = &msc->pdev->dev;
+	struct mpam_props *props = &ris->props;
+
+	lockdep_assert_held(&msc->probe_lock);
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	/* Cache Portion partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+		if (props->cpbm_wd)
+			mpam_set_feature(mpam_feat_cpor_part, props);
+	}
+
+	/* Memory bandwidth partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+		/* portion bitmap resolution */
+		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+		if (props->mbw_pbm_bits &&
+		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_part, props);
+
+		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_max, props);
+	}
+
+	/* Performance Monitoring */
+	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+		/*
+		 * If the firmware max-nrdy-us property is missing, the
+		 * CSU counters can't be used. Should we wait forever?
+		 */
+		err = device_property_read_u32(&msc->pdev->dev,
+					       "arm,not-ready-us",
+					       &msc->nrdy_usec);
+
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+			u32 csumonidr;
+
+			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+			if (props->num_csu_mon) {
+				bool hw_managed;
+
+				mpam_set_feature(mpam_feat_msmon_csu, props);
+
+				/* Is NRDY hardware managed? */
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+			}
+
+			/*
+			 * Accept the missing firmware property if NRDY appears
+			 * un-implemented.
+			 */
+			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
+		}
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+			bool hw_managed;
+			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
+			if (props->num_mbwu_mon)
+				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+			/* Is NRDY hardware managed? */
+			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+			if (hw_managed)
+				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+			/*
+			 * Don't warn about any missing firmware property for
+			 * MBWU NRDY - it doesn't make any sense!
+			 */
+		}
+	}
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
@@ -587,6 +728,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&mpam_list_lock);
 		if (IS_ERR(ris))
 			return PTR_ERR(ris);
+		ris->idr = idr;
+
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		mpam_ris_hw_probe(ris);
+		mutex_unlock(&msc->part_sel_lock);
 	}
 
 	spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1afc52b36328..be9ea0aab6d2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
 #include <linux/llist.h>
@@ -13,6 +14,7 @@
 #include <linux/sizes.h>
 #include <linux/spinlock.h>
 #include <linux/srcu.h>
+#include <linux/types.h>
 
 #define MPAM_MSC_MAX_NUM_RIS	16
 
@@ -111,6 +113,33 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
 	raw_spin_lock_init(&msc->_mon_sel_lock);
 }
 
+/* Bits for mpam features bitmaps */
+enum mpam_device_features {
+	mpam_feat_cpor_part = 0,
+	mpam_feat_mbw_part,
+	mpam_feat_mbw_min,
+	mpam_feat_mbw_max,
+	mpam_feat_msmon,
+	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_hw_nrdy,
+	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_hw_nrdy,
+	MPAM_FEATURE_LAST
+};
+
+struct mpam_props {
+	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
+
+	u16			cpbm_wd;
+	u16			mbw_pbm_bits;
+	u16			bwa_wd;
+	u16			num_csu_mon;
+	u16			num_mbwu_mon;
+};
+
+#define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
+#define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -150,6 +179,8 @@ struct mpam_vmsc {
 	/* mpam_msc_ris in this vmsc */
 	struct list_head	ris;
 
+	struct mpam_props	props;
+
 	/* All RIS in this vMSC are members of this MSC */
 	struct mpam_msc		*msc;
 
@@ -161,6 +192,8 @@ struct mpam_vmsc {
 
 struct mpam_msc_ris {
 	u8			ris_idx;
+	u64			idr;
+	struct mpam_props	props;
 
 	cpumask_t		affinity;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (12 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 18:56 ` [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks James Morse
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.

Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some of the MSCs
that make up a component or class.

If bitmap properties are mismatched within a component we
cannot support the mismatched feature.

Care has to be taken as a vMSC may hold mismatched RIS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Removed __func__ from pr_debug messages; pr_fmt has this covered.
 * Made a few debug messages use dev_dbg.
 * Dropped paranoia around empty vmsc/component lists.
 * Reworded comment describing the feature merging to state why the order
   matters and which helpers do what.
---
 drivers/resctrl/mpam_devices.c  | 214 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   3 +
 2 files changed, 217 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 80c27c84dccc..e150f4a0bfcd 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -956,8 +956,222 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_mbw_min, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_max, props))
+		return true;
+	return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
+	helper(parent) &&						\
+	((helper(child) && (parent)->field != (child)->field) ||	\
+	 (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias)		     \
+	mpam_has_feature((feat), (parent)) &&				     \
+	((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+	 (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias)			\
+	(alias) && !mpam_has_feature((feat), (parent)) &&		\
+	mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+			     struct mpam_props *child, bool alias)
+{
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+		parent->cpbm_wd = child->cpbm_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+				   cpbm_wd, alias)) {
+		pr_debug("cleared cpor_part\n");
+		mpam_clear_feature(mpam_feat_cpor_part, parent);
+		parent->cpbm_wd = 0;
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+		parent->mbw_pbm_bits = child->mbw_pbm_bits;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+				   mbw_pbm_bits, alias)) {
+		pr_debug("cleared mbw_part\n");
+		mpam_clear_feature(mpam_feat_mbw_part, parent);
+		parent->mbw_pbm_bits = 0;
+	}
+
+	/* bwa_wd is a count of bits, fewer bits means less precision */
+	if (alias && !mpam_has_bwa_wd_feature(parent) &&
+	    mpam_has_bwa_wd_feature(child)) {
+		parent->bwa_wd = child->bwa_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+				     bwa_wd, alias)) {
+		pr_debug("took the min bwa_wd\n");
+		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+	}
+
+	/* For num properties, take the minimum */
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+		parent->num_csu_mon = child->num_csu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+				   num_csu_mon, alias)) {
+		pr_debug("took the min num_csu_mon\n");
+		parent->num_csu_mon = min(parent->num_csu_mon,
+					  child->num_csu_mon);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+		parent->num_mbwu_mon = child->num_mbwu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+				   num_mbwu_mon, alias)) {
+		pr_debug("took the min num_mbwu_mon\n");
+		parent->num_mbwu_mon = min(parent->num_mbwu_mon,
+					   child->num_mbwu_mon);
+	}
+
+	if (alias) {
+		/* Merge features for aliased resources */
+		bitmap_or(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
+	} else {
+		/* Clear missing features for non aliasing */
+		bitmap_and(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
+	}
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+	struct mpam_props *cprops = &class->props;
+	struct mpam_props *vprops = &vmsc->props;
+	struct device *dev = &vmsc->msc->pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+	dev_dbg(dev, "Merging features for class:0x%lx &= vmsc:0x%lx\n",
+		(long)cprops->features, (long)vprops->features);
+
+	/* Take the safe value for any common features */
+	__props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_props *vprops = &vmsc->props;
+	struct device *dev = &vmsc->msc->pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+	dev_dbg(dev, "Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+		(long)vprops->features, (long)rprops->features);
+
+	/*
+	 * Merge mismatched features - Copy any features that aren't common,
+	 * but take the safe value for any common features.
+	 */
+	__props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_component *comp;
+
+	comp = list_first_entry(&class->components,
+				struct mpam_component, class_list);
+	vmsc = list_first_entry(&comp->vmsc,
+				struct mpam_vmsc, comp_list);
+
+	class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			__vmsc_props_mismatch(vmsc, ris);
+			class->nrdy_usec = max(class->nrdy_usec,
+					       vmsc->msc->nrdy_usec);
+		}
+	}
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+		__class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together by mpam_enable_merge_vmsc_features()
+ * as the first step so that mpam_enable_init_class_features() can initialise
+ * the class with a representative set of features.
+ * Next the mpam_enable_merge_class_features() bitwise-and's all the vmsc
+ * features to form the class features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, all_classes_list, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_vmsc_features(comp);
+
+		mpam_enable_init_class_features(class);
+
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_class_features(comp);
+	}
+}
+
 static void mpam_enable_once(void)
 {
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+	mutex_unlock(&mpam_list_lock);
+
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
 	 * longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index be9ea0aab6d2..39331d81c481 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -139,6 +139,7 @@ struct mpam_props {
 
 #define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
+#define mpam_clear_feature(_feat, x)	clear_bit(_feat, (x)->features)
 
 struct mpam_class {
 	/* mpam_components in this class */
@@ -146,6 +147,8 @@ struct mpam_class {
 
 	cpumask_t		affinity;
 
+	struct mpam_props	props;
+	u32			nrdy_usec;
 	u8			level;
 	enum mpam_class_types	type;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (13 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 17:52   ` Jonathan Cameron
  2025-10-29  6:53   ` Shaopeng Tan (Fujitsu)
  2025-10-17 18:56 ` [PATCH v3 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
                   ` (15 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Rohit Mathew

When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.

Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.

MSCs are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.

If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.

To reset, write the maximum values for all discovered controls.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Dropped srcu lockdep assert, the list_for_each helper has this covered.
 * removed a space from the patch subject
 * use guard lock/unlock for srcu in online/offline calls.
 * Remove mpam_assert_srcu_read_lock_held() and drop usage next to the list
   walker.
 * Fixed off by one in mpam_reset_ris()

Changes since RFC:
 * Last bitmap write will always be non-zero.
 * Dropped READ_ONCE() - the value can no longer change.
 * Write 0 to proportional stride, remove the bwa_fract variable.
 * Removed nested srcu lock, the assert should cover it.
---
 drivers/resctrl/mpam_devices.c  | 109 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   3 +
 2 files changed, 112 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e150f4a0bfcd..02709b4ae9d4 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/bitfield.h>
+#include <linux/bitmap.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -746,8 +747,104 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+	u32 num_words, msb;
+	u32 bm = ~0;
+	int i;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	if (wd == 0)
+		return;
+
+	/*
+	 * Write all ~0 to all but the last 32bit-word, which may
+	 * have fewer bits...
+	 */
+	num_words = DIV_ROUND_UP(wd, 32);
+	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+		__mpam_write_reg(msc, reg, bm);
+
+	/*
+	 * ....and then the last (maybe) partial 32bit word. When wd is a
+	 * multiple of 32, msb should be 31 to write a full 32bit word.
+	 */
+	msb = (wd - 1) % 32;
+	bm = GENMASK(msb, 0);
+	__mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *rprops = &ris->props;
+
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	mutex_lock(&msc->part_sel_lock);
+	__mpam_part_sel(ris->ris_idx, partid, msc);
+
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+
+	mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+	u16 partid, partid_max;
+
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	if (ris->in_reset_state)
+		return;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid < partid_max + 1; partid++)
+		mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+	struct mpam_msc_ris *ris;
+
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+		mpam_reset_ris(ris);
+
+		/*
+		 * Set in_reset_state when coming online. The reset state
+		 * for non-zero partid may be lost while the CPUs are offline.
+		 */
+		ris->in_reset_state = online;
+	}
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
+	struct mpam_msc *msc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_fetch_inc(&msc->online_refs) == 0)
+			mpam_reset_msc(msc, true);
+	}
+
 	return 0;
 }
 
@@ -786,6 +883,18 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 
 static int mpam_cpu_offline(unsigned int cpu)
 {
+	struct mpam_msc *msc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_dec_and_test(&msc->online_refs))
+			mpam_reset_msc(msc, false);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 39331d81c481..9f062dd5a0bb 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/atomic.h>
 #include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
@@ -46,6 +47,7 @@ struct mpam_msc {
 	enum mpam_msc_iface	iface;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	atomic_t		online_refs;
 
 	/*
 	 * probe_lock is only taken during discovery. After discovery these
@@ -197,6 +199,7 @@ struct mpam_msc_ris {
 	u8			ris_idx;
 	u64			idr;
 	struct mpam_props	props;
+	bool			in_reset_state;
 
 	cpumask_t		affinity;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 16/29] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (14 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 18:56 ` [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.

Add a helper that schedules the provided function if necessary.

Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
 drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 02709b4ae9d4..ec089593acad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -800,20 +800,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 	mutex_unlock(&msc->part_sel_lock);
 }
 
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_msc_ris *ris = arg;
 
 	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
 
 	if (ris->in_reset_state)
-		return;
+		return 0;
 
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid < partid_max + 1; partid++)
 		mpam_reset_ris_partid(ris, partid);
+
+	return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated, it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, &msc->accessibility))
+		return cpu;
+
+	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+	lockdep_assert_irqs_enabled();
+	lockdep_assert_cpus_held();
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
 }
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -821,7 +852,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	struct mpam_msc_ris *ris;
 
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
-		mpam_reset_ris(ris);
+		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
 		/*
 		 * Set in_reset_state when coming online. The reset state
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (15 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 15:14   ` Ben Horgan
  2025-10-17 18:56 ` [PATCH v3 18/29] arm_mpam: Register and enable IRQs James Morse
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.

Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Reduced the scope of arguments in mpam_reset_component_locked().

Changes since v1:
 * more complete use of _srcu helpers.
 * Use guard macro for srcu.
 * Dropped a might_sleep() - something else will bark.
---
 drivers/resctrl/mpam_devices.c | 58 ++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index ec089593acad..545482e112b7 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -802,15 +802,13 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
- * pre-emptible.
+ * pre-emptible. Caller must hold mpam_srcu.
  */
 static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
 
-	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
-
 	if (ris->in_reset_state)
 		return 0;
 
@@ -1328,8 +1326,56 @@ static void mpam_enable_once(void)
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		struct mpam_msc *msc = vmsc->msc;
+		struct mpam_msc_ris *ris;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			if (!ris->in_reset_state)
+				mpam_touch_msc(msc, mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+		}
+	}
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(comp, &class->components, class_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_component_locked(comp);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+	cpus_read_lock();
+	mpam_reset_class_locked(class);
+	cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAM's errors indicate a software bug; restore any modified
+ * controls to their reset values.
+ */
 void mpam_disable(struct work_struct *ignored)
 {
+	int idx;
+	struct mpam_class *class;
 	struct mpam_msc *msc, *tmp;
 
 	mutex_lock(&mpam_cpuhp_state_lock);
@@ -1339,6 +1385,12 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_class(class);
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	mutex_lock(&mpam_list_lock);
 	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
 		mpam_msc_destroy(msc);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 18/29] arm_mpam: Register and enable IRQs
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (16 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 18:03   ` Jonathan Cameron
  2025-10-29  7:02   ` Shaopeng Tan (Fujitsu)
  2025-10-17 18:56 ` [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
                   ` (12 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.

Only the irq handler accesses the MPAMF_ESR register, so no locking is
needed. The work to disable MPAM after an error needs to happen in process
context as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.
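The "record why, then defer the teardown to process context" shape described above can be sketched in plain C. This is a userspace model, not kernel code: the names mirror the patch, but schedule_work() is replaced by a simple pending flag drained by a worker function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the driver state touched by the error path. */
static const char *mpam_disable_reason;
static bool teardown_pending;     /* stands in for schedule_work() */
static bool mpam_hw_enabled = true;

/* IRQ context: must not sleep or take mutexes, so only note the
 * reason and defer. In the patch: mpam_disable_reason = ...;
 * schedule_work(&mpam_broken_work); */
static void error_irq_handler(const char *why)
{
	mpam_disable_reason = why;
	teardown_pending = true;
}

/* Process context: free to take mutexes and unregister the
 * interrupt. In the patch this is mpam_disable(). */
static void worker(void)
{
	if (!teardown_pending)
		return;
	mpam_hw_enabled = false;
	teardown_pending = false;
}

static bool run_error_path(void)
{
	error_irq_handler("hardware error interrupt");
	worker();
	return !mpam_hw_enabled;
}
```

The point of the split is the same as in the patch: the handler does the minimum, and everything that sleeps happens later in the worker.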

Enabling the IRQs in the MSC may involve cross-calling to a CPU that
can access the MSC.

Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
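The "structures sized by max_partid" depend on the system-wide limits, which the hw-probe hunk below computes as the minimum across all MSCs. A userspace sketch of that clamping (values here are illustrative, not from any real platform):

```c
#include <stdint.h>

/* System-wide limits start at the architectural maximums and are
 * only ever clamped down as each MSC is probed; once published,
 * they must not change (see mpam_msc_hw_probe() in the patch). */
static uint16_t mpam_partid_max = UINT16_MAX;
static uint8_t mpam_pmg_max = UINT8_MAX;

static void probe_one_msc(uint16_t msc_partid_max, uint8_t msc_pmg_max)
{
	if (msc_partid_max < mpam_partid_max)
		mpam_partid_max = msc_partid_max;
	if (msc_pmg_max < mpam_pmg_max)
		mpam_pmg_max = msc_pmg_max;
}
```

Once the error IRQ is registered, no further probe may shrink these values, which is why the patch publishes partid_max before requesting the interrupt.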

CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Assign mpam_errcode_names[] from their preprocessor defines.
 * Removed overlapping ppi case that can't happen without ppi partitions.
 * Renamed zero/clear.
 * Convert atomic bitmaps to use mutex and bools.
 * Added 'disable reason' for mpam_enable().

Changes since v1:
 * Made mpam_unregister_irqs() safe to race with itself.
 * Removed threaded interrupts.
 * Schedule mpam_disable() from cpuhp callback in the case of an error.
 * Added mpam_disable_reason.
 * Use alloc_percpu()

Changes since RFC:
 * Use guard macro when walking srcu list.
 * Use INTEN macro for enabling interrupts.
 * Move partid_max_published up earlier in mpam_enable_once().
---
 drivers/resctrl/mpam_devices.c  | 283 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  13 ++
 2 files changed, 293 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 545482e112b7..f18a22f825a0 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -170,6 +173,34 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 	return (idr_high << 32) | idr_low;
 }
 
+static void mpam_msc_clear_esr(struct mpam_msc *msc)
+{
+	u64 esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (!esr_low)
+		return;
+
+	/*
+	 * Clearing the high/low bits of MPAMF_ESR cannot be atomic.
+	 * Clear the top half first, so that the pending error bits in the
+	 * lower half prevent hardware from updating either half of the
+	 * register.
+	 */
+	if (msc->has_extd_esr)
+		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+	__mpam_write_reg(msc, MPAMF_ESR, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+	u64 esr_high = 0, esr_low;
+
+	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (msc->has_extd_esr)
+		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+	return (esr_high << 32) | esr_low;
+}
+
 static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
 {
 	lockdep_assert_held(&msc->part_sel_lock);
@@ -723,6 +754,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
 		msc->partid_max = min(msc->partid_max, partid_max);
 		msc->pmg_max = min(msc->pmg_max, pmg_max);
+		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
 
 		mutex_lock(&mpam_list_lock);
 		ris = mpam_get_or_create_ris(msc, ris_idx);
@@ -737,6 +769,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&msc->part_sel_lock);
 	}
 
+	/* Clear any stale errors */
+	mpam_msc_clear_esr(msc);
+
 	spin_lock(&partid_max_lock);
 	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
 	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -860,6 +895,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	}
 }
 
+static void _enable_percpu_irq(void *_irq)
+{
+	int *irq = _irq;
+
+	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
 	struct mpam_msc *msc;
@@ -870,6 +912,9 @@ static int mpam_cpu_online(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			_enable_percpu_irq(&msc->reenable_error_ppi);
+
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
 			mpam_reset_msc(msc, true);
 	}
@@ -920,6 +965,9 @@ static int mpam_cpu_offline(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			disable_percpu_irq(msc->reenable_error_ppi);
+
 		if (atomic_dec_and_test(&msc->online_refs))
 			mpam_reset_msc(msc, false);
 	}
@@ -946,6 +994,42 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
 	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
+static int __setup_ppi(struct mpam_msc *msc)
+{
+	int cpu;
+
+	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
+	if (!msc->error_dev_id)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &msc->accessibility)
+		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+
+	return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+	int irq;
+
+	irq = platform_get_irq_byname_optional(msc->pdev, "error");
+	if (irq <= 0)
+		return 0;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(irq))
+		return __setup_ppi(msc);
+
+	/* Sanity check: a shared error interrupt must be usable from any CPU */
+	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+		pr_err_once("msc:%u is a private resource with a shared error interrupt\n",
+			    msc->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -1023,6 +1107,7 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 
 	mutex_init(&msc->probe_lock);
 	mutex_init(&msc->part_sel_lock);
+	mutex_init(&msc->error_irq_lock);
 	mpam_mon_sel_lock_init(msc);
 	msc->id = pdev->id;
 	msc->pdev = pdev;
@@ -1037,6 +1122,10 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 		return ERR_PTR(-EINVAL);
 	}
 
+	err = mpam_msc_setup_error_irq(msc);
+	if (err)
+		return ERR_PTR(err);
+
 	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
 		msc->iface = MPAM_IFACE_MMIO;
 	else
@@ -1304,11 +1393,176 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
 	}
 }
 
+static const char *mpam_errcode_names[16] = {
+	[MPAM_ERRCODE_NONE]			= "No error",
+	[MPAM_ERRCODE_PARTID_SEL_RANGE]		= "PARTID_SEL_Range",
+	[MPAM_ERRCODE_REQ_PARTID_RANGE]		= "Req_PARTID_Range",
+	[MPAM_ERRCODE_MSMONCFG_ID_RANGE]	= "MSMONCFG_ID_RANGE",
+	[MPAM_ERRCODE_REQ_PMG_RANGE]		= "Req_PMG_Range",
+	[MPAM_ERRCODE_MONITOR_RANGE]		= "Monitor_Range",
+	[MPAM_ERRCODE_INTPARTID_RANGE]		= "intPARTID_Range",
+	[MPAM_ERRCODE_UNEXPECTED_INTERNAL]	= "Unexpected_INTERNAL",
+	[MPAM_ERRCODE_UNDEFINED_RIS_PART_SEL]	= "Undefined_RIS_PART_SEL",
+	[MPAM_ERRCODE_RIS_NO_CONTROL]		= "RIS_No_Control",
+	[MPAM_ERRCODE_UNDEFINED_RIS_MON_SEL]	= "Undefined_RIS_MON_SEL",
+	[MPAM_ERRCODE_RIS_NO_MONITOR]		= "RIS_No_Monitor",
+	[12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+	return 0;
+}
+
+/* This can run in mpam_disable(), and the interrupt handler on the same CPU */
+static int mpam_disable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, 0);
+
+	return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+	u64 reg;
+	u16 partid;
+	u8 errcode, pmg, ris;
+
+	if (WARN_ON_ONCE(!msc) ||
+	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &msc->accessibility)))
+		return IRQ_NONE;
+
+	reg = mpam_msc_read_esr(msc);
+
+	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+	if (!errcode)
+		return IRQ_NONE;
+
+	/* Clear level triggered irq */
+	mpam_msc_clear_esr(msc);
+
+	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+	pr_err_ratelimited("error irq from msc:%u '%s', partid: %u, pmg: %u, ris: %u\n",
+			   msc->id, mpam_errcode_names[errcode], partid, pmg,
+			   ris);
+
+	/* Disable this interrupt. */
+	mpam_disable_msc_ecr(msc);
+
+	/*
+	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
+	 * unregister the interrupt from the threaded part of the handler.
+	 */
+	mpam_disable_reason = "hardware error interrupt";
+	schedule_work(&mpam_broken_work);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static int mpam_register_irqs(void)
+{
+	int err, irq;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+		/* We anticipate sharing the interrupt with other MSCs */
+		if (irq_is_percpu(irq)) {
+			err = request_percpu_irq(irq, &mpam_ppi_handler,
+						 "mpam:msc:error",
+						 msc->error_dev_id);
+			if (err)
+				return err;
+
+			msc->reenable_error_ppi = irq;
+			smp_call_function_many(&msc->accessibility,
+					       &_enable_percpu_irq, &irq,
+					       true);
+		} else {
+			err = devm_request_irq(&msc->pdev->dev, irq,
+					       &mpam_spi_handler, IRQF_SHARED,
+					       "mpam:msc:error", msc);
+			if (err)
+				return err;
+		}
+
+		mutex_lock(&msc->error_irq_lock);
+		msc->error_irq_req = true;
+		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+		msc->error_irq_hw_enabled = true;
+		mutex_unlock(&msc->error_irq_lock);
+	}
+
+	return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+	int irq;
+	struct mpam_msc *msc;
+
+	guard(cpus_read_lock)();
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		mutex_lock(&msc->error_irq_lock);
+		if (msc->error_irq_hw_enabled) {
+			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+			msc->error_irq_hw_enabled = false;
+		}
+
+		if (msc->error_irq_req) {
+			if (irq_is_percpu(irq)) {
+				msc->reenable_error_ppi = 0;
+				free_percpu_irq(irq, msc->error_dev_id);
+			} else {
+				devm_free_irq(&msc->pdev->dev, irq, msc);
+			}
+			msc->error_irq_req = false;
+		}
+		mutex_unlock(&msc->error_irq_lock);
+	}
+}
+
 static void mpam_enable_once(void)
 {
-	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
-	mutex_unlock(&mpam_list_lock);
+	int err;
 
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1318,6 +1572,27 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	/*
+	 * If all the MSC have been probed, enabling the IRQs happens next.
+	 * That involves cross-calling to a CPU that can reach the MSC, and
+	 * the locks must be taken in this order:
+	 */
+	cpus_read_lock();
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+
+	err = mpam_register_irqs();
+
+	mutex_unlock(&mpam_list_lock);
+	cpus_read_unlock();
+
+	if (err) {
+		pr_warn("Failed to register irqs: %d\n", err);
+		mpam_disable_reason = "Failed to enable.";
+		schedule_work(&mpam_broken_work);
+		return;
+	}
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
@@ -1385,6 +1660,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	mpam_unregister_irqs();
+
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
 				 srcu_read_lock_held(&mpam_srcu))
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f062dd5a0bb..a04b09abd814 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -47,6 +47,11 @@ struct mpam_msc {
 	enum mpam_msc_iface	iface;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	bool			has_extd_esr;
+
+	int				reenable_error_ppi;
+	struct mpam_msc * __percpu	*error_dev_id;
+
 	atomic_t		online_refs;
 
 	/*
@@ -60,6 +65,14 @@ struct mpam_msc {
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
+	/*
+	 * error_irq_lock is taken when registering/unregistering the error
+	 * interrupt and manipulating the below flags.
+	 */
+	struct mutex		error_irq_lock;
+	bool			error_irq_req;
+	bool			error_irq_hw_enabled;
+
 	/* mpam_msc_ris of this component */
 	struct list_head	ris;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (17 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 18/29] arm_mpam: Register and enable IRQs James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 16:28   ` Ben Horgan
  2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.

After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.

Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
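The static-key pattern the patch adds can be modelled in userspace with a plain flag (jump labels are kernel-only; this sketch keeps the accessor shape from the patch but drops the branch patching):

```c
#include <stdbool.h>

/* Userspace stand-in for DEFINE_STATIC_KEY_FALSE(mpam_enabled):
 * a plain bool instead of a runtime-patched branch. */
static bool mpam_enabled;

static inline bool mpam_is_enabled(void)
{
	/* kernel: static_branch_likely(&mpam_enabled) */
	return mpam_enabled;
}

/* The cpuhp-callback shape from the patch: once mpam is enabled,
 * discovery paths become no-ops and restore paths take over.
 * Return values here are illustrative (1 = still probing). */
static int mpam_discovery_cpu_online(void)
{
	if (mpam_is_enabled())
		return 0;	/* nothing left to discover */
	return 1;
}
```

In the kernel the key buys a near-free check on hot paths such as context switch, which is why a static key is used rather than an ordinary global.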

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Removed the word 'TODO'.
 * Fixed a typo in the commit message.
---
 drivers/resctrl/mpam_devices.c  | 12 ++++++++++++
 drivers/resctrl/mpam_internal.h |  9 +++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f18a22f825a0..ab37ed1fb5de 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,6 +29,8 @@
 
 #include "mpam_internal.h"
 
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
+
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
@@ -929,6 +931,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 	struct mpam_msc *msc;
 	bool new_device_probed = false;
 
+	if (mpam_is_enabled())
+		return 0;
+
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
 				 srcu_read_lock_held(&mpam_srcu)) {
@@ -1459,6 +1464,10 @@ static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
 	/* Disable this interrupt. */
 	mpam_disable_msc_ecr(msc);
 
+	/* Are we racing with the thread disabling MPAM? */
+	if (!mpam_is_enabled())
+		return IRQ_HANDLED;
+
 	/*
 	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
 	 * unregister the interrupt from the threaded part of the handler.
@@ -1593,6 +1602,7 @@ static void mpam_enable_once(void)
 		return;
 	}
 
+	static_branch_enable(&mpam_enabled);
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
@@ -1660,6 +1670,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	static_branch_disable(&mpam_enabled);
+
 	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a04b09abd814..d492df9a1735 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -9,6 +9,7 @@
 #include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/jump_label.h>
 #include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
@@ -19,8 +20,16 @@
 
 #define MPAM_MSC_MAX_NUM_RIS	16
 
+
 struct platform_device;
 
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+	return static_branch_likely(&mpam_enabled);
+}
+
 /*
  * Structures protected by SRCU may not be freed for a surprising amount of
  * time (especially if perf is running). To ensure the MPAM error interrupt can
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (18 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 17:04   ` Ben Horgan
                     ` (2 more replies)
  2025-10-17 18:56 ` [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features James Morse
                   ` (10 subsequent siblings)
  30 siblings, 3 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Dave Martin, Ben Horgan

When CPUs come online the MSC's original configuration should be restored.

Add struct mpam_config to hold the configuration. This has a bitmap of
features that were modified. Once the maximum partid is known, allocate
a configuration array for each component, and reprogram each RIS
configuration from this.
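The feature-bitmap idea — only fields whose bit is set are valid, and hardware is reprogrammed only when a valid field actually changed — can be sketched as a cut-down userspace model of struct mpam_config and mpam_update_config() (two features only; names and widths are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down model of struct mpam_config: one feature bit per field. */
enum { FEAT_CPBM, FEAT_MBW_MAX, FEAT_LAST };

struct cfg {
	uint32_t features;	/* which fields below are valid */
	uint32_t cpbm;
	uint16_t mbw_max;
};

static bool has_feature(int f, const struct cfg *c)
{
	return c->features & (1u << f);
}

/* Mirrors mpam_update_config(): copy only the valid, changed fields
 * into the stored config and report whether hardware needs to be
 * reprogrammed at all. */
static bool update_config(struct cfg *cur, const struct cfg *newcfg)
{
	bool changed = false;

	if (has_feature(FEAT_CPBM, newcfg) && newcfg->cpbm != cur->cpbm) {
		cur->cpbm = newcfg->cpbm;
		cur->features |= 1u << FEAT_CPBM;
		changed = true;
	}
	if (has_feature(FEAT_MBW_MAX, newcfg) &&
	    newcfg->mbw_max != cur->mbw_max) {
		cur->mbw_max = newcfg->mbw_max;
		cur->features |= 1u << FEAT_MBW_MAX;
		changed = true;
	}
	return changed;
}
```

Applying the same configuration twice returns false the second time, which is what lets mpam_apply_config() skip the cross-call when nothing changed.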

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Call mpam_init_reset_cfg() on the allocated config as 0 is no longer correct.
 * init_garbage() on each config - the array has to be freed in one go, but
   otherwise this looks weird.
 * Use struct initialiser in mpam_init_reset_cfg().
 * Moved int err definition.
 * Removed srcu lock taking based on squinting at the only caller.
 * Moved config reset to mpam_reset_component_cfg() for re-use in
   mpam_reset_component_locked(), previous memset() was not enough since zero
   no longer means reset.

Changes since v1:
 * Switched entry_rcu to srcu versions.

Changes since RFC:
 * Added a comment about the ordering around max_partid.
 * Allocate configurations after interrupts are registered to reduce churn.
 * Added mpam_assert_partid_sizes_fixed();
 * Make reset use an all-ones instead of zero config.
---
 drivers/resctrl/mpam_devices.c  | 284 +++++++++++++++++++++++++++++---
 drivers/resctrl/mpam_internal.h |  23 +++
 2 files changed, 287 insertions(+), 20 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index ab37ed1fb5de..e990ef67df5b 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -118,6 +118,17 @@ static inline void init_garbage(struct mpam_garbage *garbage)
 {
 	init_llist_node(&garbage->llist);
 }
+
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+	WARN_ON_ONCE(!partid_max_published);
+}
+
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
 	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
@@ -366,12 +377,16 @@ static void mpam_class_destroy(struct mpam_class *class)
 	add_to_garbage(class);
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp);
+
 static void mpam_comp_destroy(struct mpam_component *comp)
 {
 	struct mpam_class *class = comp->class;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	__destroy_component_cfg(comp);
+
 	list_del_rcu(&comp->class_list);
 	add_to_garbage(comp);
 
@@ -812,48 +827,102 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 	__mpam_write_reg(msc, reg, bm);
 }
 
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+				      struct mpam_config *cfg)
 {
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
 
-	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
-
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
-	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
+	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
+		if (cfg->reset_cpbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+					      rprops->cpbm_wd);
+		else
+			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
+		if (cfg->reset_mbw_pbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+					      rprops->mbw_pbm_bits);
+		else
+			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_min, cfg))
 		mpam_write_partsel_reg(msc, MBW_MIN, 0);
 
-	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
-		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_max, cfg))
+		mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
 
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+struct reprogram_ris {
+	struct mpam_msc_ris *ris;
+	struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+	u16 partid, partid_max;
+	struct reprogram_ris *arg = _arg;
+	struct mpam_msc_ris *ris = arg->ris;
+	struct mpam_config *cfg = arg->cfg;
+
+	if (ris->in_reset_state)
+		return 0;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid <= partid_max; partid++)
+		mpam_reprogram_ris_partid(ris, partid, cfg);
+
+	return 0;
+}
+
+static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
+{
+	*reset_cfg = (struct mpam_config) {
+		.cpbm = ~0,
+		.mbw_pbm = ~0,
+		.mbw_max = MPAMCFG_MBW_MAX_MAX,
+
+		.reset_cpbm = true,
+		.reset_mbw_pbm = true,
+	};
+	bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible. Caller must hold mpam_srcu.
  */
 static int mpam_reset_ris(void *arg)
 {
-	u16 partid, partid_max;
+	struct mpam_config reset_cfg;
 	struct mpam_msc_ris *ris = arg;
+	struct reprogram_ris reprogram_arg;
 
 	if (ris->in_reset_state)
 		return 0;
 
-	spin_lock(&partid_max_lock);
-	partid_max = mpam_partid_max;
-	spin_unlock(&partid_max_lock);
-	for (partid = 0; partid < partid_max + 1; partid++)
-		mpam_reset_ris_partid(ris, partid);
+	mpam_init_reset_cfg(&reset_cfg);
+
+	reprogram_arg.ris = ris;
+	reprogram_arg.cfg = &reset_cfg;
+
+	mpam_reprogram_ris(&reprogram_arg);
 
 	return 0;
 }
@@ -897,6 +966,39 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	}
 }
 
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+
+	/*
+	 * No lock for mpam_partid_max as partid_max_published has been
+	 * set by mpam_enable_once(), so the values can no longer change.
+	 */
+	mpam_assert_partid_sizes_fixed();
+
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (!bitmap_empty(cfg->features, MPAM_FEATURE_LAST))
+				reset = false;
+
+			mpam_reprogram_ris_partid(ris, partid, cfg);
+		}
+		ris->in_reset_state = reset;
+	}
+}
+
 static void _enable_percpu_irq(void *_irq)
 {
 	int *irq = _irq;
@@ -918,7 +1020,7 @@ static int mpam_cpu_online(unsigned int cpu)
 			_enable_percpu_irq(&msc->reenable_error_ppi);
 
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
-			mpam_reset_msc(msc, true);
+			mpam_reprogram_msc(msc);
 	}
 
 	return 0;
@@ -1569,6 +1671,64 @@ static void mpam_unregister_irqs(void)
 	}
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+	add_to_garbage(comp->cfg);
+}
+
+static void mpam_reset_component_cfg(struct mpam_component *comp)
+{
+	int i;
+
+	mpam_assert_partid_sizes_fixed();
+
+	if (!comp->cfg)
+		return;
+
+	for (i = 0; i < mpam_partid_max + 1; i++)
+		mpam_init_reset_cfg(&comp->cfg[i]);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	mpam_assert_partid_sizes_fixed();
+
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+	if (!comp->cfg)
+		return -ENOMEM;
+
+	/*
+	 * The array is free()d in one go, so only cfg[0]'s structure needs
+	 * to be initialised.
+	 */
+	init_garbage(&comp->cfg[0].garbage);
+
+	mpam_reset_component_cfg(comp);
+
+	return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list) {
+			int err = __allocate_component_cfg(comp);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void mpam_enable_once(void)
 {
 	int err;
@@ -1588,15 +1748,25 @@ static void mpam_enable_once(void)
 	 */
 	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
+	do {
+		mpam_enable_merge_features(&mpam_classes);
 
-	err = mpam_register_irqs();
+		err = mpam_register_irqs();
+		if (err) {
+			pr_warn("Failed to register irqs: %d\n", err);
+			break;
+		}
 
+		err = mpam_allocate_config();
+		if (err) {
+			pr_err("Failed to allocate configuration arrays.\n");
+			break;
+		}
+	} while (0);
 	mutex_unlock(&mpam_list_lock);
 	cpus_read_unlock();
 
 	if (err) {
-		pr_warn("Failed to register irqs: %d\n", err);
 		mpam_disable_reason = "Failed to enable.";
 		schedule_work(&mpam_broken_work);
 		return;
@@ -1617,6 +1787,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 	struct mpam_vmsc *vmsc;
 
 	lockdep_assert_cpus_held();
+	mpam_assert_partid_sizes_fixed();
+
+	mpam_reset_component_cfg(comp);
 
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
@@ -1717,6 +1890,77 @@ void mpam_enable(struct work_struct *work)
 		mpam_enable_once();
 }
 
+struct mpam_write_config_arg {
+	struct mpam_msc_ris *ris;
+	struct mpam_component *comp;
+	u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+	struct mpam_write_config_arg *c = arg;
+
+	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+	return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+	if (mpam_has_feature(feature, newcfg) &&			\
+	    (newcfg)->member != (cfg)->member) {			\
+		(cfg)->member = (newcfg)->member;			\
+		mpam_set_feature(feature, cfg);				\
+									\
+		(changes) = true;					\
+	}								\
+} while (0)
+
+static bool mpam_update_config(struct mpam_config *cfg,
+			       const struct mpam_config *newcfg)
+{
+	bool has_changes = false;
+
+	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
+	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
+	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
+
+	return has_changes;
+}
+
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg)
+{
+	struct mpam_write_config_arg arg;
+	struct mpam_msc_ris *ris;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	/* Don't pass in the current config! */
+	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+	if (!mpam_update_config(&comp->cfg[partid], cfg))
+		return 0;
+
+	arg.comp = comp;
+	arg.partid = partid;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			arg.ris = ris;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+	}
+
+	return 0;
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d492df9a1735..2f2a7369107b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -182,6 +182,20 @@ struct mpam_class {
 	struct mpam_garbage	garbage;
 };
 
+struct mpam_config {
+	/* Which configuration values are valid. */
+	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
+
+	u32	cpbm;
+	u32	mbw_pbm;
+	u16	mbw_max;
+
+	bool	reset_cpbm;
+	bool	reset_mbw_pbm;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_component {
 	u32			comp_id;
 
@@ -190,6 +204,12 @@ struct mpam_component {
 
 	cpumask_t		affinity;
 
+	/*
+	 * Array of configuration values, indexed by partid.
+	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+	 */
+	struct mpam_config	*cfg;
+
 	/* member of mpam_class:components */
 	struct list_head	class_list;
 
@@ -249,6 +269,9 @@ extern u8 mpam_pmg_max;
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
 
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (19 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-20 17:16   ` Ben Horgan
  2025-10-17 18:56 ` [PATCH v3 22/29] arm_mpam: Add helpers to allocate monitors James Morse
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Rohit Mathew, Zeng Heng, Dave Martin

MPAM supports more features than are going to be exposed to resctrl.
For PARTIDs other than 0, the reset values of these controls aren't
known.

Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.

PARTID narrowing allows an MSC/RIS to support less configuration space
than is usable. If this feature is found on a class of device we are
likely to use, reduce partid_max to match, which allows each PARTID to
be mapped to itself.
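The clamp described above can be sketched in isolation. This is a minimal
userspace illustration of the idea, not the driver code; the function name
and types are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * With PARTID narrowing, an incoming PARTID must first be mapped to an
 * internal PARTID (intpartid) before configuration is applied. Mapping
 * each PARTID to itself only works if the advertised PARTID range is no
 * larger than the intpartid range, so keep the smaller of the two limits.
 */
static uint16_t clamp_partid_max(uint16_t partid_max, uint16_t intpartid_max)
{
	return partid_max < intpartid_max ? partid_max : intpartid_max;
}
```

This mirrors the `msc->partid_max = min(msc->partid_max, partid_max)` step
in the patch.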

CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Moved some enum definitions in here.
 * Whitespace.

Changes since v1:
 * Added reset for cassoc.
 * Added detection of CSU XCL.
---
 drivers/resctrl/mpam_devices.c  | 188 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  22 +++-
 2 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e990ef67df5b..19cc87aba51a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -229,6 +229,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
 	__mpam_part_sel_raw(partsel, msc);
 }
 
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+		      MPAMCFG_PART_SEL_INTERNAL;
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
 int mpam_register_requestor(u16 partid_max, u8 pmg_max)
 {
 	guard(spinlock)(&partid_max_lock);
@@ -650,10 +659,34 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct device *dev = &msc->pdev->dev;
 	struct mpam_props *props = &ris->props;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	lockdep_assert_held(&msc->probe_lock);
 	lockdep_assert_held(&msc->part_sel_lock);
 
+	/* Cache Capacity Partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+		if (props->cmax_wd &&
+		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+		if (props->cassoc_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cassoc, props);
+	}
+
 	/* Cache Portion partitioning */
 	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
 		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -676,6 +709,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
 		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
 			mpam_set_feature(mpam_feat_mbw_max, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_min, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_prop, props);
+	}
+
+	/* Priority partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_intpri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+		}
+
+		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_dspri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+		}
 	}
 
 	/* Performance Monitoring */
@@ -700,6 +758,9 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 
 				mpam_set_feature(mpam_feat_msmon_csu, props);
 
+				if (FIELD_GET(MPAMF_CSUMON_IDR_HAS_XCL, csumonidr))
+					mpam_set_feature(mpam_feat_msmon_csu_xcl, props);
+
 				/* Is NRDY hardware managed? */
 				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
 				if (hw_managed)
@@ -721,6 +782,9 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			if (props->num_mbwu_mon)
 				mpam_set_feature(mpam_feat_msmon_mbwu, props);
 
+			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
+				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
 			/* Is NRDY hardware managed? */
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
 			if (hw_managed)
@@ -732,6 +796,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			 */
 		}
 	}
+
+	/*
+	 * RIS with PARTID narrowing don't have enough storage for one
+	 * configuration per PARTID. If these are in a class we could use,
+	 * reduce the supported partid_max to match the number of intpartid.
+	 * If the class is unknown, just ignore it.
+	 */
+	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+	    class->type != MPAM_CLASS_UNKNOWN) {
+		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+		mpam_set_feature(mpam_feat_partid_nrw, props);
+		msc->partid_max = min(msc->partid_max, partid_max);
+	}
 }
 
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -831,12 +910,28 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
 {
+	u32 pri_val = 0;
+	u16 cmax = MPAMCFG_CMAX_CMAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
+	u16 dspri = GENMASK(rprops->dspri_wd, 0);
+	u16 intpri = GENMASK(rprops->intpri_wd, 0);
 
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
+	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+		/* Update the intpartid mapping */
+		mpam_write_partsel_reg(msc, INTPARTID,
+				       MPAMCFG_INTPARTID_INTERNAL | partid);
+
+		/*
+		 * Then switch to the 'internal' partid to update the
+		 * configuration.
+		 */
+		__mpam_intpart_sel(ris->ris_idx, partid, msc);
+	}
+
 	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
 	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
 		if (cfg->reset_cpbm)
@@ -863,6 +958,35 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 	    mpam_has_feature(mpam_feat_mbw_max, cfg))
 		mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
 
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_prop, cfg))
+		mpam_write_partsel_reg(msc, MBW_PROP, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+		mpam_write_partsel_reg(msc, CMAX, cmax);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+		mpam_write_partsel_reg(msc, CMIN, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cassoc, rprops))
+		mpam_write_partsel_reg(msc, CASSOC, MPAMCFG_CASSOC_CASSOC);
+
+	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+		/* aces high? */
+		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+			intpri = 0;
+		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+			dspri = 0;
+
+		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+		mpam_write_partsel_reg(msc, PRI, pri_val);
+	}
+
 	mutex_unlock(&msc->part_sel_lock);
 }
 
@@ -1297,6 +1421,18 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
 		return true;
 	if (mpam_has_feature(mpam_feat_mbw_max, props))
 		return true;
+	if (mpam_has_feature(mpam_feat_mbw_prop, props))
+		return true;
+	return false;
+}
+
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+		return true;
 	return false;
 }
 
@@ -1355,6 +1491,23 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
 	}
 
+	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+		parent->cmax_wd = child->cmax_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+				     cmax_wd, alias)) {
+		pr_debug("%s took the min cmax_wd\n", __func__);
+		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+		parent->cassoc_wd = child->cassoc_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+				   cassoc_wd, alias)) {
+		pr_debug("%s cleared cassoc_wd\n", __func__);
+		mpam_clear_feature(mpam_feat_cmax_cassoc, parent);
+		parent->cassoc_wd = 0;
+	}
+
 	/* For num properties, take the minimum */
 	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
 		parent->num_csu_mon = child->num_csu_mon;
@@ -1374,6 +1527,41 @@ static void __props_mismatch(struct mpam_props *parent,
 					   child->num_mbwu_mon);
 	}
 
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+		parent->intpri_wd = child->intpri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+				   intpri_wd, alias)) {
+		pr_debug("%s took the min intpri_wd\n", __func__);
+		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+		parent->dspri_wd = child->dspri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+				   dspri_wd, alias)) {
+		pr_debug("%s took the min dspri_wd\n", __func__);
+		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+	}
+
+	/* TODO: alias support for these two */
+	/* {int,ds}pri may not have differing 0-low behaviour */
+	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+		pr_debug("%s cleared intpri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_intpri_part, parent);
+		mpam_clear_feature(mpam_feat_intpri_part_0_low, parent);
+	}
+	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+		pr_debug("%s cleared dspri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_dspri_part, parent);
+		mpam_clear_feature(mpam_feat_dspri_part_0_low, parent);
+	}
+
 	if (alias) {
 		/* Merge features for aliased resources */
 		bitmap_or(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 2f2a7369107b..00edee9ebc6c 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -139,16 +139,30 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
 
 /* Bits for mpam features bitmaps */
 enum mpam_device_features {
-	mpam_feat_cpor_part = 0,
+	mpam_feat_cmax_softlim,
+	mpam_feat_cmax_cmax,
+	mpam_feat_cmax_cmin,
+	mpam_feat_cmax_cassoc,
+	mpam_feat_cpor_part,
 	mpam_feat_mbw_part,
 	mpam_feat_mbw_min,
 	mpam_feat_mbw_max,
+	mpam_feat_mbw_prop,
+	mpam_feat_intpri_part,
+	mpam_feat_intpri_part_0_low,
+	mpam_feat_dspri_part,
+	mpam_feat_dspri_part_0_low,
 	mpam_feat_msmon,
 	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_capture,
+	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-	MPAM_FEATURE_LAST
+	mpam_feat_partid_nrw,
+	MPAM_FEATURE_LAST,
 };
 
 struct mpam_props {
@@ -157,6 +171,10 @@ struct mpam_props {
 	u16			cpbm_wd;
 	u16			mbw_pbm_bits;
 	u16			bwa_wd;
+	u16			cmax_wd;
+	u16			cassoc_wd;
+	u16			intpri_wd;
+	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
 };
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 22/29] arm_mpam: Add helpers to allocate monitors
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (20 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 18:56 ` [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

MPAM's MSCs support a number of monitors, each of which supports
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
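The allocate/free pattern the helpers implement (lowest free index wins,
freeing makes an index reusable) can be shown with a small userspace sketch;
a bitmap stands in for the kernel's IDA here, and all names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MON 32

/* One in-use bitmap per class; bit i set means monitor i is allocated. */
static uint32_t csu_mon_map;

/* Return the lowest free monitor index below num_mon, or -1 for "no space". */
static int mon_alloc(uint32_t *map, uint16_t num_mon)
{
	for (int i = 0; i < num_mon && i < MAX_MON; i++) {
		if (!(*map & (1u << i))) {
			*map |= 1u << i;
			return i;
		}
	}
	return -1;
}

static void mon_free(uint32_t *map, int mon)
{
	*map &= ~(1u << mon);
}

static int demo(void)
{
	int a = mon_alloc(&csu_mon_map, 4);	/* gets index 0 */
	mon_alloc(&csu_mon_map, 4);		/* gets index 1 */
	mon_free(&csu_mon_map, a);
	return mon_alloc(&csu_mon_map, 4);	/* index 0 is free again */
}
```

In the driver the same behaviour comes from `ida_alloc_max()`/`ida_free()`,
bounded by `num_csu_mon`/`num_mbwu_mon`.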

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Use ida_alloc_max() instead of ida_alloc_range().
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 19cc87aba51a..a29f97cd176a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -350,6 +350,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
 	class->level = level_idx;
 	class->type = type;
 	INIT_LIST_HEAD_RCU(&class->classes_list);
+	ida_init(&class->ida_csu_mon);
+	ida_init(&class->ida_mbwu_mon);
 
 	list_add_rcu(&class->classes_list, &mpam_classes);
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 00edee9ebc6c..96a02ea95583 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -197,6 +197,9 @@ struct mpam_class {
 	/* member of mpam_classes */
 	struct list_head	classes_list;
 
+	struct ida		ida_csu_mon;
+	struct ida		ida_mbwu_mon;
+
 	struct mpam_garbage	garbage;
 };
 
@@ -275,6 +278,38 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_max(&class->ida_csu_mon, cprops->num_csu_mon - 1,
+			     GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+	ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_max(&class->ida_mbwu_mon, cprops->num_mbwu_mon - 1,
+			     GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+	ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
 /* List of all classes - protected by srcu */
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (21 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 22/29] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 18:18   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
                   ` (7 subsequent siblings)
  30 siblings, 1 reply; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSCs may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.

Add mpam_msmon_read() to hide all this. If (one of) the MSCs returns
not ready, then wait the full timeout value before trying again.
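The read-once, wait-the-full-timeout, read-once-more pattern can be sketched
as below. This is a hedged userspace illustration, not the driver: the stub
read function and names are invented, and the real sleep via
schedule_timeout_uninterruptible() is elided:

```c
#include <assert.h>
#include <stdint.h>

#define ERR_NRDY (-16)	/* stands in for -EBUSY */

static int attempts;

/* Stub MSC read: reports 'not ready' on the first attempt, then succeeds. */
static int try_read(uint64_t *val)
{
	if (++attempts == 1)
		return ERR_NRDY;
	*val += 42;
	return 0;
}

static uint64_t demo(void)
{
	uint64_t val = 0;
	int err = try_read(&val);

	if (err == ERR_NRDY) {
		/* the real driver sleeps for the firmware's nrdy_usec here */
		val = 0;	/* discard any partial accumulation */
		err = try_read(&val);
	}
	return err ? 0 : val;
}
```

Note the accumulator is zeroed before the retry, matching the `*val = 0`
reset in mpam_msmon_read().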

CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Fixed ctl_val/flt_val assignment that led to always reading counter 0.
 * switch to using guard() version of srcu_read_lock()
 * Fixed use of rcu helpers when srcu is wanted.
 * Use return instead of break.
 * Moved variable declarations into the loop.
 * Use struct assignment instead of memset().
 * Whitespace.

Changes since v1:
 * Added XCL support.
 * Merged FLT/CTL constants.
 * Fixed a spelling mistake in a comment.
 * Moved structures around.
---
 drivers/resctrl/mpam_devices.c  | 233 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  19 +++
 2 files changed, 252 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a29f97cd176a..fb5414c6b3eb 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -880,6 +880,239 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+struct mon_read {
+	struct mpam_msc_ris		*ris;
+	struct mon_cfg			*ctx;
+	enum mpam_device_features	type;
+	u64				*val;
+	int				err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				   u32 *flt_val)
+{
+	struct mon_cfg *ctx = m->ctx;
+
+	/*
+	 * For CSU counters its implementation-defined what happens when not
+	 * filtering by partid.
+	 */
+	*ctl_val = MSMON_CFG_x_CTL_MATCH_PARTID;
+
+	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
+
+	if (m->ctx->match_pmg) {
+		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
+	}
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val |= MSMON_CFG_CSU_CTL_TYPE_CSU;
+
+		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
+					       ctx->csu_exclude_clean);
+
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+
+		break;
+	default:
+		return;
+	}
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				    u32 *flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+		return;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		return;
+	default:
+		return;
+	}
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+				     u32 flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	/*
+	 * Write the ctl_val with the enable bit cleared, reset the counter,
+	 * then enable counter.
+	 */
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CSU, 0);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	case mpam_feat_msmon_mbwu:
+		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+	u64 now;
+	bool nrdy = false;
+	struct mon_read *m = arg;
+	struct mon_cfg *ctx = m->ctx;
+	struct mpam_msc_ris *ris = m->ris;
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+	if (!mpam_mon_sel_lock(msc)) {
+		m->err = -EIO;
+		return;
+	}
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+	/*
+	 * Read the existing configuration to avoid re-writing the same values.
+	 * This saves waiting for 'nrdy' on subsequent reads.
+	 */
+	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	clean_msmon_ctl_val(&cur_ctl);
+	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		now = mpam_read_monsel_reg(msc, CSU);
+		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	case mpam_feat_msmon_mbwu:
+		now = mpam_read_monsel_reg(msc, MBWU);
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	default:
+		m->err = -EINVAL;
+		break;
+	}
+	mpam_mon_sel_unlock(msc);
+
+	if (nrdy) {
+		m->err = -EBUSY;
+		return;
+	}
+
+	now = FIELD_GET(MSMON___VALUE, now);
+	*m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+	int err, any_err = 0;
+	struct mpam_vmsc *vmsc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		struct mpam_msc *msc = vmsc->msc;
+		struct mpam_msc_ris *ris;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			arg->ris = ris;
+
+			err = smp_call_function_any(&msc->accessibility,
+						    __ris_msmon_read, arg,
+						    true);
+			if (!err && arg->err)
+				err = arg->err;
+
+			/*
+			 * Save one error to be returned to the caller, but
+			 * keep reading counters so that they get reprogrammed. On
+			 * platforms with NRDY this lets us wait once.
+			 */
+			if (err)
+				any_err = err;
+		}
+	}
+
+	return any_err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features type, u64 *val)
+{
+	int err;
+	struct mon_read arg;
+	u64 wait_jiffies = 0;
+	struct mpam_props *cprops = &comp->class->props;
+
+	might_sleep();
+
+	if (!mpam_is_enabled())
+		return -EIO;
+
+	if (!mpam_has_feature(type, cprops))
+		return -EOPNOTSUPP;
+
+	arg = (struct mon_read) {
+		.ctx = ctx,
+		.type = type,
+		.val = val,
+	};
+	*val = 0;
+
+	err = _msmon_read(comp, &arg);
+	if (err == -EBUSY && comp->class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+	while (wait_jiffies)
+		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+	if (err == -EBUSY) {
+		arg = (struct mon_read) {
+			.ctx = ctx,
+			.type = type,
+			.val = val,
+		};
+		*val = 0;
+
+		err = _msmon_read(comp, &arg);
+	}
+
+	return err;
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 96a02ea95583..0c84e945c891 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -183,6 +183,22 @@ struct mpam_props {
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
 #define mpam_clear_feature(_feat, x)	clear_bit(_feat, (x)->features)
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	u16                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	bool			csu_exclude_clean;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -325,6 +341,9 @@ void mpam_disable(struct work_struct *work);
 int mpam_apply_config(struct mpam_component *comp, u16 partid,
 		      struct mpam_config *cfg);
 
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features, u64 *val);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (22 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-22 13:39   ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Zeng Heng
                     ` (2 more replies)
  2025-10-17 18:56 ` [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
                   ` (6 subsequent siblings)
  30 siblings, 3 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Bandwidth counters need to run continuously to correctly reflect the
bandwidth.

The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.

Add struct mbwu_state to track the bandwidth counter to allow overflow
and power management to be handled.
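The overflow accounting this state enables can be sketched on its own. A
minimal userspace illustration, assuming (as the patch's TODO does) a plain
31-bit counter; names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* 31-bit counter, matching the driver's GENMASK_ULL(30, 0) placeholder. */
#define MBWU_COUNTER_MAX ((1ULL << 31) - 1)

struct mbwu_state {
	uint64_t prev_val;	/* last raw value read from hardware */
	uint64_t correction;	/* bandwidth consumed before wraps/resets */
};

/*
 * If the raw value went backwards, the counter wrapped (or was reset):
 * fold the pre-wrap remainder into the running correction so the value
 * returned to the caller stays monotonic.
 */
static uint64_t mbwu_account(struct mbwu_state *s, uint64_t now)
{
	if (s->prev_val > now)
		s->correction += MBWU_COUNTER_MAX - s->prev_val;
	s->prev_val = now;
	return now + s->correction;
}

static uint64_t demo(void)
{
	struct mbwu_state s = { 0, 0 };

	mbwu_account(&s, 100);		/* first read */
	return mbwu_account(&s, 10);	/* counter wrapped since last read */
}
```

This is the same arithmetic as the `prev_val`/`correction` handling added
to __ris_msmon_read() in this patch.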

Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Removed bogus 'if (!mbwu_state)' checks.
 * Fixed __allocate_component_cfg() losing error return values.
 * Moved variable definitions into the loop.
 * Removed spurious lockdep assert from mpam_reset_component_cfg().

Changes since v1:
 * Fixed lock/unlock typo.
---
 drivers/resctrl/mpam_devices.c  | 146 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  23 +++++
 2 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index fb5414c6b3eb..deb1dcc6f6b1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -955,6 +955,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 				     u32 flt_val)
 {
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 
 	/*
@@ -973,20 +974,31 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, MBWU, 0);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+		mbwu_state->prev_val = 0;
+
 		break;
 	default:
 		return;
 	}
 }
 
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+	/* TODO: scaling, and long counters */
+	return GENMASK_ULL(30, 0);
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
-	u64 now;
 	bool nrdy = false;
 	struct mon_read *m = arg;
+	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1014,11 +1026,28 @@ static void __ris_msmon_read(void *arg)
 		now = mpam_read_monsel_reg(msc, CSU);
 		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
 		now = mpam_read_monsel_reg(msc, MBWU);
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
+
+		if (nrdy)
+			break;
+
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+
+		/* Add any pre-overflow value to the mbwu_state->val */
+		if (mbwu_state->prev_val > now)
+			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+		mbwu_state->prev_val = now;
+		mbwu_state->correction += overflow_val;
+
+		/* Include bandwidth consumed before the last hardware reset */
+		now += mbwu_state->correction;
 		break;
 	default:
 		m->err = -EINVAL;
@@ -1031,7 +1060,6 @@ static void __ris_msmon_read(void *arg)
 		return;
 	}
 
-	now = FIELD_GET(MSMON___VALUE, now);
 	*m->val += now;
 }
 
@@ -1250,6 +1278,67 @@ static int mpam_reprogram_ris(void *_arg)
 	return 0;
 }
 
+/* Call with MSC lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+	int i;
+	struct mon_read mwbu_arg;
+	struct mpam_msc_ris *ris = _ris;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		if (ris->mbwu_state[i].enabled) {
+			mwbu_arg.ris = ris;
+			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+			mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+			__ris_msmon_read(&mwbu_arg);
+		}
+	}
+
+	return 0;
+}
+
+/* Call with MSC lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+	int i;
+	u64 val;
+	struct mon_cfg *cfg;
+	u32 cur_flt, cur_ctl, mon_sel;
+	struct mpam_msc_ris *ris = arg;
+	struct msmon_mbwu_state *mbwu_state;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		mbwu_state = &ris->mbwu_state[i];
+		cfg = &mbwu_state->cfg;
+
+		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+			return -EIO;
+
+		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+		val = mpam_read_monsel_reg(msc, MBWU);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+
+		cfg->mon = i;
+		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
+		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
+		mbwu_state->correction += val;
+		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+		mpam_mon_sel_unlock(msc);
+	}
+
+	return 0;
+}
+
 static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
 {
 	*reset_cfg = (struct mpam_config) {
@@ -1322,6 +1411,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 * for non-zero partid may be lost while the CPUs are offline.
 		 */
 		ris->in_reset_state = online;
+
+		if (mpam_is_enabled() && !online)
+			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
 	}
 }
 
@@ -1355,6 +1447,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
 			mpam_reprogram_ris_partid(ris, partid, cfg);
 		}
 		ris->in_reset_state = reset;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
 	}
 }
 
@@ -2096,7 +2191,22 @@ static void mpam_unregister_irqs(void)
 
 static void __destroy_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
 	add_to_garbage(comp->cfg);
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		if (mpam_mon_sel_lock(msc)) {
+			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+				add_to_garbage(ris->mbwu_state);
+			mpam_mon_sel_unlock(msc);
+		}
+	}
 }
 
 static void mpam_reset_component_cfg(struct mpam_component *comp)
@@ -2114,6 +2224,8 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
 
 static int __allocate_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_vmsc *vmsc;
+
 	mpam_assert_partid_sizes_fixed();
 
 	if (comp->cfg)
@@ -2131,6 +2243,36 @@ static int __allocate_component_cfg(struct mpam_component *comp)
 
 	mpam_reset_component_cfg(comp);
 
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		struct mpam_msc *msc;
+		struct mpam_msc_ris *ris;
+		struct msmon_mbwu_state *mbwu_state;
+
+		if (!vmsc->props.num_mbwu_mon)
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->props.num_mbwu_mon)
+				continue;
+
+			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+					     sizeof(*ris->mbwu_state),
+					     GFP_KERNEL);
+			if (!mbwu_state) {
+				__destroy_component_cfg(comp);
+				return -ENOMEM;
+			}
+
+			init_garbage(&mbwu_state[0].garbage);
+
+			if (mpam_mon_sel_lock(msc)) {
+				ris->mbwu_state = mbwu_state;
+				mpam_mon_sel_unlock(msc);
+			}
+		}
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 0c84e945c891..28c475d18d86 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -199,6 +199,26 @@ struct mon_cfg {
 	enum mon_filter_options opts;
 };
 
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+	bool		enabled;
+	struct mon_cfg	cfg;
+
+	/* The value last read from the hardware. Used to detect overflow. */
+	u64		prev_val;
+
+	/*
+	 * The value to add to the new reading to account for power management,
+	 * and shifts to trigger the overflow interrupt.
+	 */
+	u64		correction;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -291,6 +311,9 @@ struct mpam_msc_ris {
 	/* parent: */
 	struct mpam_vmsc	*vmsc;
 
+	/* msmon mbwu configuration is preserved over reset */
+	struct msmon_mbwu_state	*mbwu_state;
+
 	struct mpam_garbage	garbage;
 };
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (23 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-22 11:23   ` Ben Horgan
  2025-10-24 18:24   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported James Morse
                   ` (5 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

From: Rohit Mathew <rohit.mathew@arm.com>

MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters.

Probe these feature bits.

The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
monitors are supported. Instead of muddling this with the size of the
bandwidth monitors, add an explicit 31-bit counter feature.

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[ morse: Added 31bit counter feature to simplify later logic ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Added 31 bit counter type feature.
 * Altered commit message.
---
 drivers/resctrl/mpam_devices.c  | 13 +++++++++++--
 drivers/resctrl/mpam_internal.h |  3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index deb1dcc6f6b1..f4d07234ce10 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -777,16 +777,25 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
 		}
 		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
-			bool hw_managed;
+			bool has_long, hw_managed;
 			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
 
 			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
-			if (props->num_mbwu_mon)
+			if (props->num_mbwu_mon) {
 				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+				mpam_set_feature(mpam_feat_msmon_mbwu_31counter, props);
+			}
 
 			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
 				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
 
+			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumon_idr);
+			if (props->num_mbwu_mon && has_long) {
+				mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumon_idr))
+					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+			}
+
 			/* Is NRDY hardware managed? */
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
 			if (hw_managed)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 28c475d18d86..ff38b4bbfc2b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -158,6 +158,9 @@ enum mpam_device_features {
 	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_31counter,
+	mpam_feat_msmon_mbwu_44counter,
+	mpam_feat_msmon_mbwu_63counter,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (24 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-22 12:31   ` Ben Horgan
  2025-10-24 18:29   ` Jonathan Cameron
  2025-10-17 18:56 ` [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
                   ` (4 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

From: Rohit Mathew <rohit.mathew@arm.com>

Now that the larger counter sizes are probed, make use of them.

Callers of mpam_msmon_read() may not know (or care!) about the different
counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
driver pick the counter to use.

Only 32-bit accesses to the MSC are required to be supported by the
spec, but these registers are 64 bits. The lower half may overflow
into the upper half between two 32-bit reads. To avoid this, use
a helper that re-reads the top half to check for a carry.
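The read sequence can be sketched as below (fake_reg and the helper
names are illustrative stand-ins for the MMIO accessors; the retry
pattern mirrors mpam_msc_read_mbwu_l()):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the MMIO registers: low and high words of a 64-bit counter. */
static uint32_t fake_reg[2];

static uint32_t read_reg(int idx)
{
	return fake_reg[idx];
}

/*
 * Read a 64-bit counter via two 32-bit accesses, re-reading the high
 * word until two reads agree so a carry out of the low word between
 * the two accesses is not missed.
 */
static uint64_t read_u64_pair(void)
{
	int retry = 3;
	uint32_t low;
	uint64_t high1, high2;

	high2 = read_reg(1);
	do {
		high1 = high2;
		low = read_reg(0);
		high2 = read_reg(1);
	} while (high1 != high2 && --retry > 0);

	return (high1 << 32) | low;
}
```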

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit, added explicit counter selection ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Removed mpam_feat_msmon_mbwu as a top-level bit for explicit 31bit counter
   selection.
 * Allow callers of mpam_msmon_read() to specify mpam_feat_msmon_mbwu and have
   the driver pick a supported counter size.
 * Rephrased commit message.

Changes since v1:
 * Only clear OFLOW_STATUS_L on MBWU counters.

Changes since RFC:
 * Commit message wrangling.
 * Refer to 31 bit counters as opposed to 32 bit (registers).
---
 drivers/resctrl/mpam_devices.c | 134 ++++++++++++++++++++++++++++-----
 1 file changed, 116 insertions(+), 18 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f4d07234ce10..c207a6d2832c 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -897,6 +897,48 @@ struct mon_read {
 	int				err;
 };
 
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+	int retry = 3;
+	u32 mbwu_l_low;
+	u64 mbwu_l_high1, mbwu_l_high2;
+
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+	do {
+		mbwu_l_high1 = mbwu_l_high2;
+		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+		retry--;
+	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+	if (mbwu_l_high1 == mbwu_l_high2)
+		return (mbwu_l_high1 << 32) | mbwu_l_low;
+	return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
+	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
 static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 				   u32 *flt_val)
 {
@@ -924,7 +966,9 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 					       ctx->csu_exclude_clean);
 
 		break;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
 		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
 
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
@@ -946,7 +990,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
 		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
 		return;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
 		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
 		return;
@@ -959,6 +1005,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 static void clean_msmon_ctl_val(u32 *cur_ctl)
 {
 	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+
+	if (FIELD_GET(MSMON_CFG_x_CTL_TYPE, *cur_ctl) == MSMON_CFG_MBWU_CTL_TYPE_MBWU)
+		*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 }
 
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -978,10 +1027,15 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CSU, 0);
 		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 		break;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
+		mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+		fallthrough;
+	case mpam_feat_msmon_mbwu_31counter:
 		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, MBWU, 0);
+
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 
 		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -993,10 +1047,19 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 	}
 }
 
-static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
 {
-	/* TODO: scaling, and long counters */
-	return GENMASK_ULL(30, 0);
+	/* TODO: implement scaling counters */
+	switch (type) {
+	case mpam_feat_msmon_mbwu_63counter:
+		return GENMASK_ULL(62, 0);
+	case mpam_feat_msmon_mbwu_44counter:
+		return GENMASK_ULL(43, 0);
+	case mpam_feat_msmon_mbwu_31counter:
+		return GENMASK_ULL(30, 0);
+	default:
+		return 0;
+	}
 }
 
 /* Call with MSC lock held */
@@ -1037,11 +1100,24 @@ static void __ris_msmon_read(void *arg)
 			nrdy = now & MSMON___NRDY;
 		now = FIELD_GET(MSMON___VALUE, now);
 		break;
-	case mpam_feat_msmon_mbwu:
-		now = mpam_read_monsel_reg(msc, MBWU);
-		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
-			nrdy = now & MSMON___NRDY;
-		now = FIELD_GET(MSMON___VALUE, now);
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
+		if (m->type != mpam_feat_msmon_mbwu_31counter) {
+			now = mpam_msc_read_mbwu_l(msc);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY_L;
+
+			if (m->type == mpam_feat_msmon_mbwu_63counter)
+				now = FIELD_GET(MSMON___LWD_VALUE, now);
+			else
+				now = FIELD_GET(MSMON___L_VALUE, now);
+		} else {
+			now = mpam_read_monsel_reg(msc, MBWU);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY;
+			now = FIELD_GET(MSMON___VALUE, now);
+		}
 
 		if (nrdy)
 			break;
@@ -1050,7 +1126,7 @@ static void __ris_msmon_read(void *arg)
 
 		/* Add any pre-overflow value to the mbwu_state->val */
 		if (mbwu_state->prev_val > now)
-			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
 
 		mbwu_state->prev_val = now;
 		mbwu_state->correction += overflow_val;
@@ -1106,13 +1182,26 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
 	return any_err;
 }
 
+static enum mpam_device_features mpam_msmon_choose_counter(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, cprops))
+		return mpam_feat_msmon_mbwu_44counter;
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, cprops))
+		return mpam_feat_msmon_mbwu_63counter;
+
+	return mpam_feat_msmon_mbwu_31counter;
+}
+
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features type, u64 *val)
 {
 	int err;
 	struct mon_read arg;
 	u64 wait_jiffies = 0;
-	struct mpam_props *cprops = &comp->class->props;
+	struct mpam_class *class = comp->class;
+	struct mpam_props *cprops = &class->props;
 
 	might_sleep();
 
@@ -1129,9 +1218,12 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	};
 	*val = 0;
 
+	if (type == mpam_feat_msmon_mbwu)
+		type = mpam_msmon_choose_counter(class);
+
 	err = _msmon_read(comp, &arg);
-	if (err == -EBUSY && comp->class->nrdy_usec)
-		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+	if (err == -EBUSY && class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(class->nrdy_usec);
 
 	while (wait_jiffies)
 		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
@@ -1293,12 +1385,13 @@ static int mpam_restore_mbwu_state(void *_ris)
 	int i;
 	struct mon_read mwbu_arg;
 	struct mpam_msc_ris *ris = _ris;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
 		if (ris->mbwu_state[i].enabled) {
 			mwbu_arg.ris = ris;
 			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
-			mwbu_arg.type = mpam_feat_msmon_mbwu;
+			mwbu_arg.type = mpam_msmon_choose_counter(class);
 
 			__ris_msmon_read(&mwbu_arg);
 		}
@@ -1333,8 +1426,13 @@ static int mpam_save_mbwu_state(void *arg)
 		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
 
-		val = mpam_read_monsel_reg(msc, MBWU);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			val = mpam_msc_read_mbwu_l(msc);
+			mpam_msc_zero_mbwu_l(msc);
+		} else {
+			val = mpam_read_monsel_reg(msc, MBWU);
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		}
 
 		cfg->mon = i;
 		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (25 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-24 18:34   ` Jonathan Cameron
  2025-10-29  7:14   ` Shaopeng Tan (Fujitsu)
  2025-10-17 18:56 ` [PATCH v3 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
                   ` (3 subsequent siblings)
  30 siblings, 2 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Fenghua Yu

resctrl expects to reset the bandwidth counters when the filesystem
is mounted.

To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes the counter to be zeroed
on the next read. This is easily done by forcing a configuration update.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Switched to guard() and fixed a non-SRCU list walker.
 * Turned the comment about what is protected by which lock into a list.
---
 drivers/resctrl/mpam_devices.c  | 46 ++++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  7 ++++-
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c207a6d2832c..89d4f42168ed 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1066,9 +1066,11 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
 static void __ris_msmon_read(void *arg)
 {
 	bool nrdy = false;
+	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
+	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
 	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
@@ -1083,6 +1085,14 @@ static void __ris_msmon_read(void *arg)
 		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
 	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
 
+	if (m->type == mpam_feat_msmon_mbwu) {
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (mbwu_state) {
+			reset_on_next_read = mbwu_state->reset_on_next_read;
+			mbwu_state->reset_on_next_read = false;
+		}
+	}
+
 	/*
 	 * Read the existing configuration to avoid re-writing the same values.
 	 * This saves waiting for 'nrdy' on subsequent reads.
@@ -1090,7 +1100,10 @@ static void __ris_msmon_read(void *arg)
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
-	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+	config_mismatch = cur_flt != flt_val ||
+			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+	if (config_mismatch || reset_on_next_read)
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
 
 	switch (m->type) {
@@ -1242,6 +1255,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	return err;
 }
 
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	if (!mpam_is_enabled())
+		return;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+				continue;
+
+			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+				continue;
+
+			ris->mbwu_state[ctx->mon].correction = 0;
+			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+			mpam_mon_sel_unlock(msc);
+		}
+	}
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index ff38b4bbfc2b..6632699ae814 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -204,10 +204,14 @@ struct mon_cfg {
 
 /*
  * Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * The msc's mon_sel_lock protects:
+ * - reset_on_next_read
+ * - prev_val
+ * - correction
  */
 struct msmon_mbwu_state {
 	bool		enabled;
+	bool		reset_on_next_read;
 	struct mon_cfg	cfg;
 
 	/* The value last read from the hardware. Used to detect overflow. */
@@ -369,6 +373,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
 
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 28/29] arm_mpam: Add kunit test for bitmap reset
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (26 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-17 18:56 ` [PATCH v3 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Jonathan Cameron, Ben Horgan

The bitmap reset code has been a source of bugs. Add a unit test.

This currently has to be built in, as the rest of the driver is
builtin.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes from v2:
 * Copyright year,
 * Comment about lockdep checks,
 * { } instead of { 0 }
 * Removed a stray newline.
---
 drivers/resctrl/Kconfig             |  9 ++++
 drivers/resctrl/mpam_devices.c      |  4 ++
 drivers/resctrl/test_mpam_devices.c | 69 +++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 drivers/resctrl/test_mpam_devices.c

diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 58c83b5c8bfd..a2e9a6130461 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -10,4 +10,13 @@ config ARM64_MPAM_DRIVER_DEBUG
 	help
 	  Say yes here to enable debug messages from the MPAM driver.
 
+config MPAM_KUNIT_TEST
+	bool "KUnit tests for MPAM driver" if !KUNIT_ALL_TESTS
+	depends on KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to run tests in the MPAM driver.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 89d4f42168ed..0dd048279e02 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2695,3 +2695,7 @@ static int __init mpam_msc_driver_init(void)
 }
 /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..0cfb41b665c4
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+	struct mpam_msc fake_msc = {};
+	u32 *test_result;
+
+	if (!buf)
+		return;
+
+	fake_msc.mapped_hwpage = buf;
+	fake_msc.mapped_hwpage_sz = SZ_16K;
+	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+	/* Satisfy lockdep checks */
+	mutex_init(&fake_msc.part_sel_lock);
+	mutex_lock(&fake_msc.part_sel_lock);
+
+	test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+	KUNIT_EXPECT_EQ(test, test_result[0], 1);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 1);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	{}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+	.name = "mpam_devices_test_suite",
+	.test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 29/29] arm_mpam: Add kunit tests for props_mismatch()
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (27 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-10-17 18:56 ` James Morse
  2025-10-18  1:01 ` [PATCH v3 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
  2025-10-23  8:15 ` Shaopeng Tan (Fujitsu)
  30 siblings, 0 replies; 86+ messages in thread
From: James Morse @ 2025-10-17 18:56 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi
  Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Ben Horgan

When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.

Add some tests to illustrate the sort of combinations that are expected
to work, and those that must be removed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
---
Changes since v2:
 * Comment on why packing is only needed for kunit tests.
 * Made the reset code a function not a macro.

Changes since v1:
 * Waggled some words in comments.
 * Moved a bunch of variables to be global - shuts up a compiler warning.
---
 drivers/resctrl/mpam_internal.h     |  14 +-
 drivers/resctrl/test_mpam_devices.c | 320 ++++++++++++++++++++++++++++
 2 files changed, 333 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6632699ae814..4f25681b56ab 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -25,6 +25,12 @@ struct platform_device;
 
 DECLARE_STATIC_KEY_FALSE(mpam_enabled);
 
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
 static inline bool mpam_is_enabled(void)
 {
 	return static_branch_likely(&mpam_enabled);
@@ -180,7 +186,13 @@ struct mpam_props {
 	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
-};
+
+/*
+ * Kunit tests use memset() to set up feature combinations that should be
+ * removed, and will false-positive if the compiler introduces padding that
+ * isn't cleared during sanitisation.
+ */
+} PACKED_FOR_KUNIT;
 
 #define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 0cfb41b665c4..3e8d564a0c64 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,324 @@
 
 #include <kunit/test.h>
 
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+	struct mpam_props parent = { 0 };
+	struct mpam_props child;
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, false);
+
+	memset(&child, 0, sizeof(child));
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, true);
+
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static struct list_head fake_classes_list;
+static struct mpam_class fake_class = { 0 };
+static struct mpam_component fake_comp1 = { 0 };
+static struct mpam_component fake_comp2 = { 0 };
+static struct mpam_vmsc fake_vmsc1 = { 0 };
+static struct mpam_vmsc fake_vmsc2 = { 0 };
+static struct mpam_msc fake_msc1 = { 0 };
+static struct mpam_msc fake_msc2 = { 0 };
+static struct mpam_msc_ris fake_ris1 = { 0 };
+static struct mpam_msc_ris fake_ris2 = { 0 };
+static struct platform_device fake_pdev = { 0 };
+
+static inline void reset_fake_hierarchy(void)
+{
+	INIT_LIST_HEAD(&fake_classes_list);
+
+	memset(&fake_class, 0, sizeof(fake_class));
+	fake_class.level = 3;
+	fake_class.type = MPAM_CLASS_CACHE;
+	INIT_LIST_HEAD_RCU(&fake_class.components);
+	INIT_LIST_HEAD(&fake_class.classes_list);
+
+	memset(&fake_comp1, 0, sizeof(fake_comp1));
+	memset(&fake_comp2, 0, sizeof(fake_comp2));
+	fake_comp1.comp_id = 1;
+	fake_comp2.comp_id = 2;
+	INIT_LIST_HEAD(&fake_comp1.vmsc);
+	INIT_LIST_HEAD(&fake_comp1.class_list);
+	INIT_LIST_HEAD(&fake_comp2.vmsc);
+	INIT_LIST_HEAD(&fake_comp2.class_list);
+
+	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));
+	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));
+	INIT_LIST_HEAD(&fake_vmsc1.ris);
+	INIT_LIST_HEAD(&fake_vmsc1.comp_list);
+	fake_vmsc1.msc = &fake_msc1;
+	INIT_LIST_HEAD(&fake_vmsc2.ris);
+	INIT_LIST_HEAD(&fake_vmsc2.comp_list);
+	fake_vmsc2.msc = &fake_msc2;
+
+	memset(&fake_ris1, 0, sizeof(fake_ris1));
+	memset(&fake_ris2, 0, sizeof(fake_ris2));
+	fake_ris1.ris_idx = 1;
+	INIT_LIST_HEAD(&fake_ris1.msc_list);
+	fake_ris2.ris_idx = 2;
+	INIT_LIST_HEAD(&fake_ris2.msc_list);
+
+	fake_msc1.pdev = &fake_pdev;
+	fake_msc2.pdev = &fake_pdev;
+
+	list_add(&fake_class.classes_list, &fake_classes_list);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+	reset_fake_hierarchy();
+
+	mutex_lock(&mpam_list_lock);
+
+	/* One Class+Comp, two RIS in one vMSC with common features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSCs can't control the same resource;
+	 * mismatched features cannot be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with incompatible overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 5;
+	fake_ris2.props.cpbm_wd = 3;
+	fake_ris1.props.mbw_pbm_bits = 5;
+	fake_ris2.props.mbw_pbm_bits = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSCs can't control the same resource;
+	 * mismatched features cannot be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with overlapping features that need tweaking */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+	fake_ris1.props.bwa_wd = 5;
+	fake_ris2.props.bwa_wd = 3;
+	fake_ris1.props.cmax_wd = 5;
+	fake_ris2.props.cmax_wd = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * RIS with different control properties need to be sanitised so the
+	 * class has the common set of properties.
+	 */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+	reset_fake_hierarchy();
+
+	/* One Class Two Comp with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class Two Comp with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple components can't control the same resource; mismatched
+	 * features cannot be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	mutex_unlock(&mpam_list_lock);
+}
+
 static void test_mpam_reset_msc_bitmap(struct kunit *test)
 {
 	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -58,6 +376,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
 
 static struct kunit_case mpam_devices_test_cases[] = {
 	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	KUNIT_CASE(test_mpam_enable_merge_features),
+	KUNIT_CASE(test__props_mismatch),
 	{}
 };
 
-- 
2.39.5



* Re: [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-10-17 23:03   ` Fenghua Yu
  2025-10-24 17:32   ` Jonathan Cameron
  2025-10-29  6:37   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 86+ messages in thread
From: Fenghua Yu @ 2025-10-17 23:03 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton, Gavin Shan,
	Ben Horgan

Hi, James,

On 10/17/25 11:56, James Morse wrote:
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
>
> Add the definitions for these registers as offset within the page(s).
>
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>   * Removed a few colons.
>   * fixed a typo.
>   * Moved some definitions around.
>
> Changes since v1:
>   * Whitespace.
>   * Added constants for CASSOC and XCL.
>   * Merged FLT/CTL defines.
>   * Fixed MSMON_CFG_CSU_CTL_TYPE_CSU definition.
>
> Changes since RFC:
>   * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
>   * Whitespace churn.
>   * Cite a more recent document.
>   * Removed some stale feature, fixed some names etc.
> ---
>   drivers/resctrl/mpam_internal.h | 268 ++++++++++++++++++++++++++++++++
>   1 file changed, 268 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1a5d96660382..1ef3e8e1d056 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -142,4 +142,272 @@ extern struct list_head mpam_classes;
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1    0x10
> +
> +/* Memory mapped control pages */
> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR		0x0000  /* features id register */
> +#define MPAMF_IIDR		0x0018  /* implementer id register */
> +#define MPAMF_AIDR		0x0020  /* architectural id register */
> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
> +
> +/* Configuration and Status Register offsets in the memory mapped page */
> +#define MPAMCFG_PART_SEL	0x0100  /* partid to configure */
> +#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
> +#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
> +#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
> +#define MPAMCFG_CASSOC		0x0118  /* cache-associativity config */
> +#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
> +#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
> +#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
> +#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
> +#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
> +#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
> +#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
> +
> +#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
> +#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
> +#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
> +#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
> +#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
> +#define MSMON_CSU		0x0840  /* current cache-usage */
> +#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
> +#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
> +#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
> +#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
> +#define MPAMF_ESR		0x00F8  /* error status register */
> +#define MPAMF_ECR		0x00F0  /* error control register */
> +
> +/* MPAMF_IDR - MPAM features ID register */
> +#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
> +#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
> +#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
> +#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
> +#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
> +#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
> +#define MPAMF_IDR_EXT			BIT(28)
> +#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
> +#define MPAMF_IDR_HAS_MSMON		BIT(30)
> +#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
> +#define MPAMF_IDR_HAS_RIS		BIT(32)
> +#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
> +#define MPAMF_IDR_HAS_ESR		BIT(39)
> +#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
> +
> +/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
> +#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
> +#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
> +#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
> +
> +/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
> +#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
> +
> +/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
> +#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
> +#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
> +#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
> +#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
> +#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
> +#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
> +
> +/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
> +#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
> +#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
> +#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
> +#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
> +#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
> +#define MPAMF_MBW_IDR_WINDWR		BIT(14)
> +#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
> +
> +/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
> +#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
> +#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
> +#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
> +#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
> +#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
> +#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
> +
> +/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
> +#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
> +#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
> +#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
> +#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
> +#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
> +#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
> +#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
> +#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
> +#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
> +#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
> +#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX	GENMASK(15, 0)
> +
> +/* MPAMF_IIDR - MPAM implementation ID register */
> +#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
> +#define MPAMF_IIDR_REVISION	GENMASK(15, 12)
> +#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
> +#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)

Just a friendly reminder:

v2 defines _SHIFTs for each field in MPAMF_IIDR. They are removed here 
but will be used in the 2nd series.

It's not an issue for this series, but just a friendly reminder to add 
them back in the 2nd series, or the mpam/snapshot/6.18-rc1 branch cannot be built.

[SNIP]

Thanks.

-Fenghua



* Re: [PATCH v3 00/29] arm_mpam: Add basic mpam driver
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (28 preceding siblings ...)
  2025-10-17 18:56 ` [PATCH v3 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-10-18  1:01 ` Fenghua Yu
  2025-10-23  8:15 ` Shaopeng Tan (Fujitsu)
  30 siblings, 0 replies; 86+ messages in thread
From: Fenghua Yu @ 2025-10-18  1:01 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton, Gavin Shan



On 10/17/25 11:56, James Morse wrote:
> Hello,
> 
> A slew of minor changes, nothing really sticks out.
> Changes are noted on each patch.
> 
> ~
> 
> This is just enough MPAM driver for ACPI. DT got ripped out. If you need DT
> support - please share your DTS so the DT folk know the binding is what is
> needed.

[SNIP]

> This series is based on v6.18-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v3
> 
> The rest of the driver can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1
> 
> What is MPAM? Set your time-machine to 2020:
> https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/

In case you build/test the mpam/snapshot/v6.18-rc1 branch, you need to 
apply the following patch. After applying it, this series and the rest
of the driver run successfully on my machine.

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f890d1381af6..132d29e53ae9 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -645,6 +645,11 @@ static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
   #define MPAMF_IIDR_VARIANT    GENMASK(19, 16)
   #define MPAMF_IIDR_PRODUCTID    GENMASK(31, 20)

+#define MPAMF_IIDR_IMPLEMENTER_SHIFT    0
+#define MPAMF_IIDR_REVISION_SHIFT    12
+#define MPAMF_IIDR_VARIANT_SHIFT    16
+#define MPAMF_IIDR_PRODUCTID_SHIFT    20
+
   /* MPAMF_AIDR - MPAM architecture ID register */
   #define MPAMF_AIDR_ARCH_MINOR_REV    GENMASK(3, 0)
   #define MPAMF_AIDR_ARCH_MAJOR_REV    GENMASK(7, 4)

[SNIP]

Thanks.

-Fenghua


* Re: [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-10-17 18:56 ` [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-10-20 10:34   ` Ben Horgan
  2025-10-24 14:15   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 10:34 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Search all caches, not just unified caches. This removes the need to count
>    the caches first, but means a failure to find the table walks the table
>    three times for different cache types.
>  * Fixed return value of the no-acpi stub.
>  * Punctuation typo in a comment.
>  * Keep trying to parse the table even if a bogus CPU is encountered.
>  * Specified CPUs share caches with other CPUs.
> 
> Changes since v1:
>  * Dropped the cleanup-based table freeing; use acpi_get_pptt() instead.
>  * Removed a confusing comment.
>  * Clarified the kernel doc.
> 
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  5 +++
>  2 files changed, 87 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 63c3a344c075..50c8f2a3c927 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -350,6 +350,27 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>  	return found;
>  }
>  
> +static struct acpi_pptt_cache *
> +acpi_find_any_type_cache_node(struct acpi_table_header *table_hdr,
> +			      u32 acpi_cpu_id, unsigned int level,
> +			      struct acpi_pptt_processor **node)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	cache = acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_UNIFIED,
> +				     level, node);
> +	if (cache)
> +		return cache;
> +
> +	cache = acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_DATA,
> +				     level, node);
> +	if (cache)
> +		return cache;
> +
> +	return acpi_find_cache_node(table_hdr, acpi_cpu_id, CACHE_TYPE_INST,
> +				    level, node);
> +}
> +
>  /**
>   * update_cache_properties() - Update cacheinfo for the given processor
>   * @this_leaf: Kernel cache info structure being updated
> @@ -903,3 +924,64 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>  				     entry->length);
>  	}
>  }
> +
> +/*
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the cache
> + *
> + * Determine the level relative to any CPU for the cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * If one CPU's L2 is shared with another CPU as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	int level, cpu;
> +	u32 acpi_cpu_id;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			continue;
> +
> +		/* Start at 1 for L1 */
> +		level = 1;
> +		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
> +						      &cpu_node);
> +		while (cache) {
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache, sizeof(*cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +
> +			level++;

If there is more than one type of cache at a given level only one is
checked. For example, if there is an L1 data cache and an L1 instruction
cache then the L1 instruction cache will never be considered.

> +			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
> +							      level, &cpu_node);
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-10-17 18:56 ` [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-10-20 10:45   ` Ben Horgan
  2025-10-22 12:58   ` Jeremy Linton
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 10:45 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Removed stray cleanup usage in preference for acpi_get_pptt().
>  * Removed WARN_ON_ONCE() for symmetry with other helpers.
>  * Dropped restriction on unified caches.
> 
> Changes since v1:
>  * Added punctuation to the commit message.
>  * Removed a comment about an alternative implementation.
>  * Made the loop continue with a warning if a CPU is missing from the PPTT.
> 
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  6 +++++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 50c8f2a3c927..2f86f58699a6 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -985,3 +985,67 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>  
>  	return -ENOENT;
>  }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	int level, cpu;
> +	u32 acpi_cpu_id;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	cpumask_clear(cpus);
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			continue;
> +
> +		/* Start at 1 for L1 */
> +		level = 1;
> +		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
> +						      &cpu_node);
> +		while (cache) {
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache, sizeof(*cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);
> +
> +			level++;

Same comment as for the previous patch. You are bumping the level
without considering if there is another cache at that level.

> +			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
> +							      level, &cpu_node);
> +		}
> +	}
> +
> +	return 0;
> +}
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table
  2025-10-17 18:56 ` [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-10-20 12:29   ` Ben Horgan
  2025-10-24 16:13   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 12:29 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the
> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> to discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v2:
>  * Expanded commit message.
>  * Moved explicit memset() to array initialisation.
>  * Added comments on the sizing of arrays.
>  * Moved MSC table entry parsing to a helper to allow use of a platform-device
>    cleanup rune, resulting in more returns and fewer breaks.
>  * Changed pre-processor macros for table bits.
>  * Discover unsupported PPI partitions purely from the table to make gicv5
>    easier, which also simplifies acpi_mpam_parse_irqs()
>  * Gave interface type numbers pre-processor names.
>  * Clarified some comments.
>  * Fixed the WARN_ON comparison in acpi_mpam_parse_msc().
>  * Made buffer over-run noisier.
>  * Print an error condition as %d not %u.
>  * Print a debug message when bad NUMA nodes are found.
> 
[...]
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index a9dbacabdf89..9d66421f68ff 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

Any reason to not change this to !IS_ERR_OR_NULL(_T) as Jonathan
suggested in his v2 review.

> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..3d6c39c667c3
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#define GLOBAL_AFFINITY		~0
> +
> +struct mpam_msc;
> +
> +enum mpam_msc_iface {
> +	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
> +	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
> +};
> +
> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +/* Parse the ACPI description of resources entries for this MSC. */
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else
> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +					    struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> +#endif
> +
> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* __LINUX_ARM_MPAM_H */
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index 074754c23d33..23a30ada2d4c 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
>  extern int platform_device_add(struct platform_device *pdev);
>  extern void platform_device_del(struct platform_device *pdev);
>  extern void platform_device_put(struct platform_device *pdev);
> +DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))

Significant enough to be mentioned in the commit message?

This DEFINE_FREE is named after the free-ing function,
platform_device_put, whereas the previous DEFINE_FREE in this patch is
named after the struct, acpi_table. On grepping I see both naming
schemes - not sure if there is a recommendation for new code.

>  
>  struct platform_driver {
>  	int (*probe)(struct platform_device *);

Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-10-20 12:43   ` Ben Horgan
  2025-10-20 15:44   ` Ben Horgan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 12:43 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v2:
>  * Comment in Kconfig about why EXPERT.
>  * Dropped duplicate depends.
>  * Fixed duplicate return statement.
>  * Restructured driver probe to have a do_ function to allow breaks to be
>    returns instead...
>  * Removed resctrl.h include, added spinlock.h
>  * Removed stray DT function prototype
>  * Removed stray PCC variables in struct mpam_msc.
>  * Used ccflags not cflags for debug define.
>  * Moved srcu header include to internal.h
>  * Moved mpam_msc_destroy() into this patch.
> 
> Changes since v1:
>  * Avoid selecting driver on other architectures.
>  * Removed PCC support stub.
>  * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
>  * Clarified a comment.
>  * Stopped using mpam_num_msc as an id, and made it atomic.
>  * Size of -1 returned from cache_of_calculate_id()
>  * Renamed some struct members.
>  * Made a bunch of pr_err() dev_err_once().
>  * Used more cleanup magic.
>  * Inlined a print message.
>  * Fixed error propagation from mpam_dt_parse_resources().
>  * Moved cache accessibility checks earlier.
>  * Change cleanup macro to use IS_ERR_OR_NULL().
> 
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/acpi/arm64/mpam.c       |   7 ++
>  drivers/resctrl/Kconfig         |  13 +++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 190 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  52 +++++++++
>  include/linux/acpi.h            |   2 +-
>  9 files changed, 271 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c5e66d5d72cd..004d58cfbff8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
>  	select ACPI_MPAM if ACPI
>  	help
>  	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>  
>  source "drivers/cdx/Kconfig"
>  
> +source "drivers/resctrl/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 8e1ffa4358d5..20eb17596b89 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)		+= hte/
>  obj-$(CONFIG_DRM_ACCEL)		+= accel/
>  obj-$(CONFIG_CDX_BUS)		+= cdx/
>  obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
>  
>  obj-$(CONFIG_DIBS)		+= dibs/
>  obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> index 59712397025d..51c6f5fd4a5e 100644
> --- a/drivers/acpi/arm64/mpam.c
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -337,6 +337,13 @@ static int __init acpi_mpam_parse(void)
>  	return 0;
>  }
>  
> +/**
> + * acpi_mpam_count_msc() - Count the number of MSC described by firmware.
> + *
> + * Returns the number of MSCs, or zero for an error.
> + *
> + * This can be called before or in parallel with acpi_mpam_parse().
> + */

This comment can be added in the patch where you add the function,
acpi_mpam_count_msc().

>  int acpi_mpam_count_msc(void)
>  {
>  	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
[...]
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 9d66421f68ff..70f075b397ce 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -231,7 +231,7 @@ static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32
>  		return ERR_PTR(-ENOENT);
>  	return table;
>  }
> -DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))

Ah, you did make this change. Just ended up in the wrong patch.

>  
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-10-17 18:56 ` [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-10-20 15:14   ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 15:14 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Reduced the scope of arguments in mpam_reset_component_locked().
> 
> Changes since v1:
>  * more complete use of _srcu helpers.
>  * Use guard macro for srcu.
>  * Dropped a might_sleep() - something else will bark.
> ---
>  drivers/resctrl/mpam_devices.c | 58 ++++++++++++++++++++++++++++++++--
>  1 file changed, 55 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index ec089593acad..545482e112b7 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -802,15 +802,13 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
>  
>  /*
>   * Called via smp_call_on_cpu() to prevent migration, while still being
> - * pre-emptible.
> + * pre-emptible. Caller must hold mpam_srcu.
>   */
>  static int mpam_reset_ris(void *arg)
>  {
>  	u16 partid, partid_max;
>  	struct mpam_msc_ris *ris = arg;
>  
> -	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> -
>  	if (ris->in_reset_state)
>  		return 0;
>  
> @@ -1328,8 +1326,56 @@ static void mpam_enable_once(void)
>  	       mpam_partid_max + 1, mpam_pmg_max + 1);
>  }
>  
> +static void mpam_reset_component_locked(struct mpam_component *comp)
> +{
> +

Nit: Extra blank line.

> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		struct mpam_msc *msc = vmsc->msc;
> +		struct mpam_msc_ris *ris;
> +
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			if (!ris->in_reset_state)
> +				mpam_touch_msc(msc, mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +		}
> +	}
> +}
> +

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-10-20 12:43   ` Ben Horgan
@ 2025-10-20 15:44   ` Ben Horgan
  2025-10-21  9:51   ` Ben Horgan
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 15:44 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v2:
>  * Comment in Kconfig about why EXPERT.
>  * Dropped duplicate depends.
>  * Fixed duplicate return statement.
>  * Restructured driver probe to have a do_ function to allow breaks to be
>    returns instead...
>  * Removed resctrl.h include, added spinlock.h
>  * Removed stray DT function prototype
>  * Removed stray PCC variables in struct mpam_msc.
>  * Used ccflags not cflags for debug define.
>  * Moved srcu header include to internal.h
>  * Moved mpam_msc_destroy() into this patch.
> 
> Changes since v1:
>  * Avoid selecting driver on other architectures.
>  * Removed PCC support stub.
>  * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
>  * Clarified a comment.
>  * Stopped using mpam_num_msc as an id, and made it atomic.
>  * Size of -1 returned from cache_of_calculate_id()
>  * Renamed some struct members.
>  * Made a bunch of pr_err() dev_err_once().
>  * Used more cleanup magic.
>  * Inlined a print message.
>  * Fixed error propagation from mpam_dt_parse_resources().
>  * Moved cache accessibility checks earlier.
>  * Change cleanup macro to use IS_ERR_OR_NULL().
> 
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/acpi/arm64/mpam.c       |   7 ++
>  drivers/resctrl/Kconfig         |  13 +++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 190 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  52 +++++++++
>  include/linux/acpi.h            |   2 +-
>  9 files changed, 271 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
[snip]
> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&msc->probe_lock);
> +	mutex_init(&msc->part_sel_lock);
> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);
> +	if (cpumask_empty(&msc->accessibility)) {
> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
> +		msc->iface = MPAM_IFACE_MMIO;
> +	else
> +		msc->iface = MPAM_IFACE_PCC;

As there is no PCC support in this series should this return
ERR_PTR(-ENOTSUPP) when the firmware doesn't advertise an MMIO interface?

> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		void __iomem *io;
> +
> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +							    &msc_res);
> +		if (IS_ERR(io)) {
> +			dev_err_once(dev, "Failed to map MSC base address\n");
> +			return (void *)io;
> +		}
> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +		msc->mapped_hwpage = io;
> +	}
> +
> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +	platform_set_drvdata(pdev, msc);
> +
> +	return msc;
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc = NULL;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	msc = do_mpam_msc_drv_probe(pdev);
> +	mutex_unlock(&mpam_list_lock);
> +	if (!IS_ERR(msc)) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +		if (err)
> +			mpam_msc_drv_remove(pdev);
> +	} else {
> +		err = PTR_ERR(msc);
> +	}
> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	fw_num_msc = acpi_mpam_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-10-17 18:56 ` [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-10-20 16:28   ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 16:28 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
> 
> After this point, checking all the MSC have been probed is pointless,
> and the cpuhp callbacks should restore the configuration, instead of
> just resetting the MSC.
> 
> Add a static key to enable this behaviour. This will also allow MPAM
> to be disabled in response to an error, and the architecture code to
> enable/disable the context switch of the MPAM system registers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Removed the word 'TODO'.
>  * Fixed a typo in the commit message.
> ---
[..]
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a04b09abd814..d492df9a1735 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -9,6 +9,7 @@
>  #include <linux/bitmap.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
> +#include <linux/jump_label.h>
>  #include <linux/llist.h>
>  #include <linux/mailbox_client.h>
>  #include <linux/mutex.h>
> @@ -19,8 +20,16 @@
>  
>  #define MPAM_MSC_MAX_NUM_RIS	16
>  
> +

nit: stray whitespace change

>  struct platform_device;
>  
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +
> +static inline bool mpam_is_enabled(void)
> +{
> +	return static_branch_likely(&mpam_enabled);
> +}
> +
>  /*
>   * Structures protected by SRCU may not be freed for a surprising amount of
>   * time (especially if perf is running). To ensure the MPAM error interrupt can

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-10-20 17:04   ` Ben Horgan
  2025-10-27  8:47   ` Shaopeng Tan (Fujitsu)
  2025-10-29  7:09   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 17:04 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate
> a configuration array for each component, and reprogram each RIS
> configuration from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Call mpam_init_reset_cfg() on allocated config as 0 is no longer correct.
>  * init_garbage() on each config - the array has to be freed in one go, but
>    otherwise this looks weird.
>  * Use struct initialiser in mpam_init_reset_cfg().
>  * Moved int err definition.
>  * Removed srcu lock taking based on squinting at the only caller.
>  * Moved config reset to mpam_reset_component_cfg() for re-use in
>    mpam_reset_component_locked(), previous memset() was not enough since zero
>    no longer means reset.
> 
[...]
>  
> +struct reprogram_ris {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_config *cfg;
> +};
> +
> +/* Call with MSC lock held */
> +static int mpam_reprogram_ris(void *_arg)
> +{
> +	u16 partid, partid_max;
> +	struct reprogram_ris *arg = _arg;
> +	struct mpam_msc_ris *ris = arg->ris;
> +	struct mpam_config *cfg = arg->cfg;
> +
> +	if (ris->in_reset_state)
> +		return 0;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid <= partid_max + 1; partid++)

Loop overrun. This was correct in the previous version of the patch and
the same shape of loop is done correctly elsewhere in this version. I
think it would be good to standardise on using either:
partid <= partid_max
or
partid < partid_max + 1
I have a preference for the first as you don't need to think about the
size of the type.

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features
  2025-10-17 18:56 ` [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-10-20 17:16   ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-20 17:16 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan, Zeng Heng

Hi James,

On 10/17/25 19:56, James Morse wrote:
> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls aren't
> known.
> 
> Discover the rest of the features so they can be reset to avoid any
> side effects when resctrl is in use.
> 
> PARTID narrowing allows MSC/RIS to support less configuration space than
> is usable. If this feature is found on a class of device we are likely
> to use, then reduce the partid_max to make it usable. This allows us
> to map a PARTID to itself.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Moved some enum definitions in here.
>  * Whitespace.
> 
[...]
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 2f2a7369107b..00edee9ebc6c 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -139,16 +139,30 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>  
>  /* Bits for mpam features bitmaps */
>  enum mpam_device_features {
> -	mpam_feat_cpor_part = 0,

Any reason this one doesn't stay first?

> +	mpam_feat_cmax_softlim,
> +	mpam_feat_cmax_cmax,
> +	mpam_feat_cmax_cmin,
> +	mpam_feat_cmax_cassoc,
> +	mpam_feat_cpor_part,
>  	mpam_feat_mbw_part,
>  	mpam_feat_mbw_min,
>  	mpam_feat_mbw_max,
> +	mpam_feat_mbw_prop,
> +	mpam_feat_intpri_part,
> +	mpam_feat_intpri_part_0_low,
> +	mpam_feat_dspri_part,
> +	mpam_feat_dspri_part_0_low,
>  	mpam_feat_msmon,
>  	mpam_feat_msmon_csu,
> +	mpam_feat_msmon_csu_capture,
> +	mpam_feat_msmon_csu_xcl,
>  	mpam_feat_msmon_csu_hw_nrdy,
>  	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_capture,
> +	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,
> -	MPAM_FEATURE_LAST
> +	mpam_feat_partid_nrw,
> +	MPAM_FEATURE_LAST,

nit: drop the trailing , from MPAM_FEATURE_LAST. It confuses the diff.

>  };
>  


-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-10-20 12:43   ` Ben Horgan
  2025-10-20 15:44   ` Ben Horgan
@ 2025-10-21  9:51   ` Ben Horgan
  2025-10-22  0:29   ` Fenghua Yu
  2025-10-24 16:25   ` Jonathan Cameron
  4 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-21  9:51 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v2:
>  * Comment in Kconfig about why EXPERT.
>  * Dropped duplicate depends.
>  * Fixed duplicate return statement.
>  * Restructured driver probe to have a do_ function to allow breaks to be
>    return instead...
>  * Removed resctrl.h include, added spinlock.h
>  * Removed stray DT function prototype
>  * Removed stray PCC variables in struct mpam_msc.
>  * Used ccflags not cflags for debug define.
>  * Moved srcu header include to internal.h
>  * Moved mpam_msc_destroy() into this patch.
> 
> Changes since v1:
>  * Avoid selecting driver on other architectures.
>  * Removed PCC support stub.
>  * Use for_each_available_child_of_node_scoped() and of_property_read_reg()
>  * Clarified a comment.
>  * Stopped using mpam_num_msc as an id, and made it atomic.
>  * Size of -1 returned from cache_of_calculate_id()
>  * Renamed some struct members.
>  * Made a bunch of pr_err() calls dev_err_once().
>  * Used more cleanup magic.
>  * Inlined a print message.
>  * Fixed error propagation from mpam_dt_parse_resources().
>  * Moved cache accessibility checks earlier.
>  * Change cleanup macro to use IS_ERR_OR_NULL().
> 
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/acpi/arm64/mpam.c       |   7 ++
>  drivers/resctrl/Kconfig         |  13 +++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 190 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  52 +++++++++
>  include/linux/acpi.h            |   2 +-
>  9 files changed, 271 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c5e66d5d72cd..004d58cfbff8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
>  	select ACPI_MPAM if ACPI

If ARM64_MPAM is selected without selecting EXPERT then ACPI_MPAM is
selected but not ARM64_MPAM_DRIVER. When the whole series is applied this
configuration does not build, as the MPAM ACPI code calls mpam_ris_create().

Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                     ` (2 preceding siblings ...)
  2025-10-21  9:51   ` Ben Horgan
@ 2025-10-22  0:29   ` Fenghua Yu
  2025-10-22 19:00     ` Tushar Dave
  2025-10-24 16:25   ` Jonathan Cameron
  4 siblings, 1 reply; 86+ messages in thread
From: Fenghua Yu @ 2025-10-22  0:29 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton, Gavin Shan

Hi, James,

On 10/17/25 11:56, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
[SNIP]> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	u32 affinity_id;
> +	int err;
> +
> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +				       &affinity_id);
> +	if (err)
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +	else
> +		acpi_pptt_get_cpus_from_container(affinity_id,
> +						  &msc->accessibility);
> +	return err;

The error is handled, so there is no need to return it to the caller.
Returning the error causes a probe failure, and the mpam_msc driver cannot
be installed.

s/return err;/return 0;/

> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&msc->all_msc_list);
> +	platform_set_drvdata(pdev, NULL);
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	synchronize_srcu(&mpam_srcu);
> +}
> +
> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&msc->probe_lock);
> +	mutex_init(&msc->part_sel_lock);
> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);

The returned error causes a probe failure and the driver cannot be
installed. Returning 0 will make the probe succeed.

There is no probe failure in mpam/snapshot/v6.18-rc1 because it
returns err=0.

[SNIP]

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-10-17 18:56 ` [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-10-22 11:23   ` Ben Horgan
  2025-10-24 18:24   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-22 11:23 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> mpam v0.1 and versions above v1.0 support optional long counter for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
> indicating support for long counters.
> 
> Probe these feature bits.
> 
> The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
> monitors are supported. Instead of muddling this with the size of the
> bandwidth monitors, add an explicit 31 bit counter feature.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [ morse: Added 31bit counter feature to simplify later logic ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Added 31 bit counter type feature.
>  * Altered commit message.
> ---
>  drivers/resctrl/mpam_devices.c  | 13 +++++++++++--
>  drivers/resctrl/mpam_internal.h |  3 +++
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index deb1dcc6f6b1..f4d07234ce10 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -777,16 +777,25 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
>  		}
>  		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> -			bool hw_managed;
> +			bool has_long, hw_managed;
>  			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
>  
>  			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
> -			if (props->num_mbwu_mon)
> +			if (props->num_mbwu_mon) {
>  				mpam_set_feature(mpam_feat_msmon_mbwu, props);
> +				mpam_set_feature(mpam_feat_msmon_mbwu_31counter, props);
> +			}
>  
>  			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
>  				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
>  
> +			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumon_idr);
> +			if (props->num_mbwu_mon && has_long) {
> +				mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
> +				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumon_idr))
> +					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
> +			}
> +

I think it's clearer to set each mpam_feat_msmon_mbwu_XXcounter for just
the size of counter the hardware supports rather than all XX up to that
size.

>  			/* Is NRDY hardware managed? */
>  			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
>  			if (hw_managed)
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 28c475d18d86..ff38b4bbfc2b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -158,6 +158,9 @@ enum mpam_device_features {
>  	mpam_feat_msmon_csu_xcl,
>  	mpam_feat_msmon_csu_hw_nrdy,
>  	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_31counter,
> +	mpam_feat_msmon_mbwu_44counter,
> +	mpam_feat_msmon_mbwu_63counter,
>  	mpam_feat_msmon_mbwu_capture,
>  	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported
  2025-10-17 18:56 ` [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-10-22 12:31   ` Ben Horgan
  2025-10-24 18:29   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-22 12:31 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton,
	Gavin Shan

Hi James,

On 10/17/25 19:56, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> Now that the larger counter sizes are probed, make use of them.
> 
> Callers of mpam_msmon_read() may not know (or care!) about the different
> counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
> driver pick the counter to use.
> 
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit, added explicit counter selection ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Removed mpam_feat_msmon_mbwu as a top-level bit for explicit 31bit counter
>    selection.
>  * Allow callers of mpam_msmon_read() to specify mpam_feat_msmon_mbwu and have
>    the driver pick a supported counter size.
>  * Rephrased commit message.
> 
> Changes since v1:
>  * Only clear OFLOW_STATUS_L on MBWU counters.
> 
> Changes since RFC:
>  * Commit message wrangling.
>  * Refer to 31 bit counters as opposed to 32 bit (registers).
> ---
>  drivers/resctrl/mpam_devices.c | 134 ++++++++++++++++++++++++++++-----
>  1 file changed, 116 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index f4d07234ce10..c207a6d2832c 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -897,6 +897,48 @@ struct mon_read {
[...]
> +static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
> +{
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
> +	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
> +}
> +
[...]
> @@ -978,10 +1027,15 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  		mpam_write_monsel_reg(msc, CSU, 0);
>  		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>  		break;
> -	case mpam_feat_msmon_mbwu:
> +	case mpam_feat_msmon_mbwu_44counter:
> +	case mpam_feat_msmon_mbwu_63counter:
> +		mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
> +		fallthrough;
> +	case mpam_feat_msmon_mbwu_31counter:
>  		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>  		mpam_write_monsel_reg(msc, MBWU, 0);

Already zeroed if it's a long counter.

> +
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>  
>  		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
[...]
> +static enum mpam_device_features mpam_msmon_choose_counter(struct mpam_class *class)
> +{
> +	struct mpam_props *cprops = &class->props;
> +
> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, cprops))
> +		return mpam_feat_msmon_mbwu_44counter;

This should check the longest counter first.

> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, cprops))
> +		return mpam_feat_msmon_mbwu_63counter;
> +
> +	return mpam_feat_msmon_mbwu_31counter;
> +}
> +
-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-10-17 18:56 ` [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
  2025-10-20 10:45   ` Ben Horgan
@ 2025-10-22 12:58   ` Jeremy Linton
  2025-10-24 14:22     ` Jonathan Cameron
  1 sibling, 1 reply; 86+ messages in thread
From: Jeremy Linton @ 2025-10-22 12:58 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Gavin Shan

Hi,

This is largely looking pretty solid, but...


On 10/17/25 1:56 PM, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>   * Removed stray cleanup useage in preference for acpi_get_pptt().
>   * Removed WARN_ON_ONCE() for symmetry with other helpers.
>   * Dropped restriction on unified caches.
> 
> Changes since v1:
>   * Added punctuation to the commit message.
>   * Removed a comment about an alternative implementation.
>   * Made the loop continue with a warning if a CPU is missing from the PPTT.
> 
> Changes since RFC:
>   * acpi_count_levels() now returns a value.
>   * Converted the table-get stuff to use Jonathan's cleanup helper.
>   * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>   drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  6 +++++
>   2 files changed, 70 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 50c8f2a3c927..2f86f58699a6 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -985,3 +985,67 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>   
>   	return -ENOENT;
>   }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	int level, cpu;
> +	u32 acpi_cpu_id;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_table_header *table;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	cpumask_clear(cpus);
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			continue;
> +
> +		/* Start at 1 for L1 */
> +		level = 1;
> +		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
> +						      &cpu_node);
> +		while (cache) {
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache, sizeof(*cache));

Is the core acpi definition in actbl2.h correct? Shouldn't it be 
something along the lines of:

struct acpi_pptt_cache_v1 {
  struct acpi_subtable_header header;
  u16 reserved;
  u32 flags;
  u32 next_level_of_cache;
  u32 size;
  u32 number_of_sets;
  u8 associativity;
  u8 attributes;
  u16 line_size;
  u32 cache_id;
};


Then that solves the detection of the additional-field problem correctly
because the length (24 vs 28) of the subtable then tells you which
version you're dealing with. (And goes back to why much of this is coded
to use ACPI_ADD_PTR rather than structure+ logic.)


Thanks,






> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache, sizeof(*cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);
> +
> +			level++;
> +			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
> +							      level, &cpu_node);
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index be074bdfd4d1..a9dbacabdf89 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>   void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>   int find_acpi_cache_level_from_id(u32 cache_id);
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
>   #else
>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>   {
> @@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
>   {
>   	return -ENOENT;
>   }
> +static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
> +						      cpumask_t *cpus)
> +{
> +	return -ENOENT;
> +}
>   #endif
>   
>   void acpi_arch_init(void);


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-10-22 13:39   ` Zeng Heng
  2025-10-22 16:17     ` Ben Horgan
  2025-10-24 18:22   ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management Jonathan Cameron
  2025-10-29  7:56   ` [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
  2 siblings, 1 reply; 86+ messages in thread
From: Zeng Heng @ 2025-10-22 13:39 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	zengheng4, wangkefeng.wang

Bandwidth counters need to run continuously to correctly reflect the
bandwidth. When reading the previously configured MSMON_CFG_MBWU_CTL,
software must recognize that the MSMON_CFG_x_CTL_OFLOW_STATUS bit may
have been set by hardware because of the counter overflow.

The existing logic incorrectly treats this bit as an indication that the
monitor configuration has been changed and consequently zeros the MBWU
statistics by mistake.

Also fix the handling of overflow amount calculation. There's no need to
subtract mbwu_state->prev_val when calculating overflow_val.

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 drivers/resctrl/mpam_devices.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0dd048279e02..06f3ec9887d2 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
 	config_mismatch = cur_flt != flt_val ||
-			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+			 (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
+			 (ctl_val | MSMON_CFG_x_CTL_EN);
 
 	if (config_mismatch || reset_on_next_read)
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
@@ -1138,8 +1139,9 @@ static void __ris_msmon_read(void *arg)
 		mbwu_state = &ris->mbwu_state[ctx->mon];
 
 		/* Add any pre-overflow value to the mbwu_state->val */
-		if (mbwu_state->prev_val > now)
-			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
+		if (mbwu_state->prev_val > now &&
+		   (cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS))
+			overflow_val = mpam_msmon_overflow_val(ris);
 
 		mbwu_state->prev_val = now;
 		mbwu_state->correction += overflow_val;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-22 13:39   ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Zeng Heng
@ 2025-10-22 16:17     ` Ben Horgan
  2025-10-25  8:45       ` Zeng Heng
  2025-10-25  9:01       ` Zeng Heng
  0 siblings, 2 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-22 16:17 UTC (permalink / raw)
  To: Zeng Heng, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang

Hi Zeng,

On 10/22/25 14:39, Zeng Heng wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth. When reading the previously configured MSMON_CFG_MBWU_CTL,
> software must recognize that the MSMON_CFG_x_CTL_OFLOW_STATUS bit may
> have been set by hardware because of the counter overflow.
> 
> The existing logic incorrectly treats this bit as an indication that the
> monitor configuration has been changed and consequently zeros the MBWU
> statistics by mistake.

By zeroing when the overflow bit is set, we miss out on the counts
between the overflow and the zeroing. Do I understand correctly that
this is what this patch is aiming to fix?

> 
> Also fix the handling of overflow amount calculation. There's no need to
> subtract mbwu_state->prev_val when calculating overflow_val.

Why not? Isn't this the pre-overflow part that we are missing from the
running count?

> 
> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
> ---
>  drivers/resctrl/mpam_devices.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0dd048279e02..06f3ec9887d2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>  	clean_msmon_ctl_val(&cur_ctl);
>  	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>  	config_mismatch = cur_flt != flt_val ||
> -			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
> +			 (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
> +			 (ctl_val | MSMON_CFG_x_CTL_EN);

This only considers 31 bit counters. I would expect any change here to
consider all lengths of counter. Also, as the overflow bit is no longer
reset due to the config mismatch it needs to be reset somewhere else.

>  
>  	if (config_mismatch || reset_on_next_read)
>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> @@ -1138,8 +1139,9 @@ static void __ris_msmon_read(void *arg)
>  		mbwu_state = &ris->mbwu_state[ctx->mon];
>  
>  		/* Add any pre-overflow value to the mbwu_state->val */
> -		if (mbwu_state->prev_val > now)
> -			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
> +		if (mbwu_state->prev_val > now &&
> +		   (cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS))
> +			overflow_val = mpam_msmon_overflow_val(ris);
>  
>  		mbwu_state->prev_val = now;
>  		mbwu_state->correction += overflow_val;


Thanks,

Ben


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-22  0:29   ` Fenghua Yu
@ 2025-10-22 19:00     ` Tushar Dave
  0 siblings, 0 replies; 86+ messages in thread
From: Tushar Dave @ 2025-10-22 19:00 UTC (permalink / raw)
  To: Fenghua Yu, James Morse, linux-kernel, linux-arm-kernel,
	linux-acpi
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, baisheng.gao, Jonathan Cameron, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Jeremy Linton, Gavin Shan



On 10/21/25 7:29 PM, Fenghua Yu wrote:
> Hi, James,
> 
> On 10/17/25 11:56, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
> [SNIP]> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * corresponding cache may also be powered off. By making accesses from
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
>> +static int update_msc_accessibility(struct mpam_msc *msc)
>> +{
>> +    u32 affinity_id;
>> +    int err;
>> +
>> +    err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
>> +                       &affinity_id);
>> +    if (err)
>> +        cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +    else
>> +        acpi_pptt_get_cpus_from_container(affinity_id,
>> +                          &msc->accessibility);
>> +    return err;
> 
> The error is handled and there is no need to return the error to caller.
> Returning the error causes probe failure and the mpam_msc driver cannot be 
> installed.

Ack. I see the probe failure too.

e.g.

[    7.118297] mpam_msc mpam_msc.183: probe with driver mpam_msc failed with 
error -22
[    7.118383] mpam_msc mpam_msc.370: probe with driver mpam_msc failed with 
error -22
[   10.208127]     # Subtest: mpam_devices_test_suite
[   10.208129]     # module: mpam
[   10.208215]     ok 1 test_mpam_reset_msc_bitmap
[   10.208275] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   10.208285] mpam:__props_mismatch: cleared cpor_part
[   10.208287] mpam:__props_mismatch: cleared mbw_part
[   10.208294] mpam:__props_mismatch: took the min bwa_wd
[   10.208296] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   10.208310] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   10.208345]     ok 2 test_mpam_enable_merge_features
[   10.208411] # mpam_devices_test_suite: pass:3 fail:0 skip:0 total:3
[   10.208413] ok 1 mpam_devices_test_suite

> 
> s/return err;/return 0;/

Yes, this resolves the probe failure.

Tested-by: Tushar Dave <tdave@nvidia.com>

> 
>> +}
>> +
>> +static int fw_num_msc;
>> +
>> +static void mpam_msc_destroy(struct mpam_msc *msc)
>> +{
>> +    struct platform_device *pdev = msc->pdev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_del_rcu(&msc->all_msc_list);
>> +    platform_set_drvdata(pdev, NULL);
>> +}
>> +
>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>> +{
>> +    struct mpam_msc *msc = platform_get_drvdata(pdev);
>> +
>> +    if (!msc)
>> +        return;
>> +
>> +    mutex_lock(&mpam_list_lock);
>> +    mpam_msc_destroy(msc);
>> +    mutex_unlock(&mpam_list_lock);
>> +
>> +    synchronize_srcu(&mpam_srcu);
>> +}
>> +
>> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +    int err;
>> +    u32 tmp;
>> +    struct mpam_msc *msc;
>> +    struct resource *msc_res;
>> +    struct device *dev = &pdev->dev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +    if (!msc)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    mutex_init(&msc->probe_lock);
>> +    mutex_init(&msc->part_sel_lock);
>> +    msc->id = pdev->id;
>> +    msc->pdev = pdev;
>> +    INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +    INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +    err = update_msc_accessibility(msc);
>> +    if (err)
>> +        return ERR_PTR(err);
> 
> The returned error causes probe failure and the driver cannot be installed. 
> Return 0 will make the probe succeed.
> 
> There is no probe failure in mpam/snapshot/v6.18-rc1 because it returns err=0.
> 
> [SNIP]
> 
> Thanks.
> 
> -Fenghua
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* RE: [PATCH v3 00/29] arm_mpam: Add basic mpam driver
  2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
                   ` (29 preceding siblings ...)
  2025-10-18  1:01 ` [PATCH v3 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
@ 2025-10-23  8:15 ` Shaopeng Tan (Fujitsu)
  30 siblings, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-23  8:15 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

Hello James,

> This series is based on v6.18-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> mpam/driver/v3
> 
> The rest of the driver can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> mpam/snapshot/v6.18-rc1

This series (mpam/driver/v3), based on v6.18-rc1, and the mpam driver (mpam/snapshot/v6.18-rc1) cannot run on my machine (the cause is still unknown),
but when I applied this series and the mpam driver on v6.18-rc2, they ran successfully.

Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>

However, I make a few minor fixes as follows:

1)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f890d1381af6..dd6041ae7cc9 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -645,6 +645,13 @@ static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
 #define MPAMF_IIDR_VARIANT     GENMASK(19, 16)
 #define MPAMF_IIDR_PRODUCTID   GENMASK(31, 20)
+#define MPAMF_IIDR_REVISON_SHIFT       12
+//#define MPAMF_IIDR_REVISION_SHIFT    12
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT    0
+#define MPAMF_IIDR_VARIANT_SHIFT    16
+#define MPAMF_IIDR_PRODUCTID_SHIFT    20

or
#define IIDR_REV(x)     ((x) << MPAMF_IIDR_REVISON_SHIFT)
s/MPAMF_IIDR_REVISON_SHIFT/MPAMF_IIDR_REVISION_SHIFT/


2)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 0ea76b7783b6..99b2bbb80a5a 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1113,9 +1118,9 @@ static void mpam_resctrl_pick_counters(void)
                                        update_rmid_limits(cache_size);

                                counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
-                               return;
+                               break;
                        default:
-                               return;
+                               break;
                        }
                }


3)
When building this series (mpam/driver/v3), the `EXPERT` Kconfig option needs to be explicitly enabled.
This aligns with Ben's observation in the following patch:
https://lore.kernel.org/lkml/146ad8f4-ef6c-48cb-aed8-db619c8258a8@arm.com/


Best regards,
Shaopeng TAN


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-10-17 18:56 ` [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-10-24 11:26   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 11:26 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:17 +0000
James Morse <james.morse@arm.com> wrote:

> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Hi James,

V2 has dropped out of my memory more or less so to get this back in
I'll do a fresh review (even of ones I've already given an RB on).
Nothing here to change that tag. Just some naming / comment suggestions
that I think would help a little with readability.

Some of these comments come from me forgetting how the spec named
things and so going to take a look.  The spec isn't consistent with the
naming (e.g. ACPI PROCESSOR ID is not necessarily a processor
ID) but I think keeping closer to spec names will help readers. 

Note this may all be in the category of perfect being the enemy of
good + upstream so I don't mind if you ignore.

> ---
> Changes since v2:
>  * Grouped two nested if clauses differently to reduce scope of cpu_node.
>  * Removed stale comment referring to the return value.
> 
> Changes since v1:
>  * Replaced commit message with wording from Dave.
>  * Fixed a stray plural.
>  * Moved further down in the file to make use of get_pptt() helper.
>  * Added a break to exit the loop early.
> 
> Changes since RFC:
>  * Removed leaf_flag local variable from acpi_pptt_get_cpus_from_container()
> 
> Changes since RFC:
>  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>  * Added missing : in kernel-doc
>  * Made helper return void as this never actually returns an error.
> ---
>  drivers/acpi/pptt.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  3 ++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..58cfa3916a13 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -817,3 +817,85 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }
> +
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node

The spec calls these Processor Hierarchy Node Structures. I think the
addition of the Hierarchy word will help people understand this isn't
finding things below a node specific to a processor but to some higher
level hierarchy structure.

> + * @table_hdr:		A reference to the PPTT table.
> + * @parent_node:	A pointer to the processor node in the @table_hdr.

Likewise, calling this a "processor hierarchy node" would make things
clearer.

> + * @cpus:		A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> +				     struct acpi_pptt_processor *parent_node,
> +				     cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;

This is definitely a processor hierarchy node.  To my mind
cpu_node doesn't convey this.  See below for more, but perhaps just renaming
it to include hierarchy in the name would help.

> +	u32 acpi_id;
> +	int cpu;
> +
> +	cpumask_clear(cpus);
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);

Here it is indeed a CPU.

> +
> +		while (cpu_node) {
> +			if (cpu_node == parent_node) {
> +				cpumask_set_cpu(cpu, cpus);
> +				break;
> +			}
> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);

But here it is a processor container or a private node.  So
cpu_hierarchy_node or something along those lines would be more appropriate.

> +		}
> +	}
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor container
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor

I'd go with 'Processor Hierarchy' here
 
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	table_hdr = acpi_get_pptt();
> +	if (!table_hdr)
> +		return;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +	proc_sz = sizeof(struct acpi_pptt_processor);
> +	while ((unsigned long)entry + proc_sz <= table_end) {
> +
> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
> +			struct acpi_pptt_processor *cpu_node;
similar naming thing here. 
> +
> +			cpu_node = (struct acpi_pptt_processor *)entry;
> +			if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
> +			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
> +			    cpu_node->acpi_processor_id == acpi_cpu_id) {
> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
The double tab indent here is odd.
			if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
			    cpu_node->acpi_processor_id == acpi_cpu_id) {
				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);

is fine I think for readability.  I could understand the bonus tab if it was
close to aligning with the line above, but it isn't.

> +					break;
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +}



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-10-17 18:56 ` [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-10-24 11:29   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 11:29 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:18 +0000
James Morse <james.morse@arm.com> wrote:

> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
> 
> Get rid of the levels parameter, which has no remaining purpose.
> 
> Fix acpi_get_cache_info() to match.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Another 'meh, the name is confusing'-type comment.

> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
> -			      struct acpi_pptt_processor *cpu_node,
> -			      unsigned int *levels, unsigned int *split_levels)
> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node,
> +			     unsigned int *split_levels)
>  {
> +	int starting_level = 0;
> +
>  	do {
> -		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
> +		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
>  		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>  	} while (cpu_node);
> +
> +	return starting_level;
Given it's not the starting level at this point... Maybe just call it level or current_level.
>  }
>  
>  /**
> @@ -645,7 +649,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>  	if (!cpu_node)
>  		return -ENOENT;
>  
> -	acpi_count_levels(table, cpu_node, levels, split_levels);
> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
>  
>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>  		 *levels, split_levels ? *split_levels : -1);


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id
  2025-10-17 18:56 ` [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
  2025-10-20 10:34   ` Ben Horgan
@ 2025-10-24 14:15   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 14:15 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:19 +0000
James Morse <james.morse@arm.com> wrote:

> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Search all caches, not just unified caches. This removes the need to count
>    the caches first, but means a failure to find the table walks the table
>    three times for different cache types.

Fwiw that sentence doesn't make sense to me. Too many tables.


>  * Fixed return value of the no-acpi stub.
>  * Punctuation typo in a comment,
>  * Keep trying to parse the table even if a bogus CPU is encountered.
>  * Specified CPUs share caches with other CPUs.
Trivial comment only from me.  Ben's question on matching an ID against an
l1 instruction cache needs addressing (or ruling out as a 'won't fix')
though before an RB is appropriate.

>  /**
>   * update_cache_properties() - Update cacheinfo for the given processor
>   * @this_leaf: Kernel cache info structure being updated
> @@ -903,3 +924,64 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>  				     entry->length);
>  	}
>  }
> +
> +/*

Smells like kernel doc.  Why not /** ?
Then can at least verify formatting etc.

> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the cache
> + *
> + * Determine the level relative to any CPU for the cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * If one CPU's L2 is shared with another CPU as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-10-22 12:58   ` Jeremy Linton
@ 2025-10-24 14:22     ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 14:22 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Rob Herring,
	Rohit Mathew, Rafael Wysocki, Len Brown, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Gavin Shan

On Wed, 22 Oct 2025 07:58:36 -0500
Jeremy Linton <jeremy.linton@arm.com> wrote:

> Hi,
> 
> This is largely looking pretty solid, but..
> 
> 
> On 10/17/25 1:56 PM, James Morse wrote:
> > MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> > 
> > The driver needs to know which CPUs are associated with the cache.
> > The CPUs may not all be online, so cacheinfo does not have the
> > information.
> > 
> > Add a helper to pull this information out of the PPTT.
> > 
> > CC: Rohit Mathew <Rohit.Mathew@arm.com>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> > Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> > ---
> > Changes since v2:
> >   * Removed stray cleanup usage in preference for acpi_get_pptt().
> >   * Removed WARN_ON_ONCE() for symmetry with other helpers.
> >   * Dropped restriction on unified caches.
> > 
> > Changes since v1:
> >   * Added punctuation to the commit message.
> >   * Removed a comment about an alternative implementation.
> >   * Made the loop continue with a warning if a CPU is missing from the PPTT.
> > 
> > Changes since RFC:
> >   * acpi_count_levels() now returns a value.
> >   * Converted the table-get stuff to use Jonathan's cleanup helper.
> >   * Dropped Sudeep's Review tag due to the cleanup change.
> > ---
> >   drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
> >   include/linux/acpi.h |  6 +++++
> >   2 files changed, 70 insertions(+)
> > 
> > diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> > index 50c8f2a3c927..2f86f58699a6 100644
> > --- a/drivers/acpi/pptt.c
> > +++ b/drivers/acpi/pptt.c
> > @@ -985,3 +985,67 @@ int find_acpi_cache_level_from_id(u32 cache_id)
> >   
> >   	return -ENOENT;
> >   }
> > +
> > +/**
> > + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> > + *					   specified cache
> > + * @cache_id: The id field of the cache
> > + * @cpus: Where to build the cpumask
> > + *
> > + * Determine which CPUs are below this cache in the PPTT. This allows the property
> > + * to be found even if the CPUs are offline.
> > + *
> > + * The PPTT table must be rev 3 or later,
> > + *
> > + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> > + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> > + */
> > +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> > +{
> > +	int level, cpu;
> > +	u32 acpi_cpu_id;
> > +	struct acpi_pptt_cache *cache;
> > +	struct acpi_table_header *table;
> > +	struct acpi_pptt_cache_v1 *cache_v1;
> > +	struct acpi_pptt_processor *cpu_node;
> > +
> > +	cpumask_clear(cpus);
> > +
> > +	table = acpi_get_pptt();
> > +	if (!table)
> > +		return -ENOENT;
> > +
> > +	if (table->revision < 3)
> > +		return -ENOENT;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> > +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> > +		if (!cpu_node)
> > +			continue;
> > +
> > +		/* Start at 1 for L1 */
> > +		level = 1;
> > +		cache = acpi_find_any_type_cache_node(table, acpi_cpu_id, level,
> > +						      &cpu_node);
> > +		while (cache) {
> > +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> > +						cache, sizeof(*cache));  
> 
> Is the core acpi definition in actbl2.h correct? Shouldn't it be 
> something along the lines of:
> 
> struct acpi_pptt_cache_v1 {
>   struct acpi_subtable_header header;
>   u16 reserved;
>   u32 flags;
>   u32 next_level_of_cache;
>   u32 size;
>   u32 number_of_sets;
>   u8 associativity;
>   u8 attributes;
>   u16 line_size;
>   u32 cache_id;
> };
> 
> 
> Then that solves the detection of the additional field problem correctly 
> because the length (24 vs 28) of the subtable then tells you which 
> version your dealing with. (and goes back to why much of this is coded 
> to use ACPI_ADD_PTR rather than structure+ logic.)
> 

Do we want to deal with arguing for the change in ACPICA? 
I fully agree that it would be much nicer if that didn't use this weird
bits of structures approach :(  

https://github.com/acpica/acpica/blob/master/source/include/actbl2.h#L3497
is where this is coming from.

Maybe can do it in parallel. Rafael, what do you think is best way forwards
with this?

Jonathan

> 
> Thanks,
> 
> 
> 
> 
> 
> 
> > +			if (!cache)
> > +				continue;
> > +
> > +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> > +						cache, sizeof(*cache));
> > +
> > +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> > +			    cache_v1->cache_id == cache_id)
> > +				cpumask_set_cpu(cpu, cpus);
> > +
> > +			level++;
> > +			cache = acpi_find_any_type_cache_node(table, acpi_cpu_id,
> > +							      level, &cpu_node);
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index be074bdfd4d1..a9dbacabdf89 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
> >   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> >   void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> >   int find_acpi_cache_level_from_id(u32 cache_id);
> > +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
> >   #else
> >   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
> >   {
> > @@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
> >   {
> >   	return -ENOENT;
> >   }
> > +static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
> > +						      cpumask_t *cpus)
> > +{
> > +	return -ENOENT;
> > +}
> >   #endif
> >   
> >   void acpi_arch_init(void);  
> 
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table
  2025-10-17 18:56 ` [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table James Morse
  2025-10-20 12:29   ` Ben Horgan
@ 2025-10-24 16:13   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 16:13 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:22 +0000
James Morse <james.morse@arm.com> wrote:

> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the
> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> to discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
Hi James,

Various comments inline.  One challenge with this is that a few
very generic things (the acpi table DEFINE_FREE() stuff and the
platform device put equivalent) aren't obvious enough that I'd expect
those concerned to necessarily notice them.  I'd break those out
as precursor patches, or narrow what they do to not be generic.


> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..59712397025d
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,377 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.

This needs to ultimately end up in ACPICA.  Perhaps fine to have it here
for now. Maybe can't happen until this isn't a beta? I'm not sure on
policy around that.

> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
> +
> +/*
> + * Encodings for the MSC node body interface type field.
> + * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
> +#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
> +

> +static bool acpi_mpam_register_irq(struct platform_device *pdev, int intid,
> +				   u32 flags, int *irq)
Given an irq value of 0 is invalid, you could just return the irq from this.
Or even return error codes for what went wrong, given negative irqs are invalid
as well.
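
A userspace sketch of what that suggested shape could look like, with the ACPI registration stubbed out so the example is self-contained (fake_register_gsi() and the simplified "wired" flag are stand-ins for illustration, not the patch's real helpers):

```c
#include <errno.h>

/* Stand-in for acpi_register_gsi(); the real call returns a positive
 * Linux IRQ number on success. */
static int fake_register_gsi(int intid)
{
	return intid ? 100 + intid : -1;
}

/*
 * Returning the IRQ (or a negative errno) removes the bool + out-param
 * pairing: 0 and negative values are never valid IRQ numbers, so a
 * caller can simply test "irq <= 0", or distinguish the failure cases.
 */
static int register_irq(int intid, int wired)
{
	int irq;

	if (!intid)
		return -EINVAL;		/* no interrupt described */
	if (!wired)
		return -EOPNOTSUPP;	/* only wired interrupts handled */

	irq = fake_register_gsi(intid);
	if (irq <= 0)
		return -ENODEV;

	return irq;
}
```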

> +{
> +	u32 int_type;
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (_is_ppi_partition(flags))
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);

Given it's one bit that indicates 1 if edge, I'd call this edge (which
inherently doesn't mean level).  Sense as a term I think can incorporate
this and rising/falling high/low.  It's passed to acpi_register_gsi()
which calls it trigger so that would work as well.
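
To illustrate the naming point, a small standalone decode using simplified stand-ins for the kernel's GENMASK()/FIELD_GET() macros (the MPAM_* names here mirror the patch's ACPI_MPAM_MSC_IRQ_* definitions but are local to this example):

```c
#include <stdint.h>

/* Simplified userspace versions of the kernel's bits.h/bitfield.h
 * helpers, just enough to decode the flag fields below. */
#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	((~0u >> (31 - (h))) & (~0u << (l)))
#define FIELD_GET(mask, val)	(((val) & (mask)) / ((mask) & -(mask)))

#define MPAM_MSC_IRQ_MODE	BIT(0)		/* 1 = edge, 0 = level */
#define MPAM_MSC_IRQ_TYPE_MASK	GENMASK(2, 1)	/* 0 = wired */

/*
 * Calling the decoded bit "edge" documents the encoding at the call
 * site: non-zero means edge triggered, zero means level.
 */
static unsigned int irq_flags_edge(uint32_t flags)
{
	return FIELD_GET(MPAM_MSC_IRQ_MODE, flags);
}

static unsigned int irq_flags_type(uint32_t flags)
{
	return FIELD_GET(MPAM_MSC_IRQ_TYPE_MASK, flags);
}
```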

> +	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
> +	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, intid;
> +	int irq;
> +
> +	intid = tbl_msc->overflow_interrupt;
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (acpi_mpam_register_irq(pdev, intid, flags, &irq))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	intid = tbl_msc->error_interrupt;
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (acpi_mpam_register_irq(pdev, intid, flags, &irq))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +


This function has a bit of a dual use to it. I think it needs some documentation
to explain under what conditions acpi_id is set.

> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int err;
> +
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid)) {

The err naming is a bit confusing as it's not an error code. Could call it
something like len or just avoid having a local variable at all.

	if (snprintf(uid, sizeof(uid), "%u",
		     tbl_msc->instance_id_linked_device) >= sizeof(uid))
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
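
The snprintf() truncation rule that check relies on: the return value is the length the formatted string would have had, so a result >= the buffer size means the output did not fit. A minimal standalone demonstration of the suggested folded-in form (format_uid() is a made-up name, not from the patch):

```c
#include <stdio.h>

/*
 * snprintf() returns the number of characters that would have been
 * written, excluding the terminating NUL, so a return value greater
 * than or equal to the buffer size signals truncation. Folding the
 * call into the condition avoids the misleadingly named "err" local.
 */
static int format_uid(char *uid, int sz, unsigned int instance)
{
	if (snprintf(uid, sz, "%u", instance) >= sz)
		return -1;	/* truncated */
	return 0;
}
```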

> +
> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	struct platform_device *pdev __free(platform_device_put) = platform_device_alloc("mpam_msc", tbl_msc->identifier);
That's too long. Format as
	struct platform_device *pdev __free(platform_device_put) =
		platform_device_alloc("mpam_msc", tbl_msc->identifier);

> +	int next_res = 0, next_prop = 0, err;

...

> +
> +static int __init acpi_mpam_parse(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	struct platform_device *pdev;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
Check acpi_disabled || !system_supports_mpam() before

	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
	
	if (IS_ERR(table))
		return 0;

That way we aren't relying on acpi_get_table_ret() doing the right thing if acpi_disabled == true,
which seems fragile even though it is fine today.

Declaring stuff using __free() inline is fine (commonly done).

> +		return 0;
> +

...

> +int acpi_mpam_count_msc(void)
> +{
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (IS_ERR(table))
Why fewer things to guard on in here than in the parse function?
I guess this may only be called in circumstances where we know those
other reasons not to carry on aren't true, but it still seems odd.

> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		if (!tbl_msc->mmio_size)

Bit marginal on how much we protect against garbage, but perhaps
check the length is long enough to get here. Or can you just
do the length checks first?

> +			continue;

Infinite loop?  table_offset isn't updated so this
keeps checking the same value. Similar to the function above, I'd
do the table_offset update before this check (and modify
the appropriate checks below).

> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
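
A standalone sketch of the loop shape being suggested, over an invented record layout (struct node_hdr and the field names are illustrative, not the ACPI table's real types): validating the length and advancing the offset before the mmio_size filter means a continue can never spin on the same node.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct node_hdr {
	uint16_t length;	/* total node size, header included */
	uint16_t mmio_size;	/* 0 = skip this node */
};

/*
 * Count nodes in a packed variable-length table. The length checks
 * and the offset update happen first, so the "continue" for filtered
 * nodes always moves forward and the loop must terminate.
 */
static int count_nodes(const uint8_t *table, size_t table_len)
{
	size_t off = 0;
	int count = 0;

	while (off < table_len) {
		struct node_hdr hdr;

		if (table_len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, table + off, sizeof(hdr));

		if (hdr.length < sizeof(hdr) || hdr.length > table_len - off)
			return -1;
		off += hdr.length;	/* advance first ... */

		if (!hdr.mmio_size)	/* ... then filter */
			continue;
		count++;
	}
	return count;
}
```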
 
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index a9dbacabdf89..9d66421f68ff 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
I wonder if it's worth instance being a variable.  11 instances of 1, 52 instances of 0
(almost nothing else).  Maybe just about worth it...

Whilst I like the general nature of this I wonder if smoother
to merge acpi_get_table_mpam() that does this first and then
look at generalizing when it's not on the critical path for this
patch set?  If ACPI folk are fine with this I don't mind it being
in here though as generally useful to have.
(I may well be arguing with earlier me on this :)

> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

Ben raised earlier comment on checking for NULL as well to let the compiler optimize
things better.

> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..3d6c39c667c3
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>

...

> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */

Curious comment.  As opposed to the lesser known l5?   I get
what you mean about not including the weird like memory-side caches
but TLBs are also well known caches so I'd just go with
/* Caches, e.g. l2, l3 */
or something like that.

> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +/* Parse the ACPI description of resources entries for this MSC. */

I'd push more detailed documentation down along side the implementation
rather than having it here.  The function name and parameters make this fairly
obvious anyway.

> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else

> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index 074754c23d33..23a30ada2d4c 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
>  extern int platform_device_add(struct platform_device *pdev);
>  extern void platform_device_del(struct platform_device *pdev);
>  extern void platform_device_put(struct platform_device *pdev);
> +DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))

I'd break this out as a separate precursor patch mostly so people notice it.
Likely will get some review from folk who aren't going to spot it down here.
They may well tell you to not have it as a separate patch but at least you'll
be sure it was noticed!

Jonathan

>  
>  struct platform_driver {
>  	int (*probe)(struct platform_device *);



* Re: [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                     ` (3 preceding siblings ...)
  2025-10-22  0:29   ` Fenghua Yu
@ 2025-10-24 16:25   ` Jonathan Cameron
  4 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 16:25 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:23 +0000
James Morse <james.morse@arm.com> wrote:

> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Trying not to replicate comments too much...

A few things inline but others found bigger stuff to fix.
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..58c83b5c8bfd
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,13 @@
> +menuconfig ARM64_MPAM_DRIVER
> +	bool "MPAM driver"
> +	depends on ARM64 && ARM64_MPAM && EXPERT
> +	help
> +	  MPAM driver for System IP, e,g. caches and memory controllers.

Bit minimal for help text :)

> +
> +if ARM64_MPAM_DRIVER

I'd add a blank line here.

> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver"
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> +
> +endif

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..d18eeec95f79
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c

> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	mutex_init(&msc->probe_lock);
Maybe worth
	err = devm_mutex_init(&pdev->dev, &msc->probe_lock);
	if (err)
		return ERR_PTR(err);
to enable the mutex debugging if anyone wants it.  I've stopped trying
to analyze whether that is useful or not, now it is easy to add to drivers
already doing devm.

> +	mutex_init(&msc->part_sel_lock);
> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);
> +	if (cpumask_empty(&msc->accessibility)) {
> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
> +		msc->iface = MPAM_IFACE_MMIO;
> +	else
> +		msc->iface = MPAM_IFACE_PCC;
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		void __iomem *io;
> +
> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +							    &msc_res);
> +		if (IS_ERR(io)) {
> +			dev_err_once(dev, "Failed to map MSC base address\n");
> +			return (void *)io;

ERR_CAST() is there to make this stuff more obvious

> +		}
> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +		msc->mapped_hwpage = io;
> +	}
> +
> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +	platform_set_drvdata(pdev, msc);
> +
> +	return msc;
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc = NULL;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	msc = do_mpam_msc_drv_probe(pdev);
> +	mutex_unlock(&mpam_list_lock);
> +	if (!IS_ERR(msc)) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +		if (err)
> +			mpam_msc_drv_remove(pdev);
> +	} else {
> +		err = PTR_ERR(msc);
> +	}
> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}

> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..6ac75f3613c3
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/sizes.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>

Includes need another look.

Should be seeing the list header for starters and
mailbox_client.h doesn't make sense yet.  Some of the
others may need pushing to the patches where they are
first used or pushing down into the c files that need them.

> +
> +struct platform_device;
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        all_msc_list;
> +
> +	int			id;
> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only taken during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs;
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +#endif /* MPAM_INTERNAL_H */




* Re: [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris
  2025-10-17 18:56 ` [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
@ 2025-10-24 16:47   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 16:47 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan

On Fri, 17 Oct 2025 18:56:24 +0000
James Morse <james.morse@arm.com> wrote:

> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
A few minor things inline.  Mostly code ordering related to make
it easier to review!

> ---
>  drivers/resctrl/mpam_devices.c  | 390 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  93 ++++++++
>  include/linux/arm_mpam.h        |   8 +-
>  3 files changed, 483 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index d18eeec95f79..8685e50f08c6 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -30,7 +30,7 @@
>  static DEFINE_MUTEX(mpam_list_lock);
>  static LIST_HEAD(mpam_all_msc);
>  
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;

Meh. Others may be fussier about this but I'd rather you just
added the extern when this was first introduced and didn't
have this churn here.


> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
I'd rather see the create / destroy next to each other if possible.
Makes it easier to check this unwinds the creat path.

> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	/*
> +	 * It is assumed affinities don't overlap. If they do the class becomes
> +	 * unusable immediately.
> +	 */
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
Can you reorder these so that they are the reverse of what happens in the create path?
Makes no real difference, other than being slightly easier to check everything is done.
Right now I'm failing to spot where this was added to ris->msc_list in the
create path.


> +	add_to_garbage(ris);
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
> +

> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + * This helper walks the device tree to include offline CPUs too.

Comment stale?  It does walk a tree of devices but I'm not sure that's
what people will read device tree as meaning.

> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +}

> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
...

> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(&ris->garbage);
> +	ris->garbage.pdev = pdev;
I wonder if it's cleaner to just pass the pdev (sometimes null) in
as a parameter to init_garbage()
> +
> +	class = mpam_class_find(class_id, type);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);



> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 6ac75f3613c3..1a5d96660382 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h

> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be gargbage

Spell check.  garbage

> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};



* Re: [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
  2025-10-17 23:03   ` Fenghua Yu
@ 2025-10-24 17:32   ` Jonathan Cameron
  2025-10-27 16:33     ` Ben Horgan
  2025-10-29  6:37   ` Shaopeng Tan (Fujitsu)
  2 siblings, 1 reply; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 17:32 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan

On Fri, 17 Oct 2025 18:56:25 +0000
James Morse <james.morse@arm.com> wrote:

> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/

I can't figure out how to get a stable link when there is only
one version.  If possible it would be good to use one.

I guess it probably doesn't matter unless someone renames things as
you only have as subset of the fields currently there for some registers.

> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
A few tiny things inline.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


> ---
>  drivers/resctrl/mpam_internal.h | 268 ++++++++++++++++++++++++++++++++
>  1 file changed, 268 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1a5d96660382..1ef3e8e1d056 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -142,4 +142,272 @@ extern struct list_head mpam_classes;
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
>  
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1    0x10

> +#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
Spec name I'm seeing is
MSMON_MBWU_L_CAPTURE.  Maybe a good idea to match?

> + */
> +#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
> +#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
> +#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
> +#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
> +#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
> +#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
> +#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
> +#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
> +#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
> +#define MSMON_CFG_x_CTL_EN			BIT(31)
> +
> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU		0x42
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU		0x43
> +
> +#define MSMON_CFG_MBWU_CTL_SCLEN		BIT(19)

Why is this one down here, but OFLOW_STATUS_L is in middle of the shared
block of definitions? I don't mind which approach you use, but not a mix.

> +
> +/*
> + * MSMON_CSU - Memory system performance monitor cache storage usage monitor
> + *            register
> + * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
> + *                     capture register
> + * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
> + *               monitor register
> + * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
> + *                     capture register
> + */
> +#define MSMON___VALUE		GENMASK(30, 0)
> +#define MSMON___NRDY		BIT(31)
> +#define MSMON___NRDY_L		BIT(63)
> +#define MSMON___L_VALUE		GENMASK(43, 0)
Positioning of L in these seems a little inconsistent?

> +#define MSMON___LWD_VALUE	GENMASK(62, 0)

>  #endif /* MPAM_INTERNAL_H */



* Re: [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-10-17 18:56 ` [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
@ 2025-10-24 17:40   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 17:40 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan

On Fri, 17 Oct 2025 18:56:27 +0000
James Morse <james.morse@arm.com> wrote:

> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
> 
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
> 
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Trivial stuff inline. I'd definitely not trust this reviewer who
is horribly inconsistent ;)

> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

>  static struct mpam_vmsc *
>  mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
>  {
> @@ -427,6 +500,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>  	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>  	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>  	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +	list_add_rcu(&ris->msc_list, &msc->ris);

This looks like it might be the add I was missing earlier?  If so, and it can
only be done now, move the del into this patch as well.

>  
>  	return 0;
>  }
> @@ -446,9 +520,36 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  	return err;
>  }
>  
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, &msc->ris_idxs)) {
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0);
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx) {
> +			return ris;

I'm not seeing this change in later patches in this series, so brackets
seem unnecessary and against kernel style.

> +		}
> +	}
> +
> +	return ERR_PTR(-ENOENT);
> +}
>


* Re: [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-10-17 18:56 ` [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-10-24 17:43   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 17:43 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:28 +0000
James Morse <james.morse@arm.com> wrote:

> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accessible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
> The SCMI or PCC platforms can't support an overflow interrupt, and
> can't access the registers from hardirq context.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. For now, use a irqsave spinlock and
> only support 'real' MMIO platforms.
> 
> In the future this lock will be split in two allowing SCMI/PCC platforms
> to take a mutex. Because there are contexts where the SCMI/PCC platforms
> can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
> this now, so that all the error handling on these paths is present. This
> allows the relevant paths to fail if they are needed on a platform where
> this isn't possible, instead of having to make explicit checks of the
> interface type.
> 
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Change since v1:
>  * Made accesses to outer_lock_held READ_ONCE() for torn values in the failure
>    case.
Guess that went away. I'd prune the old version log, or note in a later
version log that it was dropped.

One stray change inline otherwise seems fine
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


> ---
>  drivers/resctrl/mpam_devices.c  |  3 ++-
>  drivers/resctrl/mpam_internal.h | 38 +++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 910bb6cd5e4f..35011d3e8f1e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -738,6 +738,7 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>  
>  	mutex_init(&msc->probe_lock);
>  	mutex_init(&msc->part_sel_lock);
> +	mpam_mon_sel_lock_init(msc);
>  	msc->id = pdev->id;
>  	msc->pdev = pdev;
>  	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> @@ -822,7 +823,7 @@ static void mpam_enable_once(void)
>  				      "mpam:online");
>  
>  	/* Use printk() to avoid the pr_fmt adding the function name. */
> -	printk(KERN_INFO, "MPAM enabled with %u PARTIDs and %u PMGs\n",
> +	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
Move this to original patch.

>  	       mpam_partid_max + 1, mpam_pmg_max + 1);
>  }



* Re: [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports
  2025-10-17 18:56 ` [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-10-24 17:47   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 17:47 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:29 +0000
James Morse <james.morse@arm.com> wrote:

> Expand the probing support with the control and monitor types
> we can use with resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Trivial thing inline. LGTM.

> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1afc52b36328..be9ea0aab6d2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -5,6 +5,7 @@
>  #define MPAM_INTERNAL_H
>  
>  #include <linux/arm_mpam.h>
> +#include <linux/bitmap.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
>  #include <linux/llist.h>
> @@ -13,6 +14,7 @@
>  #include <linux/sizes.h>
>  #include <linux/spinlock.h>
>  #include <linux/srcu.h>
> +#include <linux/types.h>
>  
>  #define MPAM_MSC_MAX_NUM_RIS	16
>  
> @@ -111,6 +113,33 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
>  	raw_spin_lock_init(&msc->_mon_sel_lock);
>  }
>  
> +/* Bits for mpam features bitmaps */
> +enum mpam_device_features {
> +	mpam_feat_cpor_part = 0,

Given you are using this for internal stuff and the numbers don't mean
anything explicit, I think you can drop the '= 0'.

> +	mpam_feat_mbw_part,
> +	mpam_feat_mbw_min,
> +	mpam_feat_mbw_max,
> +	mpam_feat_msmon,
> +	mpam_feat_msmon_csu,
> +	mpam_feat_msmon_csu_hw_nrdy,
> +	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_hw_nrdy,
> +	MPAM_FEATURE_LAST
> +};




* Re: [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks
  2025-10-17 18:56 ` [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks James Morse
@ 2025-10-24 17:52   ` Jonathan Cameron
  2025-10-29  6:53   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 17:52 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:31 +0000
James Morse <james.morse@arm.com> wrote:

> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtied. e.g. Kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>




* Re: [PATCH v3 18/29] arm_mpam: Register and enable IRQs
  2025-10-17 18:56 ` [PATCH v3 18/29] arm_mpam: Register and enable IRQs James Morse
@ 2025-10-24 18:03   ` Jonathan Cameron
  2025-10-29  7:02   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:03 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:34 +0000
James Morse <james.morse@arm.com> wrote:

> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the MPAMF_ESR register, so no locking is
> needed. The work to disable MPAM after an error needs to happen at process
> context as it takes mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
One trivial thing inline to reduce the patch churn a tiny bit.
May not be worth the hassle.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 545482e112b7..f18a22f825a0 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -14,6 +14,9 @@



>  static void mpam_enable_once(void)
>  {
> -	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> -	mutex_unlock(&mpam_list_lock);

Can you shift this later in the earlier patch (ordering clearly doesn't matter)
so as to reduce churn here?

> +	int err;
>  
>  	/*
>  	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> @@ -1318,6 +1572,27 @@ static void mpam_enable_once(void)
>  	partid_max_published = true;
>  	spin_unlock(&partid_max_lock);
>  
> +	/*
> +	 * If all the MSC have been probed, enabling the IRQs happens next.
> +	 * That involves cross-calling to a CPU that can reach the MSC, and
> +	 * the locks must be taken in this order:
> +	 */
> +	cpus_read_lock();
> +	mutex_lock(&mpam_list_lock);
> +	mpam_enable_merge_features(&mpam_classes);
> +
> +	err = mpam_register_irqs();
> +
> +	mutex_unlock(&mpam_list_lock);
> +	cpus_read_unlock();
> +
> +	if (err) {
> +		pr_warn("Failed to register irqs: %d\n", err);
> +		mpam_disable_reason = "Failed to enable.";
> +		schedule_work(&mpam_broken_work);
> +		return;
> +	}
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
>  				      "mpam:online");
>  
> @@ -1385,6 +1660,8 @@ void mpam_disable(struct work_struct *ignored)
>  	}
>  	mutex_unlock(&mpam_cpuhp_state_lock);
>  
> +	mpam_unregister_irqs();
> +
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>  				 srcu_read_lock_held(&mpam_srcu))



* Re: [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-10-17 18:56 ` [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-10-24 18:18   ` Jonathan Cameron
  0 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:18 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:39 +0000
James Morse <james.morse@arm.com> wrote:

> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Seems fine. Just the trivial comment below.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>



> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err,  any_err = 0;
extra space after ,

> +	struct mpam_vmsc *vmsc;





* Re: [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
  2025-10-22 13:39   ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Zeng Heng
@ 2025-10-24 18:22   ` Jonathan Cameron
  2025-10-29  7:56   ` [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
  2 siblings, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:22 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

On Fri, 17 Oct 2025 18:56:40 +0000
James Morse <james.morse@arm.com> wrote:

> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
> 
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
> 
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
Just one trivial thing from me. I'll take another look once the
questions from others are resolved.

> +/* Call with MSC lock and held */
> +static int mpam_save_mbwu_state(void *arg)
> +{
> +	int i;
> +	u64 val;
> +	struct mon_cfg *cfg;
> +	u32 cur_flt, cur_ctl, mon_sel;
> +	struct mpam_msc_ris *ris = arg;
> +	struct msmon_mbwu_state *mbwu_state;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		mbwu_state = &ris->mbwu_state[i];
> +		cfg = &mbwu_state->cfg;

Could pull some of the local variable declarations in here to make
their scope clear.

> +
> +		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +			return -EIO;
> +
> +		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
> +			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
> +
> +		val = mpam_read_monsel_reg(msc, MBWU);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +
> +		cfg->mon = i;
> +		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
> +		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
> +		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
> +		mbwu_state->correction += val;
> +		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
> +		mpam_mon_sel_unlock(msc);
> +	}
> +
> +	return 0;
> +}




* Re: [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters
  2025-10-17 18:56 ` [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
  2025-10-22 11:23   ` Ben Horgan
@ 2025-10-24 18:24   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:24 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan

On Fri, 17 Oct 2025 18:56:41 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> mpam v0.1 and versions above v1.0 support optional long counter for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
> indicating support for long counters.
> 
> Probe these feature bits.
> 
> The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
> monitors are supported, instead of muddling this with which size of
> bandwidth monitors, add an explicit 31 bit counter feature.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [ morse: Added 31bit counter feature to simplify later logic ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
With or without the change Ben suggested.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


* Re: [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported
  2025-10-17 18:56 ` [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported James Morse
  2025-10-22 12:31   ` Ben Horgan
@ 2025-10-24 18:29   ` Jonathan Cameron
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:29 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan

On Fri, 17 Oct 2025 18:56:42 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> Now that the larger counter sizes are probed, make use of them.
> 
> Callers of mpam_msmon_read() may not know (or care!) about the different
> counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
> driver pick the counter to use.
> 
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit, added explicit counter selection ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

A few tiny things on a fresh look.

> +static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
> +{
> +	int retry = 3;
> +	u32 mbwu_l_low;
> +	u64 mbwu_l_high1, mbwu_l_high2;
> +
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +	do {
> +		mbwu_l_high1 = mbwu_l_high2;
> +		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
> +		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +
> +		retry--;
> +	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);

Just carrying on if it tore repeatedly without screaming seems unwise...
I can't see it actually happening more than once but still seems like
we'd want to know if it did.

> +
> +	if (mbwu_l_high1 == mbwu_l_high2)
> +		return (mbwu_l_high1 << 32) | mbwu_l_low;
> +	return MSMON___NRDY_L;
> +}

>  static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> @@ -978,10 +1027,15 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  		mpam_write_monsel_reg(msc, CSU, 0);
>  		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>  		break;
> -	case mpam_feat_msmon_mbwu:
> +	case mpam_feat_msmon_mbwu_44counter:
> +	case mpam_feat_msmon_mbwu_63counter:
> +		mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
> +		fallthrough;
> +	case mpam_feat_msmon_mbwu_31counter:
>  		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>  		mpam_write_monsel_reg(msc, MBWU, 0);
> +
Stray change to clean up (push to original patch).
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>  
>  		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> @@ -993,10 +1047,19 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  	}
>  }



* Re: [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-10-17 18:56 ` [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-10-24 18:34   ` Jonathan Cameron
  2025-10-29  7:14   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Jonathan Cameron @ 2025-10-24 18:34 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Fenghua Yu

On Fri, 17 Oct 2025 18:56:43 +0000
James Morse <james.morse@arm.com> wrote:

> resctrl expects to reset the bandwidth counters when the filesystem
> is mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead
> of cross calling to each CPU that can access the component MSC to
> write to the counter, set a flag that causes it to be zero'd on the
> the next read. This is easily done by forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvdia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
A couple of little places in here where I think moving things back to earlier
patches would reduce churn a little, which is always nice for reviewers.

Either way
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
> Changes since v2:
>  * Switched to guard and fixed non _srcu list walker.
>  * Made a comment about what is proteted by which lock a list.
> ---
>  drivers/resctrl/mpam_devices.c  | 46 ++++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  7 ++++-
>  2 files changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index c207a6d2832c..89d4f42168ed 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c

>  	/*
>  	 * Read the existing configuration to avoid re-writing the same values.
>  	 * This saves waiting for 'nrdy' on subsequent reads.
> @@ -1090,7 +1100,10 @@ static void __ris_msmon_read(void *arg)
>  	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
>  	clean_msmon_ctl_val(&cur_ctl);
>  	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> -	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +	config_mismatch = cur_flt != flt_val ||
> +			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);

Push back to earlier patch perhaps?  I guess someone might ask why you don't do it
inline, but to me it seems complex enough that I doubt they will.
Nice to reduce the churn where it is easy to do.

> +
> +	if (config_mismatch || reset_on_next_read)
>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
>  
>  	switch (m->type) {
> @@ -1242,6 +1255,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>  	return err;
>  }


> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index ff38b4bbfc2b..6632699ae814 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -204,10 +204,14 @@ struct mon_cfg {
>  
>  /*
>   * Changes to enabled and cfg are protected by the msc->lock.
> - * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + * The msc's mon_sel_lock protects:
Nice to push this formatting change back to earlier patch so this
becomes a one line change.

> + * - reset_on_next_read
> + * - prev_val
> + * - correction
>   */
>  struct msmon_mbwu_state {
>  	bool		enabled;
> +	bool		reset_on_next_read;
>  	struct mon_cfg	cfg;



* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-22 16:17     ` Ben Horgan
@ 2025-10-25  8:45       ` Zeng Heng
  2025-10-25  9:34         ` [PATCH] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
  2025-10-28 17:04         ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Ben Horgan
  2025-10-25  9:01       ` Zeng Heng
  1 sibling, 2 replies; 86+ messages in thread
From: Zeng Heng @ 2025-10-25  8:45 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Ben,

On 2025/10/23 0:17, Ben Horgan wrote:
>>
>> Also fix the handling of overflow amount calculation. There's no need to
>> subtract mbwu_state->prev_val when calculating overflow_val.
> 
> Why not? Isn't this the pre-overflow part that we are missing from the
> running count?
> 

The MSMON_MBWU register accumulates counts monotonically forward and
is not automatically cleared to zero on overflow.

The overflow portion is exactly what mpam_msmon_overflow_val() computes,
so there is no need to additionally subtract mbwu_state->prev_val.

>>
>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>> ---
>>   drivers/resctrl/mpam_devices.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 0dd048279e02..06f3ec9887d2 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>   	clean_msmon_ctl_val(&cur_ctl);
>>   	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>   	config_mismatch = cur_flt != flt_val ||
>> -			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>> +			 (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>> +			 (ctl_val | MSMON_CFG_x_CTL_EN);
> 
> This only considers 31 bit counters. I would expect any change here to
> consider all lengths of counter. Also, as the overflow bit is no longer
> reset due to the config mismatch it needs to be reset somewhere else.

Yes, the overflow bit needs to be cleared somewhere. I'll try to point this
out in the next patch mail.

Best Regards,
Zeng Heng


* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-22 16:17     ` Ben Horgan
  2025-10-25  8:45       ` Zeng Heng
@ 2025-10-25  9:01       ` Zeng Heng
  2025-10-28 16:01         ` Ben Horgan
  1 sibling, 1 reply; 86+ messages in thread
From: Zeng Heng @ 2025-10-25  9:01 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Ben,

On 2025/10/23 0:17, Ben Horgan wrote:

>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>> ---
>>   drivers/resctrl/mpam_devices.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 0dd048279e02..06f3ec9887d2 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>   	clean_msmon_ctl_val(&cur_ctl);
>>   	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>   	config_mismatch = cur_flt != flt_val ||
>> -			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>> +			 (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>> +			 (ctl_val | MSMON_CFG_x_CTL_EN);
> 
> This only considers 31 bit counters. I would expect any change here to
> consider all lengths of counter. 
> 

Sorry, regardless of whether the counter is 32-bit or 64-bit, the
config_mismatch logic should be handled the same way here. Am I
wrong?

Best Regards,
Zeng Heng


* [PATCH] arm64/mpam: Clean MBWU monitor overflow bit
  2025-10-25  8:45       ` Zeng Heng
@ 2025-10-25  9:34         ` Zeng Heng
  2025-10-28 17:37           ` Ben Horgan
  2025-10-28 17:04         ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Ben Horgan
  1 sibling, 1 reply; 86+ messages in thread
From: Zeng Heng @ 2025-10-25  9:34 UTC (permalink / raw)
  To: ben.horgan, james.morse
  Cc: zengheng4, amitsinght, baisheng.gao, baolin.wang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, sunnanyong, tan.shaopeng,
	wangkefeng.wang, will, xhao

The MSMON_MBWU register accumulates counts monotonically forward and
is not automatically cleared to zero on overflow. The overflow portion
is exactly what mpam_msmon_overflow_val() computes, so there is no need
to additionally subtract mbwu_state->prev_val.

Before invoking write_msmon_ctl_flt_vals(), the overflow bit of the
MSMON_CFG_MBWU_CTL register must first be read, to prevent it from being
inadvertently cleared by the write operation. Then, before updating the
monitor configuration, the overflow bit should be cleared to zero.

Finally, use the overflow bit instead of relying on counter wrap-around
to determine whether an overflow has occurred; this avoids overlooking an
overflow when the counter wraps and climbs back past prev_val (so that
now > prev_val).

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 drivers/resctrl/mpam_devices.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0dd048279e02..575980e3a366 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1062,6 +1062,21 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
 	}
 }
 
+static bool read_msmon_mbwu_is_overflow(struct mpam_msc *msc)
+{
+	u32 ctl;
+	bool overflow;
+
+	ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+	overflow = ctl & MSMON_CFG_x_CTL_OFLOW_STATUS ? true : false;
+
+	if (overflow)
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl &
+				     ~MSMON_CFG_x_CTL_OFLOW_STATUS);
+
+	return overflow;
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
@@ -1069,6 +1084,7 @@ static void __ris_msmon_read(void *arg)
 	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
+	bool mbwu_overflow = false;
 	struct mon_cfg *ctx = m->ctx;
 	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
@@ -1091,6 +1107,7 @@ static void __ris_msmon_read(void *arg)
 			reset_on_next_read = mbwu_state->reset_on_next_read;
 			mbwu_state->reset_on_next_read = false;
 		}
+		mbwu_overflow = read_msmon_mbwu_is_overflow(msc);
 	}
 
 	/*
@@ -1138,8 +1155,8 @@ static void __ris_msmon_read(void *arg)
 		mbwu_state = &ris->mbwu_state[ctx->mon];
 
 		/* Add any pre-overflow value to the mbwu_state->val */
-		if (mbwu_state->prev_val > now)
-			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
+		if (mbwu_overflow)
+			overflow_val = mpam_msmon_overflow_val(m->type);
 
 		mbwu_state->prev_val = now;
 		mbwu_state->correction += overflow_val;
-- 
2.25.1



* RE: [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
  2025-10-20 17:04   ` Ben Horgan
@ 2025-10-27  8:47   ` Shaopeng Tan (Fujitsu)
  2025-10-29  7:09   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-27  8:47 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Dave Martin,
	Ben Horgan

Hello James,

> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate a
> configuration array for each component, and reprogram each RIS configuration
> from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
> Changes since v2:
>  * Call mpam_init_reset_cfg() on the allocated config as 0 is no longer correct.
>  * init_garbage() on each config - the array has to be freed in one go, but
>    otherwise this looks weird.
>  * Use struct initialiser in mpam_init_reset_cfg(),
>  * Moved int err definition.
>  * Removed srcu lock taking based on squinting at the only caller.
>  * Moved config reset to mpam_reset_component_cfg() for re-use in
>    mpam_reset_component_locked(), previous memset() was not enough
>    since zero no longer means reset.
> 
> Changes since v1:
>  * Switched entry_rcu to srcu versions.
> 
> Changes since RFC:
>  * Added a comment about the ordering around max_partid.
>  * Allocate configurations after interrupts are registered to reduce churn.
>  * Added mpam_assert_partid_sizes_fixed();
>  * Make reset use an all-ones instead of zero config.
> ---
>  drivers/resctrl/mpam_devices.c  | 284 +++++++++++++++++++++++++++++---
>  drivers/resctrl/mpam_internal.h |  23 +++
>  2 files changed, 287 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index ab37ed1fb5de..e990ef67df5b 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -118,6 +118,17 @@ static inline void init_garbage(struct mpam_garbage *garbage)
>  {
>  	init_llist_node(&garbage->llist);
>  }
> +
> +/*
> + * Once mpam is enabled, new requestors cannot further reduce the available
> + * partid. Assert that the size is fixed, and new requestors will be turned
> + * away.
> + */
> +static void mpam_assert_partid_sizes_fixed(void)
> +{
> +	WARN_ON_ONCE(!partid_max_published);
> +}
> +
>  static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>  {
>  	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> @@ -366,12 +377,16 @@ static void mpam_class_destroy(struct mpam_class *class)
>  	add_to_garbage(class);
>  }
> 
> +static void __destroy_component_cfg(struct mpam_component *comp);
> +
>  static void mpam_comp_destroy(struct mpam_component *comp)
>  {
>  	struct mpam_class *class = comp->class;
> 
>  	lockdep_assert_held(&mpam_list_lock);
> 
> +	__destroy_component_cfg(comp);
> +
>  	list_del_rcu(&comp->class_list);
>  	add_to_garbage(comp);
> 
> @@ -812,48 +827,102 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  	__mpam_write_reg(msc, reg, bm);
>  }
> 
> -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +/* Called via IPI. Call while holding an SRCU reference */
> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> +				      struct mpam_config *cfg)
>  {
>  	struct mpam_msc *msc = ris->vmsc->msc;
>  	struct mpam_props *rprops = &ris->props;
> 
> -	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> -
>  	mutex_lock(&msc->part_sel_lock);
>  	__mpam_part_sel(ris->ris_idx, partid, msc);
> 
> -	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
> +		if (cfg->reset_cpbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
> +					      rprops->cpbm_wd);
> +		else
> +			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> +	}
> 
> -	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
> +		if (cfg->reset_mbw_pbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
> +					      rprops->mbw_pbm_bits);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
> +	}
> 
> -	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_min, cfg))
>  		mpam_write_partsel_reg(msc, MBW_MIN, 0);
> 
> -	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> -		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_max, cfg))
> +		mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
> 
>  	mutex_unlock(&msc->part_sel_lock);
>  }
> 
> +struct reprogram_ris {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_config *cfg;
> +};
> +
> +/* Call with MSC lock held */
> +static int mpam_reprogram_ris(void *_arg)
> +{
> +	u16 partid, partid_max;
> +	struct reprogram_ris *arg = _arg;
> +	struct mpam_msc_ris *ris = arg->ris;
> +	struct mpam_config *cfg = arg->cfg;
> +
> +	if (ris->in_reset_state)
> +		return 0;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid <= partid_max; partid++)
> +		mpam_reprogram_ris_partid(ris, partid, cfg);
> +
> +	return 0;
> +}
> +
> +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
> +{
> +	*reset_cfg = (struct mpam_config) {
> +		.cpbm = ~0,
> +		.mbw_pbm = ~0,
> +		.mbw_max = MPAMCFG_MBW_MAX_MAX,

When rdtgroup_schemata_show() is called, the "cpbm" value is output to the schemata file.
Since bitmap lengths are chip-dependent, I think we only need to set the bits within the bitmap length when resetting.
Otherwise, 0xffffffff (u32) will be output from the schemata file.
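To illustrate the concern, a reset value restricted to the implemented bitmap width could be built along these lines (a userspace sketch, not the driver's code; reset_cpbm_sketch and its use of cpbm_wd are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: build a reset CPBM that only sets the cpbm_wd implemented bits,
 * rather than all 32 bits of the u32 field. cpbm_wd stands in for the
 * per-MSC portion bitmap width discovered from hardware.
 */
static uint32_t reset_cpbm_sketch(unsigned int cpbm_wd)
{
	if (cpbm_wd >= 32)		/* avoid undefined 1U << 32 */
		return ~0U;
	return (1U << cpbm_wd) - 1;	/* low cpbm_wd bits set */
}
```

With cpbm_wd == 8 this yields 0xff, so the schemata file would show only the implemented bits instead of 0xffffffff.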

Best regards,
Shaopeng TAN


> +		.reset_cpbm = true,
> +		.reset_mbw_pbm = true,
> +	};
> +	bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
> +}
> +
>  /*
>   * Called via smp_call_on_cpu() to prevent migration, while still being
>   * pre-emptible. Caller must hold mpam_srcu.
>   */
>  static int mpam_reset_ris(void *arg)
>  {
> -	u16 partid, partid_max;
> +	struct mpam_config reset_cfg;
>  	struct mpam_msc_ris *ris = arg;
> +	struct reprogram_ris reprogram_arg;
> 
>  	if (ris->in_reset_state)
>  		return 0;
> 
> -	spin_lock(&partid_max_lock);
> -	partid_max = mpam_partid_max;
> -	spin_unlock(&partid_max_lock);
> -	for (partid = 0; partid < partid_max + 1; partid++)
> -		mpam_reset_ris_partid(ris, partid);
> +	mpam_init_reset_cfg(&reset_cfg);
> +
> +	reprogram_arg.ris = ris;
> +	reprogram_arg.cfg = &reset_cfg;
> +
> +	mpam_reprogram_ris(&reprogram_arg);
> 
>  	return 0;
>  }
> @@ -897,6 +966,39 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  	}
>  }
> 
> +static void mpam_reprogram_msc(struct mpam_msc *msc)
> +{
> +	u16 partid;
> +	bool reset;
> +	struct mpam_config *cfg;
> +	struct mpam_msc_ris *ris;
> +
> +	/*
> +	 * No lock for mpam_partid_max as partid_max_published has been
> +	 * set by mpam_enabled(), so the values can no longer change.
> +	 */
> +	mpam_assert_partid_sizes_fixed();
> +
> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!mpam_is_enabled() && !ris->in_reset_state) {
> +			mpam_touch_msc(msc, &mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +			continue;
> +		}
> +
> +		reset = true;
> +		for (partid = 0; partid <= mpam_partid_max; partid++) {
> +			cfg = &ris->vmsc->comp->cfg[partid];
> +			if (!bitmap_empty(cfg->features, MPAM_FEATURE_LAST))
> +				reset = false;
> +
> +			mpam_reprogram_ris_partid(ris, partid, cfg);
> +		}
> +		ris->in_reset_state = reset;
> +	}
> +}
> +
>  static void _enable_percpu_irq(void *_irq)
>  {
>  	int *irq = _irq;
> @@ -918,7 +1020,7 @@ static int mpam_cpu_online(unsigned int cpu)
>  			_enable_percpu_irq(&msc->reenable_error_ppi);
> 
>  		if (atomic_fetch_inc(&msc->online_refs) == 0)
> -			mpam_reset_msc(msc, true);
> +			mpam_reprogram_msc(msc);
>  	}
> 
>  	return 0;
> @@ -1569,6 +1671,64 @@ static void mpam_unregister_irqs(void)
>  	}
>  }
> 
> +static void __destroy_component_cfg(struct mpam_component *comp)
> +{
> +	add_to_garbage(comp->cfg);
> +}
> +
> +static void mpam_reset_component_cfg(struct mpam_component *comp)
> +{
> +	int i;
> +
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (!comp->cfg)
> +		return;
> +
> +	for (i = 0; i < mpam_partid_max + 1; i++)
> +		mpam_init_reset_cfg(&comp->cfg[i]);
> +}
> +
> +static int __allocate_component_cfg(struct mpam_component *comp)
> +{
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (comp->cfg)
> +		return 0;
> +
> +	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
> +	if (!comp->cfg)
> +		return -ENOMEM;
> +
> +	/*
> +	 * The array is free()d in one go, so only cfg[0]'s structure needs
> +	 * to be initialised.
> +	 */
> +	init_garbage(&comp->cfg[0].garbage);
> +
> +	mpam_reset_component_cfg(comp);
> +
> +	return 0;
> +}
> +
> +static int mpam_allocate_config(void)
> +{
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		list_for_each_entry(comp, &class->components, class_list) {
> +			int err = __allocate_component_cfg(comp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static void mpam_enable_once(void)
>  {
>  	int err;
> @@ -1588,15 +1748,25 @@ static void mpam_enable_once(void)
>  	 */
>  	cpus_read_lock();
>  	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> +	do {
> +		mpam_enable_merge_features(&mpam_classes);
> 
> -	err = mpam_register_irqs();
> +		err = mpam_register_irqs();
> +		if (err) {
> +			pr_warn("Failed to register irqs: %d\n", err);
> +			break;
> +		}
> 
> +		err = mpam_allocate_config();
> +		if (err) {
> +			pr_err("Failed to allocate configuration arrays.\n");
> +			break;
> +		}
> +	} while (0);
>  	mutex_unlock(&mpam_list_lock);
>  	cpus_read_unlock();
> 
>  	if (err) {
> -		pr_warn("Failed to register irqs: %d\n", err);
>  		mpam_disable_reason = "Failed to enable.";
>  		schedule_work(&mpam_broken_work);
>  		return;
> @@ -1617,6 +1787,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
>  	struct mpam_vmsc *vmsc;
> 
>  	lockdep_assert_cpus_held();
> +	mpam_assert_partid_sizes_fixed();
> +
> +	mpam_reset_component_cfg(comp);
> 
>  	guard(srcu)(&mpam_srcu);
>  	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list, @@
> -1717,6 +1890,77 @@ void mpam_enable(struct work_struct *work)
>  		mpam_enable_once();
>  }
> 
> +struct mpam_write_config_arg {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_component *comp;
> +	u16 partid;
> +};
> +
> +static int __write_config(void *arg)
> +{
> +	struct mpam_write_config_arg *c = arg;
> +
> +	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
> +
> +	return 0;
> +}
> +
> +#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
> +	if (mpam_has_feature(feature, newcfg) &&			\
> +	    (newcfg)->member != (cfg)->member) {			\
> +		(cfg)->member = (newcfg)->member;			\
> +		mpam_set_feature(feature, cfg);				\
> +									\
> +		(changes) = true;					\
> +	}								\
> +} while (0)
> +
> +static bool mpam_update_config(struct mpam_config *cfg,
> +			       const struct mpam_config *newcfg)
> +{
> +	bool has_changes = false;
> +
> +	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
> +
> +	return has_changes;
> +}
> +
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg)
> +{
> +	struct mpam_write_config_arg arg;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc *msc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	/* Don't pass in the current config! */
> +	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
> +
> +	if (!mpam_update_config(&comp->cfg[partid], cfg))
> +		return 0;
> +
> +	arg.comp = comp;
> +	arg.partid = partid;
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			arg.ris = ris;
> +			mpam_touch_msc(msc, __write_config, &arg);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init mpam_msc_driver_init(void)
>  {
>  	if (!system_supports_mpam())
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d492df9a1735..2f2a7369107b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -182,6 +182,20 @@ struct mpam_class {
>  	struct mpam_garbage	garbage;
>  };
> 
> +struct mpam_config {
> +	/* Which configuration values are valid. */
> +	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
> +
> +	u32	cpbm;
> +	u32	mbw_pbm;
> +	u16	mbw_max;
> +
> +	bool	reset_cpbm;
> +	bool	reset_mbw_pbm;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
>  struct mpam_component {
>  	u32			comp_id;
> 
> @@ -190,6 +204,12 @@ struct mpam_component {
> 
>  	cpumask_t		affinity;
> 
> +	/*
> +	 * Array of configuration values, indexed by partid.
> +	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
> +	 */
> +	struct mpam_config	*cfg;
> +
>  	/* member of mpam_class:components */
>  	struct list_head	class_list;
> 
> @@ -249,6 +269,9 @@ extern u8 mpam_pmg_max;
>  void mpam_enable(struct work_struct *work);
>  void mpam_disable(struct work_struct *work);
> 
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg);
> +
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
> 
> --
> 2.39.5



* Re: [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-10-24 17:32   ` Jonathan Cameron
@ 2025-10-27 16:33     ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-27 16:33 UTC (permalink / raw)
  To: Jonathan Cameron, James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, D Scott Phillips OS,
	carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
	David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay, baisheng.gao, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

Hi James, Jonathan,

On 10/24/25 18:32, Jonathan Cameron wrote:
> On Fri, 17 Oct 2025 18:56:25 +0000
> James Morse <james.morse@arm.com> wrote:
> 
>> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
>> (MSCs) with an identity/configuration page.
>>
>> Add the definitions for these registers as offset within the page(s).
>>
>> Link: https://developer.arm.com/documentation/ihi0099/latest/
> 
> I can't figure out how to get a stable link when there is only
> one version.  If possible would be good to use one.
> 
> I guess it probably doesn't matter unless someone renames things as
> you only have as subset of the fields currently there for some registers.
> 
https://developer.arm.com/documentation/ihi0099/aa/

This link has the version, A.a, at the end so should be stable. I found
this by visiting an unknown version,
https://developer.arm.com/documentation/ihi0099/unknown/, and seeing
where it redirects.

Thanks,

Ben


* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-25  9:01       ` Zeng Heng
@ 2025-10-28 16:01         ` Ben Horgan
  2025-10-29  2:49           ` Zeng Heng
  0 siblings, 1 reply; 86+ messages in thread
From: Ben Horgan @ 2025-10-28 16:01 UTC (permalink / raw)
  To: Zeng Heng, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Zeng,

On 10/25/25 10:01, Zeng Heng wrote:
> Hi Ben,
> 
> On 2025/10/23 0:17, Ben Horgan wrote:
> 
>>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>>> ---
>>>   drivers/resctrl/mpam_devices.c | 8 +++++---
>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>> mpam_devices.c
>>> index 0dd048279e02..06f3ec9887d2 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>>       clean_msmon_ctl_val(&cur_ctl);
>>>       gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>>       config_mismatch = cur_flt != flt_val ||
>>> -              cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>>> +             (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>>> +             (ctl_val | MSMON_CFG_x_CTL_EN);
>>
>> This only considers 31 bit counters. I would expect any change here to
>> consider all lengths of counter.
> 
> Sorry, regardless of whether the counter is 32-bit or 64-bit, the
> config_mismatch logic should be handled the same way here. Am I
> wrong?

Yes, they should be handled the same way. However, the overflow status
bit for long counters is MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L.

I now see that the existing code in the series has this covered.
Both the overflow bits are masked out in clean_msmon_ctl_val(). No need
for any additional masking.
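For reference, the comparison behaviour described above can be sketched as follows (a userspace sketch; the bit positions are illustrative, not the architectural ones, and clean_ctl_sketch is a stand-in for clean_msmon_ctl_val()):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative positions for the short- and long-counter overflow bits. */
#define OFLOW_STATUS	(1U << 26)
#define OFLOW_STATUS_L	(1U << 27)

/*
 * Stand-in for clean_msmon_ctl_val(): drop both overflow status bits so a
 * set overflow flag cannot make the read-back control value look like a
 * configuration mismatch.
 */
static uint32_t clean_ctl_sketch(uint32_t ctl)
{
	return ctl & ~(OFLOW_STATUS | OFLOW_STATUS_L);
}
```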

> 
> Best Regards,
> Zeng Heng
> 
> 

Thanks,

Ben



* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-25  8:45       ` Zeng Heng
  2025-10-25  9:34         ` [PATCH] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
@ 2025-10-28 17:04         ` Ben Horgan
  1 sibling, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-28 17:04 UTC (permalink / raw)
  To: Zeng Heng, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Zeng,

On 10/25/25 09:45, Zeng Heng wrote:
> Hi Ben,
> 
> On 2025/10/23 0:17, Ben Horgan wrote:
>>>
>>> Also fix the handling of overflow amount calculation. There's no need to
>>> subtract mbwu_state->prev_val when calculating overflow_val.
>>
>> Why not? Isn't this the pre-overflow part that we are missing from the
>> running count?
>>
> 
> The MSMON_MBWU register accumulates counts monotonically and is not
> automatically cleared to zero on overflow.
> 
> The overflow portion is exactly what mpam_msmon_overflow_val() computes;
> there is no need to additionally subtract mbwu_state->prev_val.

Yes, I now see you are correct. The 'correction' ends up holding
(counter size) * (number of overflows) and the current value of the
counter plus this gives you the bandwidth use up until now.
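That accounting can be sketched in a few lines (a toy model, not the driver code; the 31-bit range, the struct, and the function name are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative range of a 31-bit short MBWU counter. */
#define MBWU_RANGE	(1ULL << 31)

struct mbwu_sketch {
	uint64_t correction;	/* (counter range) * (number of overflows) */
};

/*
 * Fold one detected overflow into the correction, then return the running
 * total: the accumulated correction plus the current counter value.
 */
static uint64_t mbwu_total(struct mbwu_sketch *s, uint64_t now, bool overflowed)
{
	if (overflowed)
		s->correction += MBWU_RANGE;
	return s->correction + now;
}
```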

> 
>>>
>>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>>> ---
>>>   drivers/resctrl/mpam_devices.c | 8 +++++---
>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>> mpam_devices.c
>>> index 0dd048279e02..06f3ec9887d2 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>>       clean_msmon_ctl_val(&cur_ctl);
>>>       gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>>       config_mismatch = cur_flt != flt_val ||
>>> -              cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>>> +             (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>>> +             (ctl_val | MSMON_CFG_x_CTL_EN);
>>
>> This only considers 31 bit counters. I would expect any change here to
>> consider all lengths of counter. Also, as the overflow bit is no longer
>> reset due to the config mismatch it needs to be reset somewhere else.
> 
> Yes, overflow bit needs to be cleared somewhere. I try to point out in
> the next patch mail.

I had misunderstood before, but the current code in the series doesn't
make use of the overflow bit and just relies on prev_val > now. Using
the overflow status does give us a bit more leeway for overflowing, so
it is a useful enhancement.
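The extra leeway comes from the fact that comparison-based detection only notices a wrap while the reading is still below the previous one, which a small sketch makes plain (wrap_seen_by_compare is a made-up name, not driver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Overflow detection by value comparison alone: a wrap is only noticed
 * while the new reading is still below the previous one. If the counter
 * wraps and climbs back past prev before the next read, the overflow is
 * missed; the hardware overflow status bit has no such window.
 */
static bool wrap_seen_by_compare(uint64_t prev, uint64_t now)
{
	return prev > now;
}
```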

> 
> Best Regards,
> Zeng Heng
> 
> 
Thanks,

Ben



* Re: [PATCH] arm64/mpam: Clean MBWU monitor overflow bit
  2025-10-25  9:34         ` [PATCH] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
@ 2025-10-28 17:37           ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-28 17:37 UTC (permalink / raw)
  To: Zeng Heng, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, carl, catalin.marinas,
	dakr, dave.martin, david, dfustini, fenghuay, gregkh, gshan,
	guohanjun, jeremy.linton, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, sunnanyong, tan.shaopeng,
	wangkefeng.wang, will, xhao

Hi Zeng,

On 10/25/25 10:34, Zeng Heng wrote:
> The MSMON_MBWU register accumulates counts monotonically and is not
> automatically cleared to zero on overflow. The overflow portion is
> exactly what mpam_msmon_overflow_val() computes; there is no need to
> additionally subtract mbwu_state->prev_val.
> 
> Before invoking write_msmon_ctl_flt_vals(), the overflow bit of the
> MSMON_CFG_MBWU_CTL register must first be read, to prevent it from being
> inadvertently cleared by the write operation. Then, before updating the
> monitor configuration, the overflow bit should be cleared to zero.
> 
> Finally, use the overflow bit instead of relying on counter wrap-around
> to determine whether an overflow has occurred; this avoids overlooking an
> overflow when the counter wraps and climbs back past prev_val (so that
> now > prev_val).
> 
> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
> ---
>  drivers/resctrl/mpam_devices.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0dd048279e02..575980e3a366 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1062,6 +1062,21 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
>  	}
>  }
>  
> +static bool read_msmon_mbwu_is_overflow(struct mpam_msc *msc)
> +{
> +	u32 ctl;
> +	bool overflow;
> +
> +	ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +	overflow = ctl & MSMON_CFG_x_CTL_OFLOW_STATUS ? true : false;
> +
> +	if (overflow)
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl &
> +				     ~MSMON_CFG_x_CTL_OFLOW_STATUS);


Seems sensible. It's best to consider the overflow status bit for long
counters as well. Although, that's introduced later in the series, so it
depends on patch ordering. (Sorry, I was considering patches on top of
the full series when I commented on counter length before.)

> +
> +	return overflow;
> +}
> +
>  /* Call with MSC lock held */
>  static void __ris_msmon_read(void *arg)
>  {
> @@ -1069,6 +1084,7 @@ static void __ris_msmon_read(void *arg)
>  	bool config_mismatch;
>  	struct mon_read *m = arg;
>  	u64 now, overflow_val = 0;
> +	bool mbwu_overflow = false;
>  	struct mon_cfg *ctx = m->ctx;
>  	bool reset_on_next_read = false;
>  	struct mpam_msc_ris *ris = m->ris;
> @@ -1091,6 +1107,7 @@ static void __ris_msmon_read(void *arg)
>  			reset_on_next_read = mbwu_state->reset_on_next_read;
>  			mbwu_state->reset_on_next_read = false;
>  		}
> +		mbwu_overflow = read_msmon_mbwu_is_overflow(msc);

If the config is then found to mismatch, mbwu_overflow can subsequently
be set to false.

>  	}
>  
>  	/*
> @@ -1138,8 +1155,8 @@ static void __ris_msmon_read(void *arg)
>  		mbwu_state = &ris->mbwu_state[ctx->mon];
>  
>  		/* Add any pre-overflow value to the mbwu_state->val */
> -		if (mbwu_state->prev_val > now)
> -			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
> +		if (mbwu_overflow)
> +			overflow_val = mpam_msmon_overflow_val(m->type);

Yep, makes sense.

>  
>  		mbwu_state->prev_val = now;

With this, prev_val no longer has any use.

>  		mbwu_state->correction += overflow_val;
Thanks,

Ben



* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-28 16:01         ` Ben Horgan
@ 2025-10-29  2:49           ` Zeng Heng
  2025-10-29  3:59             ` Zeng Heng
  0 siblings, 1 reply; 86+ messages in thread
From: Zeng Heng @ 2025-10-29  2:49 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Ben,

On 2025/10/29 0:01, Ben Horgan wrote:
> Hi Zeng,
> 
> On 10/25/25 10:01, Zeng Heng wrote:
>> Hi Ben,
>>
>> On 2025/10/23 0:17, Ben Horgan wrote:
>>
>>>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>>>> ---
>>>>    drivers/resctrl/mpam_devices.c | 8 +++++---
>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>>> mpam_devices.c
>>>> index 0dd048279e02..06f3ec9887d2 100644
>>>> --- a/drivers/resctrl/mpam_devices.c
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>>>        clean_msmon_ctl_val(&cur_ctl);
>>>>        gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>>>        config_mismatch = cur_flt != flt_val ||
>>>> -              cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>>>> +             (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>>>> +             (ctl_val | MSMON_CFG_x_CTL_EN);
>>>
>>> This only considers 31 bit counters. I would expect any change here to
>>> consider all lengths of counter.
>>
>> Sorry, regardless of whether the counter is 32-bit or 64-bit, the
>> config_mismatch logic should be handled the same way here. Am I
>> wrong?
> 
> Yes, they should be handled the same way. However, the overflow status
> bit for long counters is MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L.
> 
> I now see that the existing code in the series has this covered.
> Both the overflow bits are masked out in clean_msmon_ctl_val(). No need
> for any additional masking.
> 

Yes, I’ve seen the usage, except that clearing the overflow bit in the
register is missing.


Best Regards,
Zeng Heng


* Re: [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling
  2025-10-29  2:49           ` Zeng Heng
@ 2025-10-29  3:59             ` Zeng Heng
  0 siblings, 0 replies; 86+ messages in thread
From: Zeng Heng @ 2025-10-29  3:59 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Ben,

On 2025/10/29 10:49, Zeng Heng wrote:
> Hi Ben,
> 
> On 2025/10/29 0:01, Ben Horgan wrote:
>> Hi Zeng,
>>
>> On 10/25/25 10:01, Zeng Heng wrote:
>>> Hi Ben,
>>>
>>> On 2025/10/23 0:17, Ben Horgan wrote:
>>>
>>>>> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
>>>>> ---
>>>>>    drivers/resctrl/mpam_devices.c | 8 +++++---
>>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>>>>> mpam_devices.c
>>>>> index 0dd048279e02..06f3ec9887d2 100644
>>>>> --- a/drivers/resctrl/mpam_devices.c
>>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>>> @@ -1101,7 +1101,8 @@ static void __ris_msmon_read(void *arg)
>>>>>        clean_msmon_ctl_val(&cur_ctl);
>>>>>        gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>>>>>        config_mismatch = cur_flt != flt_val ||
>>>>> -              cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>>>>> +             (cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS) !=
>>>>> +             (ctl_val | MSMON_CFG_x_CTL_EN);
>>>>
>>>> This only considers 31 bit counters. I would expect any change here to
>>>> consider all lengths of counter.
>>>
>>> Sorry, regardless of whether the counter is 32-bit or 64-bit, the
>>> config_mismatch logic should be handled the same way here. Am I
>>> wrong?
>>
>> Yes, they should be handled the same way. However, the overflow status
>> bit for long counters is MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L.
>>
>> I now see that the existing code in the series has this covered.
>> Both the overflow bits are masked out in clean_msmon_ctl_val(). No need
>> for any additional masking.
>>
> 
> Yes, I’ve seen the usage, except that clearing the overflow bit in the
> register is missing.
> 

Please disregard my previous mail... :)

Exactly, thanks for the review. I'll fold the fixes into v2 of the
patch.


Best Regards,
Zeng Heng


* RE: [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions
  2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
  2025-10-17 23:03   ` Fenghua Yu
  2025-10-24 17:32   ` Jonathan Cameron
@ 2025-10-29  6:37   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  6:37 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Ben Horgan


> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* RE: [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks
  2025-10-17 18:56 ` [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks James Morse
  2025-10-24 17:52   ` Jonathan Cameron
@ 2025-10-29  6:53   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  6:53 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Rohit Mathew

 
> When a CPU comes online, it may bring a newly accessible MSC with it. Only
> the default partid has its value reset by hardware, and even then the MSC might
> not have been reset since its config was previously dirtied, e.g. by kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSCs are also reset when CPUs are taken offline to cover cases where firmware
> doesn't reset the MSC over reboot using UEFI, or kexec where there is no
> firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was brought online,
> it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>


Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* RE: [PATCH v3 18/29] arm_mpam: Register and enable IRQs
  2025-10-17 18:56 ` [PATCH v3 18/29] arm_mpam: Register and enable IRQs James Morse
  2025-10-24 18:03   ` Jonathan Cameron
@ 2025-10-29  7:02   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  7:02 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan

> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever signalled,
> attempt to disable MPAM.
> 
> Only the irq handler accesses the MPAMF_ESR register, so no locking is
> needed. The work to disable MPAM after an error needs to happen at process
> context as it takes mutex. It also unregisters the interrupts, meaning it can't be
> done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that can
> access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure this
> size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* RE: [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
  2025-10-20 17:04   ` Ben Horgan
  2025-10-27  8:47   ` Shaopeng Tan (Fujitsu)
@ 2025-10-29  7:09   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  7:09 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Dave Martin,
	Ben Horgan


> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate a
> configuration array for each component, and reprogram each RIS configuration
> from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* RE: [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state
  2025-10-17 18:56 ` [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
  2025-10-24 18:34   ` Jonathan Cameron
@ 2025-10-29  7:14   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  7:14 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Fenghua Yu

> resctrl expects to reset the bandwidth counters when the filesystem is
> mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead of cross
> calling to each CPU that can access the component MSC to write to the counter,
> set a flag that causes it to be zeroed on the next read. This is easily done by
> forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* RE: [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-10-17 18:56 ` [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-10-29  7:24   ` Shaopeng Tan (Fujitsu)
  0 siblings, 0 replies; 86+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-10-29  7:24 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org
  Cc: D Scott Phillips OS, carl@os.amperecomputing.com,
	lcherian@marvell.com, bobo.shaobowang@huawei.com,
	baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, David Hildenbrand, Dave Martin, Koba Ko,
	Shanker Donthineni, fenghuay@nvidia.com, baisheng.gao@unisoc.com,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Jeremy Linton, Gavin Shan, Lecopzer Chen,
	Ben Horgan

> Because an MSC can only be accessed from the CPUs in its cpu-affinity set, we
> need to be running on one of those CPUs to probe the MSC hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only happen
> before MPAM is enabled, walk all the MSCs and probe those we can reach that
> haven't already been probed as each CPU's online call is made.
> 
> This adds the low-level MSC register accessors.
> 
> Once all MSCs reported by the firmware have been probed from a CPU in their
> respective cpu-affinity set, the probe-time cpuhp callbacks are replaced.  The
> replacement callbacks will ultimately need to handle save/restore of the
> runtime MSC state across power transitions, but for now there is nothing to do
> in them: so do nothing.
> 
> The architecture's context switch code will be enabled by a static-key; this can
> be set by mpam_enable(), but must be done from process context, not a cpuhp
> callback because both take the cpuhp lock.
> Whenever a new MSC has been probed, the mpam_enable() work is scheduled
> to test if all the MSCs have been probed. If probing fails, mpam_disable() is
> scheduled to unregister the cpuhp callbacks and free memory.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit
  2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
  2025-10-22 13:39   ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Zeng Heng
  2025-10-24 18:22   ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management Jonathan Cameron
@ 2025-10-29  7:56   ` Zeng Heng
  2025-10-30  9:52     ` Ben Horgan
  2 siblings, 1 reply; 86+ messages in thread
From: Zeng Heng @ 2025-10-29  7:56 UTC (permalink / raw)
  To: ben.horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, carl, catalin.marinas,
	dakr, dave.martin, david, dfustini, fenghuay, gregkh, gshan,
	guohanjun, jeremy.linton, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong, zengheng4

The MSMON_MBWU register accumulates counts monotonically forward and
is not automatically cleared to zero on overflow. The overflow portion
is exactly what mpam_msmon_overflow_val() computes, so there is no need
to additionally subtract mbwu_state->prev_val.

Before invoking write_msmon_ctl_flt_vals(), the overflow bit of the
MSMON_MBWU register must first be read to prevent it from being
inadvertently cleared by the write operation.

Finally, use the overflow bit instead of relying on counter wrap-around
to determine whether an overflow has occurred; this avoids the case where
a wrap-around that still leaves now > prev_val is overlooked. With this,
prev_val no longer has any use, so remove it.

CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 drivers/resctrl/mpam_devices.c  | 22 +++++++++++++++++-----
 drivers/resctrl/mpam_internal.h |  3 ---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0dd048279e02..db4cec710091 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1039,7 +1039,6 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);

 		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
-		mbwu_state->prev_val = 0;

 		break;
 	default:
@@ -1062,6 +1061,16 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
 	}
 }

+static bool read_msmon_mbwu_is_overflow(struct mpam_msc *msc)
+{
+	u32 ctl;
+
+	ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+	return ctl & (MSMON_CFG_x_CTL_OFLOW_STATUS |
+		      MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L) ?
+		      true : false;
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
@@ -1069,6 +1078,7 @@ static void __ris_msmon_read(void *arg)
 	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
+	bool mbwu_overflow = false;
 	struct mon_cfg *ctx = m->ctx;
 	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
@@ -1091,6 +1101,7 @@ static void __ris_msmon_read(void *arg)
 			reset_on_next_read = mbwu_state->reset_on_next_read;
 			mbwu_state->reset_on_next_read = false;
 		}
+		mbwu_overflow = read_msmon_mbwu_is_overflow(msc);
 	}

 	/*
@@ -1103,8 +1114,10 @@ static void __ris_msmon_read(void *arg)
 	config_mismatch = cur_flt != flt_val ||
 			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);

-	if (config_mismatch || reset_on_next_read)
+	if (config_mismatch || reset_on_next_read) {
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+		mbwu_overflow = false;
+	}

 	switch (m->type) {
 	case mpam_feat_msmon_csu:
@@ -1138,10 +1151,9 @@ static void __ris_msmon_read(void *arg)
 		mbwu_state = &ris->mbwu_state[ctx->mon];

 		/* Add any pre-overflow value to the mbwu_state->val */
-		if (mbwu_state->prev_val > now)
-			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;
+		if (mbwu_overflow)
+			overflow_val = mpam_msmon_overflow_val(m->type);

-		mbwu_state->prev_val = now;
 		mbwu_state->correction += overflow_val;

 		/* Include bandwidth consumed before the last hardware reset */
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4f25681b56ab..8837c0cd7b0c 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -226,9 +226,6 @@ struct msmon_mbwu_state {
 	bool		reset_on_next_read;
 	struct mon_cfg	cfg;

-	/* The value last read from the hardware. Used to detect overflow. */
-	u64		prev_val;
-
 	/*
 	 * The value to add to the new reading to account for power management,
 	 * and shifts to trigger the overflow interrupt.
--
2.25.1
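
[Editor's sketch] The accounting after this patch can be modelled as a small
user-space program. The names and the counter span are assumptions (a short
31-bit MBWU counter, so one overflow adds 2^31); the real driver latches the
overflow-status bit from MSMON_CFG_MBWU_CTL rather than taking it as a
parameter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the accounting in the patch above. The hardware
 * overflow-status bit, latched before any configuration write, decides
 * whether one counter-span's worth of pre-overflow value is folded into
 * the correction. Names are hypothetical; the span assumes the short
 * 31-bit MBWU counter.
 */
#define MBWU_SPAN (1ull << 31)

struct mbwu_model {
	uint64_t correction;	/* accumulated pre-overflow value */
	uint64_t val;		/* bandwidth consumed before last hw reset */
};

/* Equivalent of the tail of __ris_msmon_read() for an MBWU monitor. */
uint64_t mbwu_read(struct mbwu_model *m, uint64_t now, bool oflow)
{
	if (oflow)
		m->correction += MBWU_SPAN;	/* mpam_msmon_overflow_val() */
	return m->correction + m->val + now;
}

/* One wrap: the reported value stays monotonic across the overflow. */
uint64_t mbwu_demo(void)
{
	struct mbwu_model m = { 0, 0 };

	mbwu_read(&m, 100, false);	/* plain read, no overflow */
	return mbwu_read(&m, 5, true);	/* counter wrapped since last read */
}
```

The model shows why prev_val becomes redundant for overflow detection: the
correction depends only on the latched overflow bit, not on comparing
successive counter values.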



* Re: [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit
  2025-10-29  7:56   ` [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
@ 2025-10-30  9:52     ` Ben Horgan
  0 siblings, 0 replies; 86+ messages in thread
From: Ben Horgan @ 2025-10-30  9:52 UTC (permalink / raw)
  To: Zeng Heng, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, carl, catalin.marinas,
	dakr, dave.martin, david, dfustini, fenghuay, gregkh, gshan,
	guohanjun, jeremy.linton, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	wangkefeng.wang, sunnanyong

Hi Zeng,

On 10/29/25 07:56, Zeng Heng wrote:
> The MSMON_MBWU register accumulates counts monotonically forward and
> would not automatically cleared to zero on overflow. The overflow portion
> is exactly what mpam_msmon_overflow_val() computes, there is no need to
> additionally subtract mbwu_state->prev_val.
> 
> Before invoking write_msmon_ctl_flt_vals(), the overflow bit of the
> MSMON_MBWU register must first be read to prevent it from being
> inadvertently cleared by the write operation.
> 
> Finally, use the overflow bit instead of relying on counter wrap-around
> to determine whether an overflow has occurred, that avoids the case where
> a wrap-around (now > prev_val) is overlooked. So with this, prev_val no
> longer has any use and remove it.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 22 +++++++++++++++++-----
>  drivers/resctrl/mpam_internal.h |  3 ---
>  2 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0dd048279e02..db4cec710091 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1039,7 +1039,6 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> 
>  		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> -		mbwu_state->prev_val = 0;
> 
>  		break;
>  	default:
> @@ -1062,6 +1061,16 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
>  	}
>  }
> 
> +static bool read_msmon_mbwu_is_overflow(struct mpam_msc *msc)
> +{
> +	u32 ctl;
> +
> +	ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +	return ctl & (MSMON_CFG_x_CTL_OFLOW_STATUS |
> +		      MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L) ?
> +		      true : false;
> +}
> +
>  /* Call with MSC lock held */
>  static void __ris_msmon_read(void *arg)
>  {
> @@ -1069,6 +1078,7 @@ static void __ris_msmon_read(void *arg)
>  	bool config_mismatch;
>  	struct mon_read *m = arg;
>  	u64 now, overflow_val = 0;
> +	bool mbwu_overflow = false;
>  	struct mon_cfg *ctx = m->ctx;
>  	bool reset_on_next_read = false;
>  	struct mpam_msc_ris *ris = m->ris;
> @@ -1091,6 +1101,7 @@ static void __ris_msmon_read(void *arg)
>  			reset_on_next_read = mbwu_state->reset_on_next_read;
>  			mbwu_state->reset_on_next_read = false;
>  		}
> +		mbwu_overflow = read_msmon_mbwu_is_overflow(msc);
>  	}
> 
>  	/*
> @@ -1103,8 +1114,10 @@ static void __ris_msmon_read(void *arg)
>  	config_mismatch = cur_flt != flt_val ||
>  			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
> 
> -	if (config_mismatch || reset_on_next_read)
> +	if (config_mismatch || reset_on_next_read) {
>  		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +		mbwu_overflow = false;
> +	}
> 
>  	switch (m->type) {
>  	case mpam_feat_msmon_csu:
> @@ -1138,10 +1151,9 @@ static void __ris_msmon_read(void *arg)
>  		mbwu_state = &ris->mbwu_state[ctx->mon];
> 
>  		/* Add any pre-overflow value to the mbwu_state->val */
> -		if (mbwu_state->prev_val > now)
> -			overflow_val = mpam_msmon_overflow_val(m->type) - mbwu_state->prev_val;

This all looks fine for overflow, but what we've been forgetting about
is the power management. As James mentioned in his commit message, the
prev_val > now check is doing double duty. If an MSC is powered
down and reset then we lose the count. Hence, to keep an accurate count,
we should be considering this case too.

> +		if (mbwu_overflow)
> +			overflow_val = mpam_msmon_overflow_val(m->type);
> 
> -		mbwu_state->prev_val = now;
>  		mbwu_state->correction += overflow_val;
> 
>  		/* Include bandwidth consumed before the last hardware reset */
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4f25681b56ab..8837c0cd7b0c 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -226,9 +226,6 @@ struct msmon_mbwu_state {
>  	bool		reset_on_next_read;
>  	struct mon_cfg	cfg;
> 
> -	/* The value last read from the hardware. Used to detect overflow. */
> -	u64		prev_val;
> -
>  	/*
>  	 * The value to add to the new reading to account for power management,
>  	 * and shifts to trigger the overflow interrupt.
> --
> 2.25.1
> 
> 
> 

-- 
Thanks,

Ben
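
[Editor's sketch] The power-management case raised above can be made concrete
with a hypothetical helper (not driver code): the hardware overflow-status bit
alone cannot distinguish "no overflow" from "counter reset by a power
transition", while a prev_val comparison catches the backwards jump:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the case raised above: the overflow-status bit alone cannot
 * tell "no overflow" apart from "counter reset by a power transition".
 * A prev_val comparison catches the backwards jump. Hypothetical helper,
 * not driver code.
 */
bool count_lost_to_reset(uint64_t prev_val, uint64_t now, bool oflow)
{
	/*
	 * The counter went backwards without the overflow bit set: the
	 * MSC was reset (e.g. powered down) and prev_val's worth of
	 * count is lost unless it is added back into the correction.
	 */
	return now < prev_val && !oflow;
}
```

A backwards jump with the overflow bit set is an ordinary wrap; only a
backwards jump without it indicates count lost to a reset.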



end of thread, other threads:[~2025-10-30  9:52 UTC | newest]

Thread overview: 86+ messages
-- links below jump to the message on this page --
2025-10-17 18:56 [PATCH v3 00/29] arm_mpam: Add basic mpam driver James Morse
2025-10-17 18:56 ` [PATCH v3 01/29] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
2025-10-24 11:26   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 02/29] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
2025-10-24 11:29   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 03/29] ACPI / PPTT: Find cache level by cache-id James Morse
2025-10-20 10:34   ` Ben Horgan
2025-10-24 14:15   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 04/29] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
2025-10-20 10:45   ` Ben Horgan
2025-10-22 12:58   ` Jeremy Linton
2025-10-24 14:22     ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 05/29] arm64: kconfig: Add Kconfig entry for MPAM James Morse
2025-10-17 18:56 ` [PATCH v3 06/29] ACPI / MPAM: Parse the MPAM table James Morse
2025-10-20 12:29   ` Ben Horgan
2025-10-24 16:13   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 07/29] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-10-20 12:43   ` Ben Horgan
2025-10-20 15:44   ` Ben Horgan
2025-10-21  9:51   ` Ben Horgan
2025-10-22  0:29   ` Fenghua Yu
2025-10-22 19:00     ` Tushar Dave
2025-10-24 16:25   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 08/29] arm_mpam: Add the class and component structures for firmware described ris James Morse
2025-10-24 16:47   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 09/29] arm_mpam: Add MPAM MSC register layout definitions James Morse
2025-10-17 23:03   ` Fenghua Yu
2025-10-24 17:32   ` Jonathan Cameron
2025-10-27 16:33     ` Ben Horgan
2025-10-29  6:37   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 10/29] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
2025-10-29  7:24   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 11/29] arm_mpam: Probe hardware to find the supported partid/pmg values James Morse
2025-10-24 17:40   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 12/29] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
2025-10-24 17:43   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 13/29] arm_mpam: Probe the hardware features resctrl supports James Morse
2025-10-24 17:47   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 14/29] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
2025-10-17 18:56 ` [PATCH v3 15/29] arm_mpam: Reset MSC controls from cpuhp callbacks James Morse
2025-10-24 17:52   ` Jonathan Cameron
2025-10-29  6:53   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 16/29] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
2025-10-17 18:56 ` [PATCH v3 17/29] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
2025-10-20 15:14   ` Ben Horgan
2025-10-17 18:56 ` [PATCH v3 18/29] arm_mpam: Register and enable IRQs James Morse
2025-10-24 18:03   ` Jonathan Cameron
2025-10-29  7:02   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 19/29] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
2025-10-20 16:28   ` Ben Horgan
2025-10-17 18:56 ` [PATCH v3 20/29] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
2025-10-20 17:04   ` Ben Horgan
2025-10-27  8:47   ` Shaopeng Tan (Fujitsu)
2025-10-29  7:09   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 21/29] arm_mpam: Probe and reset the rest of the features James Morse
2025-10-20 17:16   ` Ben Horgan
2025-10-17 18:56 ` [PATCH v3 22/29] arm_mpam: Add helpers to allocate monitors James Morse
2025-10-17 18:56 ` [PATCH v3 23/29] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
2025-10-24 18:18   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
2025-10-22 13:39   ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Zeng Heng
2025-10-22 16:17     ` Ben Horgan
2025-10-25  8:45       ` Zeng Heng
2025-10-25  9:34         ` [PATCH] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
2025-10-28 17:37           ` Ben Horgan
2025-10-28 17:04         ` [PATCH mpam mpam/snapshot/v6.14-rc1] arm64/mpam: Fix MBWU monitor overflow handling Ben Horgan
2025-10-25  9:01       ` Zeng Heng
2025-10-28 16:01         ` Ben Horgan
2025-10-29  2:49           ` Zeng Heng
2025-10-29  3:59             ` Zeng Heng
2025-10-24 18:22   ` [PATCH v3 24/29] arm_mpam: Track bandwidth counter state for overflow and power management Jonathan Cameron
2025-10-29  7:56   ` [PATCH v2] arm64/mpam: Clean MBWU monitor overflow bit Zeng Heng
2025-10-30  9:52     ` Ben Horgan
2025-10-17 18:56 ` [PATCH v3 25/29] arm_mpam: Probe for long/lwd mbwu counters James Morse
2025-10-22 11:23   ` Ben Horgan
2025-10-24 18:24   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 26/29] arm_mpam: Use long MBWU counters if supported James Morse
2025-10-22 12:31   ` Ben Horgan
2025-10-24 18:29   ` Jonathan Cameron
2025-10-17 18:56 ` [PATCH v3 27/29] arm_mpam: Add helper to reset saved mbwu state James Morse
2025-10-24 18:34   ` Jonathan Cameron
2025-10-29  7:14   ` Shaopeng Tan (Fujitsu)
2025-10-17 18:56 ` [PATCH v3 28/29] arm_mpam: Add kunit test for bitmap reset James Morse
2025-10-17 18:56 ` [PATCH v3 29/29] arm_mpam: Add kunit tests for props_mismatch() James Morse
2025-10-18  1:01 ` [PATCH v3 00/29] arm_mpam: Add basic mpam driver Fenghua Yu
2025-10-23  8:15 ` Shaopeng Tan (Fujitsu)
