* [PATCH 00/33] arm_mpam: Add basic mpam driver
@ 2025-08-22 15:29 James Morse
2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
0 siblings, 68 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Hello,
This is just enough MPAM driver for the ACPI and DT pre-requisites.
It doesn't contain any of the resctrl code, meaning you can't actually drive it
from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.
This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.
The locking looks very strange - but is influenced by the 'mpam-fb' firmware
interface specification that is still alpha. That thing needs to wait for an
interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.
The aim is to not have to make invasive changes to the locking to support the
firmware interface, hence it looks strange from day-1.
I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.
The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
This document has the best overview)
The expectation is this will go via the arm64 tree.
This series is based on v6.17-rc2, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/rv1
The rest of the driver can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.17-rc2
What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/
This series was previously posted here:
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com
Bugs welcome,
Thanks,
James Morse (29):
cacheinfo: Expose the code to generate a cache-id from a device_node
drivers: base: cacheinfo: Add helper to find the cache size from
cpu+level
ACPI / PPTT: Add a helper to fill a cpumask from a processor container
ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
levels
ACPI / PPTT: Find cache level by cache-id
ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
arm64: kconfig: Add Kconfig entry for MPAM
ACPI / MPAM: Parse the MPAM table
arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
arm_mpam: Add the class and component structures for ris firmware
described
arm_mpam: Add MPAM MSC register layout definitions
arm_mpam: Add cpuhp callbacks to probe MSC hardware
arm_mpam: Probe MSCs to find the supported partid/pmg values
arm_mpam: Add helpers for managing the locking around the mon_sel
registers
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Merge supported features during mpam_enable() into
mpam_class
arm_mpam: Reset MSC controls from cpu hp callbacks
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Register and enable IRQs
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Allow configuration to be applied and restored during cpu
online
arm_mpam: Probe and reset the rest of the features
arm_mpam: Add helpers to allocate monitors
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Track bandwidth counter state for overflow and power
management
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add kunit tests for props_mismatch()
Rob Herring (1):
dt-bindings: arm: Add MPAM MSC binding
Rohit Mathew (2):
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Use long MBWU counters if supported
Shanker Donthineni (1):
arm_mpam: Add support for memory controller MSC on DT platforms
.../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++
arch/arm64/Kconfig | 19 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/acpi/arm64/Kconfig | 3 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/mpam.c | 331 ++
drivers/acpi/pptt.c | 230 +-
drivers/acpi/tables.c | 2 +-
drivers/base/cacheinfo.c | 19 +-
drivers/resctrl/Kconfig | 24 +
drivers/resctrl/Makefile | 4 +
drivers/resctrl/mpam_devices.c | 2909 +++++++++++++++++
drivers/resctrl/mpam_internal.h | 692 ++++
drivers/resctrl/test_mpam_devices.c | 390 +++
include/linux/acpi.h | 26 +
include/linux/arm_mpam.h | 56 +
include/linux/cacheinfo.h | 16 +
18 files changed, 4911 insertions(+), 14 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
create mode 100644 drivers/acpi/arm64/mpam.c
create mode 100644 drivers/resctrl/Kconfig
create mode 100644 drivers/resctrl/Makefile
create mode 100644 drivers/resctrl/mpam_devices.c
create mode 100644 drivers/resctrl/mpam_internal.h
create mode 100644 drivers/resctrl/test_mpam_devices.c
create mode 100644 include/linux/arm_mpam.h
--
2.20.1
* [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 10:46 ` Dave Martin
2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM driver identifies caches by id for use with resctrl. It
needs to know the cache-id when probing, but the value isn't set
in cacheinfo until device_initcall().
Expose the code that generates the cache-id. The parts of the MPAM
driver that run early can use this to set up the resctrl structures
before cacheinfo is ready in device_initcall().
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
* Renamed cache_of_get_id() to cache_of_calculate_id().
---
drivers/base/cacheinfo.c | 19 +++++++++++++------
include/linux/cacheinfo.h | 1 +
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 613410705a47..f6289d142ba9 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
#define arch_compact_of_hwid(_x) (_x)
#endif
-static void cache_of_set_id(struct cacheinfo *this_leaf,
- struct device_node *cache_node)
+unsigned long cache_of_calculate_id(struct device_node *cache_node)
{
struct device_node *cpu;
- u32 min_id = ~0;
+ unsigned long min_id = ~0UL;
for_each_of_cpu_node(cpu) {
u64 id = of_get_cpu_hwid(cpu, 0);
@@ -219,15 +218,23 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
id = arch_compact_of_hwid(id);
if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
of_node_put(cpu);
- return;
+ return ~0UL;
}
if (match_cache_node(cpu, cache_node))
min_id = min(min_id, id);
}
- if (min_id != ~0) {
- this_leaf->id = min_id;
+ return min_id;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf,
+ struct device_node *cache_node)
+{
+ unsigned long id = cache_of_calculate_id(cache_node);
+
+ if (id != ~0UL) {
+ this_leaf->id = id;
this_leaf->attributes |= CACHE_ID;
}
}
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index c8f4f0a0b874..2dcbb69139e9 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
#endif
const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
+unsigned long cache_of_calculate_id(struct device_node *np);
/*
* Get the cacheinfo structure for the cache associated with @cpu at
--
2.20.1
* [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-24 17:25 ` Krzysztof Kozlowski
2025-08-27 10:46 ` Dave Martin
2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM needs to know the size of a cache associated with a particular CPU.
The DT/ACPI agnostic way of doing this is to ask cacheinfo.
Add a helper to do this.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
* Converted to kdoc.
* Simplified helper to use get_cpu_cacheinfo_level().
---
include/linux/cacheinfo.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2dcbb69139e9..e12d6f2c6a57 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
return ci ? ci->id : -1;
}
+/**
+ * get_cpu_cacheinfo_size() - Get the size of the cache.
+ * @cpu: The cpu that is associated with the cache.
+ * @level: The level of the cache as seen by @cpu.
+ *
+ * Callers must hold the cpuhp lock.
+ * Returns the cache-size on success, or 0 for an error.
+ */
+static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
+{
+ struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
+
+ return ci ? ci->size : 0;
+}
+
#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
#define use_arch_cache_info() (true)
#else
--
2.20.1
* [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-26 14:45 ` Ben Horgan
2025-08-27 10:48 ` Dave Martin
2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The PPTT describes CPUs and caches, as well as processor containers.
The ACPI table for MPAM describes the set of CPUs that can access an MSC
with the UID of a processor container.
Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
* Added missing : in kernel-doc
* Made helper return void as this never actually returns an error.
---
drivers/acpi/pptt.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 3 ++
2 files changed, 89 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..4791ca2bdfac 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
return NULL;
}
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr: A reference to the PPTT table.
+ * @parent_node: A pointer to the processor node in the @table_hdr.
+ * @cpus: A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+ struct acpi_pptt_processor *parent_node,
+ cpumask_t *cpus)
+{
+ struct acpi_pptt_processor *cpu_node;
+ u32 acpi_id;
+ int cpu;
+
+ cpumask_clear(cpus);
+
+ for_each_possible_cpu(cpu) {
+ acpi_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+ while (cpu_node) {
+ if (cpu_node == parent_node) {
+ cpumask_set_cpu(cpu, cpus);
+ break;
+ }
+ cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+ }
+ }
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ * processor container
+ * @acpi_cpu_id: The UID of the processor container.
+ * @cpus: The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not every 'Processor' entry in the PPTT is a CPU or a Processor
+ * Container; some exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf that has the
+ * 'ACPI Processor ID valid' flag set.
+ *
+ * @cpus is left empty if the container cannot be found.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table_hdr;
+ struct acpi_subtable_header *entry;
+ unsigned long table_end;
+ acpi_status status;
+ bool leaf_flag;
+ u32 proc_sz;
+
+ cpumask_clear(cpus);
+
+ status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
+ if (ACPI_FAILURE(status))
+ return;
+
+ table_end = (unsigned long)table_hdr + table_hdr->length;
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+ sizeof(struct acpi_table_pptt));
+ proc_sz = sizeof(struct acpi_pptt_processor);
+ while ((unsigned long)entry + proc_sz <= table_end) {
+ cpu_node = (struct acpi_pptt_processor *)entry;
+ if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
+ cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
+ leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
+ if (!leaf_flag) {
+ if (cpu_node->acpi_processor_id == acpi_cpu_id)
+ acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+ }
+ }
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+ entry->length);
+ }
+
+ acpi_put_table(table_hdr);
+}
+
static u8 acpi_cache_type(enum cache_type type)
{
switch (type) {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1c5bb1e887cd..f97a9ff678cc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
int find_acpi_cpu_topology_cluster(unsigned int cpu);
int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
{
return -EINVAL;
}
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+ cpumask_t *cpus) { }
#endif
void acpi_arch_init(void);
--
2.20.1
* [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 10:49 ` Dave Martin
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
acpi_count_levels() passes the number of levels back via a pointer argument.
It also passes this to acpi_find_cache_level() as the starting_level, and
preserves this value as it walks up the cpu_node tree counting the levels.
This means the caller must initialise 'levels' due to acpi_count_levels()
internals. The only caller acpi_get_cache_info() happens to have already
initialised levels to zero, which acpi_count_levels() depends on to get the
correct result.
Two results are passed back from acpi_count_levels(); unlike split_levels,
levels is not optional.
Split these two results up. The mandatory 'levels' is always returned,
which hides the internal details from the caller, and avoids having
duplicated initialisation in all callers. split_levels remains an
optional argument passed back.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Made acpi_count_levels() return the levels value.
---
drivers/acpi/pptt.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 4791ca2bdfac..8f9b9508acba 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
* levels and split cache levels (data/instruction).
* @table_hdr: Pointer to the head of the PPTT table
* @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
* @split_levels: Number of split cache levels (data/instruction) if
- * success. Can by NULL.
+ * success. Can be NULL.
*
+ * Return: the number of cache levels.
* Given a processor node containing a processing unit, walk into it and count
* how many levels exist solely for it, and then walk up each level until we hit
* the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
* split cache levels (data/instruction) that exist at each level on the way
* up.
*/
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
- struct acpi_pptt_processor *cpu_node,
- unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+ struct acpi_pptt_processor *cpu_node,
+ unsigned int *split_levels)
{
+ int starting_level = 0;
+
do {
- acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+ acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
} while (cpu_node);
+
+ return starting_level;
}
/**
@@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
if (!cpu_node)
return -ENOENT;
- acpi_count_levels(table, cpu_node, levels, split_levels);
+ *levels = acpi_count_levels(table, cpu_node, split_levels);
pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
*levels, split_levels ? *split_levels : -1);
--
2.20.1
* [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-23 12:14 ` Markus Elfring
2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.
Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.
Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.
Add a cleanup based free-ing mechanism for acpi_get_table().
CC: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* acpi_count_levels() now returns a value.
* Converted the table-get stuff to use Jonathan's cleanup helper.
* Dropped Sudeep's Review tag due to the cleanup change.
---
drivers/acpi/pptt.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 17 ++++++++++++
2 files changed, 81 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 8f9b9508acba..660457644a5b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
ACPI_PPTT_ACPI_IDENTICAL);
}
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+ u32 acpi_cpu_id;
+ int level, cpu, num_levels;
+ struct acpi_pptt_cache *cache;
+ struct acpi_pptt_cache_v1 *cache_v1;
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+ if (IS_ERR(table))
+ return PTR_ERR(table);
+
+ if (table->revision < 3)
+ return -ENOENT;
+
+ /*
+ * If we found the cache first, we'd still need to walk from each CPU
+ * to find the level...
+ */
+ for_each_possible_cpu(cpu) {
+ acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+ if (!cpu_node)
+ return -ENOENT;
+ num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+ /* Start at 1 for L1 */
+ for (level = 1; level <= num_levels; level++) {
+ cache = acpi_find_cache_node(table, acpi_cpu_id,
+ ACPI_PPTT_CACHE_TYPE_UNIFIED,
+ level, &cpu_node);
+ if (!cache)
+ continue;
+
+ cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+ cache,
+ sizeof(struct acpi_pptt_cache));
+
+ if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+ cache_v1->cache_id == cache_id)
+ return level;
+ }
+ }
+
+ return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..30c10b1dcdb2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
#ifndef _LINUX_ACPI_H
#define _LINUX_ACPI_H
+#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/ioport.h> /* for struct resource */
#include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
void acpi_table_init_complete (void);
int acpi_table_init (void);
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+ struct acpi_table_header *table;
+ int status = acpi_get_table(signature, instance, &table);
+
+ if (ACPI_FAILURE(status))
+ return ERR_PTR(-ENOENT);
+ return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
int __init_or_acpilib acpi_table_parse_entries(char *id,
unsigned long table_size, int entry_id,
@@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
}
static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+ return -EINVAL;
+}
#endif
void acpi_arch_init(void);
--
2.20.1
* [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 10:53 ` Dave Martin
2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
MPAM identifies CPUs by the cache_id in the PPTT cache structure.
The driver needs to know which CPUs are associated with the cache,
the CPUs may not all be online, so cacheinfo does not have the
information.
Add a helper to pull this information out of the PPTT.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
Changes since RFC:
* acpi_count_levels() now returns a value.
* Converted the table-get stuff to use Jonathan's cleanup helper.
* Dropped Sudeep's Review tag due to the cleanup change.
---
drivers/acpi/pptt.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 6 +++++
2 files changed, 68 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 660457644a5b..cb93a9a7f9b6 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
return -ENOENT;
}
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ * specified cache
+ * @cache_id: The id field of the unified cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+ u32 acpi_cpu_id;
+ int level, cpu, num_levels;
+ struct acpi_pptt_cache *cache;
+ struct acpi_pptt_cache_v1 *cache_v1;
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+ cpumask_clear(cpus);
+
+ if (IS_ERR(table))
+ return -ENOENT;
+
+ if (table->revision < 3)
+ return -ENOENT;
+
+ /*
+ * If we found the cache first, we'd still need to walk from each cpu.
+ */
+ for_each_possible_cpu(cpu) {
+ acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+ if (!cpu_node)
+ return 0;
+ num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+ /* Start at 1 for L1 */
+ for (level = 1; level <= num_levels; level++) {
+ cache = acpi_find_cache_node(table, acpi_cpu_id,
+ ACPI_PPTT_CACHE_TYPE_UNIFIED,
+ level, &cpu_node);
+ if (!cache)
+ continue;
+
+ cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+ cache,
+ sizeof(struct acpi_pptt_cache));
+
+ if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+ cache_v1->cache_id == cache_id)
+ cpumask_set_cpu(cpu, cpus);
+ }
+ }
+
+ return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30c10b1dcdb2..4ad08f5f1d83 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
{
return -EINVAL;
}
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+ cpumask_t *cpus)
+{
+ return -EINVAL;
+}
#endif
void acpi_arch_init(void);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (5 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 8:53 ` Ben Horgan
2025-08-27 11:01 ` Dave Martin
2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
` (60 subsequent siblings)
67 siblings, 2 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it, and as MPAM is only found on
arm64 platforms, that is where the Kconfig option makes the most
sense.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register the CPUs'
properties with the MPAM driver.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
arch/arm64/Kconfig | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..658e47fc0c5a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses.
+config ARM64_MPAM
+ bool "Enable support for MPAM"
+ help
+ Memory Partitioning and Monitoring is an optional extension
+ that allows the CPUs to mark load and store transactions with
+ labels for partition-id and performance-monitoring-group.
+ System components, such as the caches, can use the partition-id
+ to apply a performance policy. MPAM monitors can use the
+ partition-id and performance-monitoring-group to measure the
+ cache occupancy or data throughput.
+
+ Use of this extension requires CPU support, support in the
+ memory system components (MSC), and a description from firmware
+ of where the MSC are in the address space.
+
+ MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (6 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-23 10:55 ` Markus Elfring
2025-08-27 16:05 ` Dave Martin
2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
` (59 subsequent siblings)
67 siblings, 2 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Used DEFINE_RES_IRQ_NAMED() and friends macros.
* Additional error handling.
* Check for zero sized MSC.
* Allow table revisions greater than 1. (no spec for revision 0!)
* Use cleanup helpers to retrieve ACPI tables, which allows some functions
to be folded together.
---
arch/arm64/Kconfig | 1 +
drivers/acpi/arm64/Kconfig | 3 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/mpam.c | 331 ++++++++++++++++++++++++++++++++++++
drivers/acpi/tables.c | 2 +-
include/linux/arm_mpam.h | 46 +++++
6 files changed, 383 insertions(+), 1 deletion(-)
create mode 100644 drivers/acpi/arm64/mpam.c
create mode 100644 include/linux/arm_mpam.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 658e47fc0c5a..e51ccf1da102 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
+ select ACPI_MPAM if ACPI
help
Memory Partitioning and Monitoring is an optional extension
that allows the CPUs to mark load and store transactions with
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
config ACPI_APMT
bool
+
+config ACPI_MPAM
+ bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) += apmt.o
obj-$(CONFIG_ACPI_FFH) += ffh.o
obj-$(CONFIG_ACPI_GTDT) += gtdt.o
obj-$(CONFIG_ACPI_IORT) += iort.o
+obj-$(CONFIG_ACPI_MPAM) += mpam.o
obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
obj-$(CONFIG_ARM_AMBA) += amba.o
obj-y += dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..e55fc2729ac5
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table, feeding the discovered nodes into the driver. */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE_MASK BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED 0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID BIT(4)
+
+static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
+ int *irq, u32 processor_container_uid)
+{
+ int sense;
+
+ if (!intid)
+ return false;
+
+ if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
+ ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+ return false;
+
+ sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
+
+ /*
+ * A GSI in the GIC's PPI range would need a partitioned percpu
+ * interrupt, which isn't supported yet.
+ */
+ if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
+ pr_err_once("Partitioned interrupts not supported\n");
+ return false;
+ }
+
+ *irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+ if (*irq <= 0) {
+ pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+ intid);
+ return false;
+ }
+
+ return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+ struct acpi_mpam_msc_node *tbl_msc,
+ struct resource *res, int *res_idx)
+{
+ u32 flags, aff;
+ int irq;
+
+ flags = tbl_msc->overflow_interrupt_flags;
+ if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+ flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+ aff = tbl_msc->overflow_interrupt_affinity;
+ else
+ aff = ~0;
+ if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
+ res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+ flags = tbl_msc->error_interrupt_flags;
+ if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+ flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+ aff = tbl_msc->error_interrupt_affinity;
+ else
+ aff = ~0;
+ if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
+ res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+ struct acpi_mpam_resource_node *res)
+{
+ int level, nid;
+ u32 cache_id;
+
+ switch (res->locator_type) {
+ case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+ cache_id = res->locator.cache_locator.cache_reference;
+ level = find_acpi_cache_level_from_id(cache_id);
+ if (level <= 0) {
+ pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
+ return -EINVAL;
+ }
+ return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+ level, cache_id);
+ case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+ nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+ if (nid == NUMA_NO_NODE)
+ nid = 0;
+ return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+ 255, nid);
+ default:
+ /* These get discovered later and treated as unknown */
+ return 0;
+ }
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc)
+{
+ int i, err;
+ struct acpi_mpam_resource_node *resources;
+
+ resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
+ for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+ err = acpi_mpam_parse_resource(msc, &resources[i]);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+ struct platform_device *pdev,
+ u32 *acpi_id)
+{
+ bool acpi_id_valid = false;
+ struct acpi_device *buddy;
+ char hid[16], uid[16];
+ int err;
+
+ memset(&hid, 0, sizeof(hid));
+ memcpy(hid, &tbl_msc->hardware_id_linked_device,
+ sizeof(tbl_msc->hardware_id_linked_device));
+
+ if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+ *acpi_id = tbl_msc->instance_id_linked_device;
+ acpi_id_valid = true;
+ }
+
+ err = snprintf(uid, sizeof(uid), "%u",
+ tbl_msc->instance_id_linked_device);
+ if (err >= sizeof(uid))
+ return acpi_id_valid;
+
+ buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+ if (buddy)
+ device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+ return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+ enum mpam_msc_iface *iface)
+{
+ switch (tbl_msc->interface_type) {
+ case 0:
+ *iface = MPAM_IFACE_MMIO;
+ return 0;
+ case 0xa:
+ *iface = MPAM_IFACE_PCC;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int __init acpi_mpam_parse(void)
+{
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+ char *table_end, *table_offset = (char *)(table + 1);
+ struct property_entry props[4]; /* needs a sentinel */
+ struct acpi_mpam_msc_node *tbl_msc;
+ int next_res, next_prop, err = 0;
+ struct acpi_device *companion;
+ struct platform_device *pdev;
+ enum mpam_msc_iface iface;
+ struct resource res[3];
+ char uid[16];
+ u32 acpi_id;
+
+ if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
+ return 0;
+
+ if (table->revision < 1)
+ return 0;
+
+ table_end = (char *)table + table->length;
+
+ while (table_offset < table_end) {
+ tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+ table_offset += tbl_msc->length;
+
+ /*
+ * If any of the reserved fields are set, make no attempt to
+ * parse the msc structure. Skipping it prevents the driver from
+ * probing all the MSC, so it can't discover the system-wide
+ * supported partid and pmg ranges and stays disabled. This
+ * avoids the case where this unrecognised MSC truncates the
+ * partids and raises a screaming error interrupt.
+ */
+ if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
+ continue;
+
+ if (!tbl_msc->mmio_size)
+ continue;
+
+ if (decode_interface_type(tbl_msc, &iface))
+ continue;
+
+ next_res = 0;
+ next_prop = 0;
+ memset(res, 0, sizeof(res));
+ memset(props, 0, sizeof(props));
+
+ pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+ if (!pdev) {
+ err = -ENOMEM;
+ break;
+ }
+
+ if (tbl_msc->length < sizeof(*tbl_msc)) {
+ err = -EINVAL;
+ break;
+ }
+
+ /* Some power management is described in the namespace: */
+ err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+ if (err > 0 && err < sizeof(uid)) {
+ companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+ if (companion)
+ ACPI_COMPANION_SET(&pdev->dev, companion);
+ }
+
+ if (iface == MPAM_IFACE_MMIO) {
+ res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+ tbl_msc->mmio_size,
+ "MPAM:MSC");
+ } else if (iface == MPAM_IFACE_PCC) {
+ props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+ tbl_msc->base_address);
+ }
+
+ acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+ err = platform_device_add_resources(pdev, res, next_res);
+ if (err)
+ break;
+
+ props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+ tbl_msc->max_nrdy_usec);
+
+ /*
+ * The MSC's CPU affinity is described via its linked power
+ * management device, but only if it points at a Processor or
+ * Processor Container.
+ */
+ if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
+ props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
+ acpi_id);
+ }
+
+ err = device_create_managed_software_node(&pdev->dev, props,
+ NULL);
+ if (err)
+ break;
+
+ /* Come back later if you want the RIS too */
+ err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+ if (err)
+ break;
+
+ err = platform_device_add(pdev);
+ if (err)
+ break;
+ }
+
+ if (err)
+ platform_device_put(pdev);
+
+ return err;
+}
+
+int acpi_mpam_count_msc(void)
+{
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+ char *table_end, *table_offset = (char *)(table + 1);
+ struct acpi_mpam_msc_node *tbl_msc;
+ int count = 0;
+
+ if (IS_ERR(table))
+ return 0;
+
+ if (table->revision < 1)
+ return 0;
+
+ table_end = (char *)table + table->length;
+
+ while (table_offset < table_end) {
+ tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+
+ if (tbl_msc->length < sizeof(*tbl_msc))
+ return -EINVAL;
+
+ table_offset += tbl_msc->length;
+
+ if (!tbl_msc->mmio_size)
+ continue;
+
+ count++;
+ }
+
+ return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index fa9bb8c8ce95..835e3795ede3 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
- ACPI_SIG_NBFT };
+ ACPI_SIG_NBFT, ACPI_SIG_MPAM };
#define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..0edefa6ba019
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+ MPAM_IFACE_MMIO, /* a real MPAM MSC */
+ MPAM_IFACE_PCC, /* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+ MPAM_CLASS_CACHE, /* Well known caches, e.g. L2 */
+ MPAM_CLASS_MEMORY, /* Main memory */
+ MPAM_CLASS_UNKNOWN, /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc)
+{
+ return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id,
+ int component_id)
+{
+ return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (7 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 16:22 ` Dave Martin
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
` (58 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rob Herring <robh@kernel.org>
The binding is designed around the assumption that an MSC will be a
sub-block of something else such as a memory controller, cache controller,
or IOMMU. However, it's certainly possible a design does not have that
association or has a mixture of both, so the binding illustrates how we can
support that with RIS child nodes.
A key part of MPAM is that we need to know about all of the MSCs in the
system before it can be enabled. This drives the need for the genericish
'arm,mpam-msc' compatible, though we can't assume an MSC is accessible
until a h/w specific driver has potentially enabled the h/w.
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Syntax(?) corrections supplied by Rob.
* Culled some context in the example.
---
.../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++++++++++++++++++
1 file changed, 200 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
new file mode 100644
index 000000000000..d984817b3385
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/arm,mpam-msc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
+
+description: |
+ The Arm MPAM specification can be found here:
+
+ https://developer.arm.com/documentation/ddi0598/latest
+
+maintainers:
+ - Rob Herring <robh@kernel.org>
+
+properties:
+ compatible:
+ oneOf:
+ - const: arm,mpam-msc # Further details are discoverable
+ - items:
+ - const: arm,mpam-memory-controller-msc
+ - const: arm,mpam-msc
+
+ reg:
+ maxItems: 1
+ description: A memory region containing registers as defined in the MPAM
+ specification.
+
+ interrupts:
+ minItems: 1
+ items:
+ - description: error (optional)
+ - description: overflow (optional, only for monitoring)
+
+ interrupt-names:
+ oneOf:
+ - items:
+ - enum: [ error, overflow ]
+ - items:
+ - const: error
+ - const: overflow
+
+ arm,not-ready-us:
+ description: The maximum time in microseconds for monitoring data to be
+ accurate after a settings change. For more information, see the
+ Not-Ready (NRDY) bit description in the MPAM specification.
+
+ numa-node-id: true # see NUMA binding
+
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+patternProperties:
+ '^ris@[0-9a-f]$':
+ type: object
+ additionalProperties: false
+ description:
+ RIS nodes for each RIS in an MSC. These nodes are required for each RIS
+ implementing known MPAM controls.
+
+ properties:
+ compatible:
+ enum:
+ # Bulk storage for cache
+ - arm,mpam-cache
+ # Memory bandwidth
+ - arm,mpam-memory
+
+ reg:
+ minimum: 0
+ maximum: 0xf
+
+ cpus:
+ description:
+ Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
+ device's affinity is used.
+
+ arm,mpam-device:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ By default, the MPAM enabled device associated with a RIS is the MSC's
+ parent node. It is possible for each RIS to be associated with different
+ devices in which case 'arm,mpam-device' should be used.
+
+ required:
+ - compatible
+ - reg
+
+required:
+ - compatible
+ - reg
+
+dependencies:
+ interrupts: [ interrupt-names ]
+
+additionalProperties: false
+
+examples:
+ - |
+ L3: cache-controller@30000000 {
+ compatible = "arm,dsu-l3-cache", "cache";
+ cache-level = <3>;
+ cache-unified;
+
+ ranges = <0x0 0x30000000 0x800000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@10000 {
+ compatible = "arm,mpam-msc";
+
+ /* CPU affinity implied by the parent cache node */
+ reg = <0x10000 0x2000>;
+ interrupts = <1>, <2>;
+ interrupt-names = "error", "overflow";
+ arm,not-ready-us = <1>;
+ };
+ };
+
+ mem: memory-controller@20000 {
+ compatible = "foo,a-memory-controller";
+ reg = <0x20000 0x1000>;
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ msc@21000 {
+ compatible = "arm,mpam-memory-controller-msc", "arm,mpam-msc";
+ reg = <0x21000 0x1000>;
+ interrupts = <3>;
+ interrupt-names = "error";
+ arm,not-ready-us = <1>;
+ numa-node-id = <1>;
+ };
+ };
+
+ iommu@40000 {
+ reg = <0x40000 0x1000>;
+
+ ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@41000 {
+ compatible = "arm,mpam-msc";
+ reg = <0x41000 0x1000>;
+ interrupts = <5>, <6>;
+ interrupt-names = "error", "overflow";
+ arm,not-ready-us = <1>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ris@2 {
+ compatible = "arm,mpam-cache";
+ reg = <2>;
+ // TODO: How to map to device(s)?
+ };
+ };
+ };
+
+ msc@80000 {
+ compatible = "foo,a-standalone-msc";
+ reg = <0x80000 0x1000>;
+
+ clocks = <&clks 123>;
+
+ ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@10000 {
+ compatible = "arm,mpam-msc";
+
+ reg = <0x10000 0x2000>;
+ interrupts = <7>;
+ interrupt-names = "overflow";
+ arm,not-ready-us = <1>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ris@0 {
+ compatible = "arm,mpam-cache";
+ reg = <0>;
+ arm,mpam-device = <&L2_0>;
+ };
+
+ ris@1 {
+ compatible = "arm,mpam-memory";
+ reg = <1>;
+ arm,mpam-device = <&mem>;
+ };
+ };
+ };
+
+...
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (8 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-22 19:15 ` Markus Elfring
` (5 more replies)
2025-08-22 15:29 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
` (57 subsequent siblings)
67 siblings, 6 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
CC: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Check for status=broken DT devices.
* Moved all the files around.
* Made Kconfig symbols depend on EXPERT
---
arch/arm64/Kconfig | 1 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/resctrl/Kconfig | 11 ++
drivers/resctrl/Makefile | 4 +
drivers/resctrl/mpam_devices.c | 336 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 62 ++++++
7 files changed, 417 insertions(+)
create mode 100644 drivers/resctrl/Kconfig
create mode 100644 drivers/resctrl/Makefile
create mode 100644 drivers/resctrl/mpam_devices.c
create mode 100644 drivers/resctrl/mpam_internal.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e51ccf1da102..ea3c54e04275 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
+ select ARM64_MPAM_DRIVER
select ACPI_MPAM if ACPI
help
Memory Partitioning and Monitoring is an optional extension
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
source "drivers/cdx/Kconfig"
+source "drivers/resctrl/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index b5749cf67044..f41cf4eddeba 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,5 +194,6 @@ obj-$(CONFIG_HTE) += hte/
obj-$(CONFIG_DRM_ACCEL) += accel/
obj-$(CONFIG_CDX_BUS) += cdx/
obj-$(CONFIG_DPLL) += dpll/
+obj-y += resctrl/
obj-$(CONFIG_S390) += s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..dff7b87280ab
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,11 @@
+# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
+# CPU resources, not containers or cgroups etc.
+config ARM64_MPAM_DRIVER
+ bool "MPAM driver for System IP, e.g. caches and memory controllers"
+ depends on ARM64_MPAM && EXPERT
+
+config ARM64_MPAM_DRIVER_DEBUG
+ bool "Enable debug messages from the MPAM driver."
+ depends on ARM64_MPAM_DRIVER
+ help
+ Say yes here to enable debug messages from the MPAM driver.
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..92b48fa20108
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
+mpam-y += mpam_devices.o
+
+ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..a0d9a699a6e7
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include <acpi/pcc.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/* MPAM isn't available until all the MSC have been probed. */
+static u32 mpam_num_msc;
+
+static void mpam_discovery_complete(void)
+{
+ pr_err("Discovered all MSC\n");
+}
+
+static int mpam_dt_count_msc(void)
+{
+ int count = 0;
+ struct device_node *np;
+
+ for_each_compatible_node(np, NULL, "arm,mpam-msc") {
+ if (of_device_is_available(np))
+ count++;
+ }
+
+ return count;
+}
+
+static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
+ u32 ris_idx)
+{
+ int err = 0;
+ u32 level = 0;
+ unsigned long cache_id;
+ struct device_node *cache;
+
+ do {
+ if (of_device_is_compatible(np, "arm,mpam-cache")) {
+ cache = of_parse_phandle(np, "arm,mpam-device", 0);
+ if (!cache) {
+ pr_err("Failed to read phandle\n");
+ break;
+ }
+ } else if (of_device_is_compatible(np->parent, "cache")) {
+ cache = of_node_get(np->parent);
+ } else {
+ /* For now, only caches are supported */
+ cache = NULL;
+ break;
+ }
+
+ err = of_property_read_u32(cache, "cache-level", &level);
+ if (err) {
+ pr_err("Failed to read cache-level\n");
+ break;
+ }
+
+ cache_id = cache_of_calculate_id(cache);
+ if (cache_id == ~0UL) {
+ err = -ENOENT;
+ break;
+ }
+
+ err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
+ cache_id);
+ } while (0);
+ of_node_put(cache);
+
+ return err;
+}
+
+static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
+{
+ int err = 0, num_ris = 0;
+ u32 ris_idx;
+ struct device_node *iter, *np;
+
+ np = msc->pdev->dev.of_node;
+ for_each_child_of_node(np, iter) {
+ if (!of_property_read_u32(iter, "reg", &ris_idx)) {
+ num_ris++;
+ err = mpam_dt_parse_resource(msc, iter, ris_idx);
+ if (err) {
+ of_node_put(iter);
+ return err;
+ }
+ }
+ }
+
+ if (!num_ris)
+ err = mpam_dt_parse_resource(msc, np, 0);
+
+ return err;
+}
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, then
+ * the corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+ struct device_node *parent;
+ u32 affinity_id;
+ int err;
+
+ if (!acpi_disabled) {
+ err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+ &affinity_id);
+ if (err)
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ else
+ acpi_pptt_get_cpus_from_container(affinity_id,
+ &msc->accessibility);
+
+ return 0;
+ }
+
+ /* This depends on the path to of_node */
+ parent = of_get_parent(msc->pdev->dev.of_node);
+ if (parent == of_root) {
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ err = 0;
+ } else {
+ err = -EINVAL;
+ pr_err("Cannot determine accessibility of MSC: %s\n",
+ dev_name(&msc->pdev->dev));
+ }
+ of_node_put(parent);
+
+ return err;
+}
+
+static int fw_num_msc;
+
+static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
+{
+ /* TODO: wake up tasks blocked on this MSC's PCC channel */
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+ struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+ if (!msc)
+ return;
+
+ mutex_lock(&mpam_list_lock);
+ mpam_num_msc--;
+ platform_set_drvdata(pdev, NULL);
+ list_del_rcu(&msc->glbl_list);
+ synchronize_srcu(&mpam_srcu);
+ devm_kfree(&pdev->dev, msc);
+ mutex_unlock(&mpam_list_lock);
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+ int err;
+ struct mpam_msc *msc;
+ struct resource *msc_res;
+ void *plat_data = pdev->dev.platform_data;
+
+ mutex_lock(&mpam_list_lock);
+ do {
+ msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+ if (!msc) {
+ err = -ENOMEM;
+ break;
+ }
+
+ mutex_init(&msc->probe_lock);
+ mutex_init(&msc->part_sel_lock);
+ mutex_init(&msc->outer_mon_sel_lock);
+ raw_spin_lock_init(&msc->inner_mon_sel_lock);
+ msc->id = mpam_num_msc++;
+ msc->pdev = pdev;
+ INIT_LIST_HEAD_RCU(&msc->glbl_list);
+ INIT_LIST_HEAD_RCU(&msc->ris);
+
+ err = update_msc_accessibility(msc);
+ if (err)
+ break;
+ if (cpumask_empty(&msc->accessibility)) {
+ pr_err_once("msc:%u is not accessible from any CPU!\n",
+ msc->id);
+ err = -EINVAL;
+ break;
+ }
+
+ if (device_property_read_u32(&pdev->dev, "pcc-channel",
+ &msc->pcc_subspace_id))
+ msc->iface = MPAM_IFACE_MMIO;
+ else
+ msc->iface = MPAM_IFACE_PCC;
+
+ if (msc->iface == MPAM_IFACE_MMIO) {
+ void __iomem *io;
+
+ io = devm_platform_get_and_ioremap_resource(pdev, 0,
+ &msc_res);
+ if (IS_ERR(io)) {
+ pr_err("Failed to map MSC base address\n");
+ err = PTR_ERR(io);
+ break;
+ }
+ msc->mapped_hwpage_sz = resource_size(msc_res);
+ msc->mapped_hwpage = io;
+ } else if (msc->iface == MPAM_IFACE_PCC) {
+ msc->pcc_cl.dev = &pdev->dev;
+ msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
+ msc->pcc_cl.tx_block = false;
+ msc->pcc_cl.tx_tout = 1000; /* 1s */
+ msc->pcc_cl.knows_txdone = false;
+
+ msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
+ msc->pcc_subspace_id);
+ if (IS_ERR(msc->pcc_chan)) {
+ pr_err("Failed to request MSC PCC channel\n");
+ err = PTR_ERR(msc->pcc_chan);
+ break;
+ }
+ }
+
+ list_add_rcu(&msc->glbl_list, &mpam_all_msc);
+ platform_set_drvdata(pdev, msc);
+ } while (0);
+ mutex_unlock(&mpam_list_lock);
+
+ if (!err) {
+ /* Create RIS entries described by firmware */
+ if (!acpi_disabled)
+ err = acpi_mpam_parse_resources(msc, plat_data);
+ else
+ err = mpam_dt_parse_resources(msc, plat_data);
+ }
+
+ if (!err && fw_num_msc == mpam_num_msc)
+ mpam_discovery_complete();
+
+ if (err && msc)
+ mpam_msc_drv_remove(pdev);
+
+ return err;
+}
+
+static const struct of_device_id mpam_of_match[] = {
+ { .compatible = "arm,mpam-msc", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, mpam_of_match);
+
+static struct platform_driver mpam_msc_driver = {
+ .driver = {
+ .name = "mpam_msc",
+ .of_match_table = of_match_ptr(mpam_of_match),
+ },
+ .probe = mpam_msc_drv_probe,
+ .remove = mpam_msc_drv_remove,
+};
+
+/*
+ * MSC that are hidden under caches are not created as platform devices
+ * as there is no cache driver. Caches are also special-cased in
+ * update_msc_accessibility().
+ */
+static void mpam_dt_create_foundling_msc(void)
+{
+ int err;
+ struct device_node *cache;
+
+ for_each_compatible_node(cache, NULL, "cache") {
+ err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
+ if (err)
+ pr_err("Failed to create MSC devices under caches\n");
+ }
+}
+
+static int __init mpam_msc_driver_init(void)
+{
+ if (!system_supports_mpam())
+ return -EOPNOTSUPP;
+
+ init_srcu_struct(&mpam_srcu);
+
+ if (!acpi_disabled)
+ fw_num_msc = acpi_mpam_count_msc();
+ else
+ fw_num_msc = mpam_dt_count_msc();
+
+ if (fw_num_msc <= 0) {
+ pr_err("No MSC devices found in firmware\n");
+ return -EINVAL;
+ }
+
+ if (acpi_disabled)
+ mpam_dt_create_foundling_msc();
+
+ return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..07e0f240eaca
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2024 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/resctrl.h>
+#include <linux/sizes.h>
+
+struct mpam_msc {
+ /* member of mpam_all_msc */
+ struct list_head glbl_list;
+
+ int id;
+ struct platform_device *pdev;
+
+ /* Not modified after mpam_is_enabled() becomes true */
+ enum mpam_msc_iface iface;
+ u32 pcc_subspace_id;
+ struct mbox_client pcc_cl;
+ struct pcc_mbox_chan *pcc_chan;
+ u32 nrdy_usec;
+ cpumask_t accessibility;
+
+ /*
+ * probe_lock is only taken during discovery. After discovery these
+ * properties become read-only and the lists are protected by SRCU.
+ */
+ struct mutex probe_lock;
+ unsigned long ris_idxs[128 / BITS_PER_LONG];
+ u32 ris_max;
+
+ /* mpam_msc_ris of this component */
+ struct list_head ris;
+
+ /*
+ * part_sel_lock protects access to the MSC hardware registers that are
+ * affected by MPAMCFG_PART_SEL (including the ID registers that vary
+ * by RIS).
+ * If needed, take msc->lock first.
+ */
+ struct mutex part_sel_lock;
+
+ /*
+ * mon_sel_lock protects access to the MSC hardware registers that are
+ * affected by MPAMCFG_MON_SEL.
+ * If needed, take msc->lock first.
+ */
+ struct mutex outer_mon_sel_lock;
+ raw_spinlock_t inner_mon_sel_lock;
+ unsigned long inner_mon_sel_flags;
+
+ void __iomem *mapped_hwpage;
+ size_t mapped_hwpage_sz;
+};
+
+#endif /* MPAM_INTERNAL_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (9 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
` (56 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Shanker Donthineni <sdonthineni@nvidia.com>
The device-tree binding has two examples for MSC associated with
memory controllers. Add support to discover the component_id
from the device-tree and create 'memory' RIS.
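The id selection for a memory-controller RIS can be sketched as below. `memory_ris_ids()` is a hypothetical helper, not a function from the patch; the class_id value 255 and the NUMA_NO_NODE fallback mirror the choices visible in the diff:

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)	/* as in the kernel's <linux/numa.h> */

/*
 * Hypothetical helper modelling the patch's id selection for a memory
 * controller RIS: the controller's NUMA node becomes the component_id
 * (0 when the node is unknown), and class_id 255 keeps memory
 * controllers out of the cache-level number space.
 */
static void memory_ris_ids(int nid, unsigned int *class_id,
			   unsigned int *component_id)
{
	*component_id = (nid == NUMA_NO_NODE) ? 0 : (unsigned int)nid;
	*class_id = 255;
}
```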
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: split out of a bigger patch, added affinity piece ]
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 67 ++++++++++++++++++++++++----------
1 file changed, 47 insertions(+), 20 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a0d9a699a6e7..71a1fb1a9c75 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
u32 ris_idx)
{
int err = 0;
- u32 level = 0;
- unsigned long cache_id;
- struct device_node *cache;
+ u32 class_id = 0, component_id = 0;
+ struct device_node *cache = NULL, *memory = NULL;
+ enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
do {
+ /* What kind of MSC is this? */
if (of_device_is_compatible(np, "arm,mpam-cache")) {
cache = of_parse_phandle(np, "arm,mpam-device", 0);
if (!cache) {
pr_err("Failed to read phandle\n");
break;
}
+ type = MPAM_CLASS_CACHE;
} else if (of_device_is_compatible(np->parent, "cache")) {
cache = of_node_get(np->parent);
+ type = MPAM_CLASS_CACHE;
+ } else if (of_device_is_compatible(np, "arm,mpam-memory")) {
+ memory = of_parse_phandle(np, "arm,mpam-device", 0);
+ if (!memory) {
+ pr_err("Failed to read phandle\n");
+ break;
+ }
+ type = MPAM_CLASS_MEMORY;
+ } else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
+ memory = of_node_get(np->parent);
+ type = MPAM_CLASS_MEMORY;
} else {
- /* For now, only caches are supported */
- cache = NULL;
+ /*
+ * For now, only caches and memory controllers are
+ * supported.
+ */
break;
}
- err = of_property_read_u32(cache, "cache-level", &level);
- if (err) {
- pr_err("Failed to read cache-level\n");
- break;
- }
-
- cache_id = cache_of_calculate_id(cache);
- if (cache_id == ~0UL) {
- err = -ENOENT;
- break;
+ /* Determine the class and component ids, based on type. */
+ if (type == MPAM_CLASS_CACHE) {
+ err = of_property_read_u32(cache, "cache-level", &class_id);
+ if (err) {
+ pr_err("Failed to read cache-level\n");
+ break;
+ }
+ component_id = cache_of_calculate_id(cache);
+ if (component_id == ~0UL) {
+ err = -ENOENT;
+ break;
+ }
+ } else if (type == MPAM_CLASS_MEMORY) {
+ err = of_node_to_nid(np);
+ component_id = (err == NUMA_NO_NODE) ? 0 : err;
+ class_id = 255;
}
- err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
- cache_id);
+ err = mpam_ris_create(msc, ris_idx, type, class_id,
+ component_id);
} while (0);
of_node_put(cache);
+ of_node_put(memory);
return err;
}
@@ -157,9 +179,14 @@ static int update_msc_accessibility(struct mpam_msc *msc)
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
- err = -EINVAL;
- pr_err("Cannot determine accessibility of MSC: %s\n",
- dev_name(&msc->pdev->dev));
+ if (of_device_is_compatible(parent, "memory")) {
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ err = 0;
+ } else {
+ err = -EINVAL;
+ pr_err("Cannot determine accessibility of MSC: %s\n",
+ dev_name(&msc->pdev->dev));
+ }
}
of_node_put(parent);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (10 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-28 1:29 ` Fenghua Yu
2025-09-01 11:09 ` Dave Martin
2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
` (55 subsequent siblings)
67 siblings, 2 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).
A struct mpam_component is a set of struct mpam_vmsc. A vMSC groups the
RIS in an MSC that control the same logical piece of hardware (e.g. L2).
This is to allow hardware implementations where two controls are presented
as different RIS. Re-combining these RIS allows their feature bits to
be or-ed. This structure is not visible outside mpam_devices.c
A struct mpam_vmsc is in turn a set of struct mpam_msc_ris, which are not
visible to user-space: each L2 cache may be composed of individual slices
that need to be configured identically, as the hardware is not able to
distribute the configuration.
Add support for creating and destroying these structures.
A gfp is passed as the structures may need creating when a new RIS entry
is discovered while probing the MSC.
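The create/destroy scheme described above relies on a deferred-free pattern: objects unlinked under the list lock are queued on a garbage list and only freed after an SRCU grace period. A stand-alone user-space sketch of that pattern (the grace period is simulated, and the names are borrowed from the driver, not kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Embedded bookkeeping node, as in the driver's struct mpam_garbage. */
struct garbage {
	struct garbage *next;
	void *to_free;
};

/* A demo object with the garbage node embedded in it. */
struct obj {
	struct garbage garbage;
	int payload;
};

static struct garbage *garbage_head;
static int freed;

/* Queue an unlinked object; in the driver the list lock is still held. */
static void add_to_garbage(struct garbage *g, void *obj)
{
	g->to_free = obj;
	g->next = garbage_head;
	garbage_head = g;
}

/* Stand-in for synchronize_srcu(): the kernel waits for readers here. */
static void wait_for_grace_period(void)
{
}

static void free_garbage(void)
{
	struct garbage *g = garbage_head, *next;

	garbage_head = NULL;
	if (!g)
		return;	/* nothing queued: skip the grace period */

	wait_for_grace_period();
	for (; g; g = next) {
		next = g->next;
		free(g->to_free);	/* frees the node too: it is embedded */
		freed++;
	}
}
```

Freeing `to_free` releases the garbage node as well because the node is embedded in the object, which is why an error-interrupt path can tear everything down with a single batched grace period.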
CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* removed a pr_err() debug message that crept in.
---
drivers/resctrl/mpam_devices.c | 488 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 91 ++++++
include/linux/arm_mpam.h | 8 +-
3 files changed, 574 insertions(+), 13 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 71a1fb1a9c75..5baf2a8786fb 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -20,7 +20,6 @@
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
-#include <linux/srcu.h>
#include <linux/types.h>
#include <acpi/pcc.h>
@@ -35,11 +34,483 @@
static DEFINE_MUTEX(mpam_list_lock);
static LIST_HEAD(mpam_all_msc);
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
/* MPAM isn't available until all the MSC have been probed. */
static u32 mpam_num_msc;
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_msc_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronize_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
+{
+ struct mpam_vmsc *vmsc;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ vmsc = kzalloc(sizeof(*vmsc), gfp);
+ if (!vmsc)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(vmsc);
+
+ INIT_LIST_HEAD_RCU(&vmsc->ris);
+ INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+ vmsc->comp = comp;
+ vmsc->msc = msc;
+
+ list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+ return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
+ struct mpam_msc *msc, bool alloc,
+ gfp_t gfp)
+{
+ struct mpam_vmsc *vmsc;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ if (vmsc->msc->id == msc->id)
+ return vmsc;
+ }
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_vmsc_alloc(comp, msc, gfp);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
+{
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ comp = kzalloc(sizeof(*comp), gfp);
+ if (!comp)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(comp);
+
+ comp->comp_id = id;
+ INIT_LIST_HEAD_RCU(&comp->vmsc);
+ /* affinity is updated when ris are added */
+ INIT_LIST_HEAD_RCU(&comp->class_list);
+ comp->class = class;
+
+ list_add_rcu(&comp->class_list, &class->components);
+
+ return comp;
+}
+
+static struct mpam_component *
+mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
+{
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(comp, &class->components, class_list) {
+ if (comp->comp_id == id)
+ return comp;
+ }
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_component_alloc(class, id, gfp);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
+{
+ struct mpam_class *class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ class = kzalloc(sizeof(*class), gfp);
+ if (!class)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(class);
+
+ INIT_LIST_HEAD_RCU(&class->components);
+ /* affinity is updated when ris are added */
+ class->level = level_idx;
+ class->type = type;
+ INIT_LIST_HEAD_RCU(&class->classes_list);
+
+ list_add_rcu(&class->classes_list, &mpam_classes);
+
+ return class;
+}
+
+static struct mpam_class *
+mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
+{
+ bool found = false;
+ struct mpam_class *class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, &mpam_classes, classes_list) {
+ if (class->type == type && class->level == level_idx) {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ return class;
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_class_alloc(level_idx, type, gfp);
+}
+
+#define add_to_garbage(x) \
+do { \
+ __typeof__(x) _x = x; \
+ (_x)->garbage.to_free = (_x); \
+ llist_add(&(_x)->garbage.llist, &mpam_garbage); \
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&class->classes_list);
+ add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+ struct mpam_class *class = comp->class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&comp->class_list);
+ add_to_garbage(comp);
+
+ if (list_empty(&class->components))
+ mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+ struct mpam_component *comp = vmsc->comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&vmsc->comp_list);
+ add_to_garbage(vmsc);
+
+ if (list_empty(&comp->vmsc))
+ mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+ struct mpam_vmsc *vmsc = ris->vmsc;
+ struct mpam_msc *msc = vmsc->msc;
+ struct platform_device *pdev = msc->pdev;
+ struct mpam_component *comp = vmsc->comp;
+ struct mpam_class *class = comp->class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+ cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+ clear_bit(ris->ris_idx, msc->ris_idxs);
+ list_del_rcu(&ris->vmsc_list);
+ list_del_rcu(&ris->msc_list);
+ add_to_garbage(ris);
+ ris->garbage.pdev = pdev;
+
+ if (list_empty(&vmsc->ris))
+ mpam_vmsc_destroy(vmsc);
+}
+
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+ struct platform_device *pdev = msc->pdev;
+ struct mpam_msc_ris *ris, *tmp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&msc->glbl_list);
+ platform_set_drvdata(pdev, NULL);
+
+ list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+ mpam_ris_destroy(ris);
+
+ add_to_garbage(msc);
+ msc->garbage.pdev = pdev;
+}
+
+static void mpam_free_garbage(void)
+{
+ struct mpam_garbage *iter, *tmp;
+ struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+ if (!to_free)
+ return;
+
+ synchronize_srcu(&mpam_srcu);
+
+ llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+ if (iter->pdev)
+ devm_kfree(&iter->pdev->dev, iter->to_free);
+ else
+ kfree(iter->to_free);
+ }
+}
+
+/* Called recursively to walk the list of caches from a particular CPU */
+static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
+ unsigned long cache_id,
+ u32 cache_level,
+ cpumask_t *affinity)
+{
+ int err;
+ u32 iter_level;
+ unsigned long iter_cache_id;
+ struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
+
+ if (!iter_node)
+ return;
+
+ err = of_property_read_u32(iter_node, "cache-level", &iter_level);
+ if (err)
+ return;
+
+ /*
+ * get_cpu_cacheinfo_id() isn't ready until sometime
+ * during device_initcall(). Use cache_of_calculate_id().
+ */
+ iter_cache_id = cache_of_calculate_id(iter_node);
+ if (iter_cache_id == ~0UL)
+ return;
+
+ if (iter_level == cache_level && iter_cache_id == cache_id)
+ cpumask_set_cpu(cpu, affinity);
+
+ __mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
+ affinity);
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper walks the device tree to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+ cpumask_t *affinity)
+{
+ int cpu;
+
+ if (!acpi_disabled)
+ return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+
+ for_each_possible_cpu(cpu) {
+ struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
+ if (!cpu_node) {
+ pr_err("Failed to find cpu%d device node\n", cpu);
+ return -ENOENT;
+ }
+
+ __mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
+ cache_level, affinity);
+ }
+
+ return 0;
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (node_id == cpu_to_node(cpu))
+ cpumask_set_cpu(cpu, affinity);
+ }
+}
+
+static int get_cpumask_from_cache(struct device_node *cache,
+ cpumask_t *affinity)
+{
+ int err;
+ u32 cache_level;
+ unsigned long cache_id;
+
+ err = of_property_read_u32(cache, "cache-level", &cache_level);
+ if (err) {
+ pr_err("Failed to read cache-level from cache node\n");
+ return -ENOENT;
+ }
+
+ cache_id = cache_of_calculate_id(cache);
+ if (cache_id == ~0UL) {
+ pr_err("Failed to calculate cache-id from cache node\n");
+ return -ENOENT;
+ }
+
+ return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+ enum mpam_class_types type,
+ struct mpam_class *class,
+ struct mpam_component *comp)
+{
+ int err;
+
+ switch (type) {
+ case MPAM_CLASS_CACHE:
+ err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+ affinity);
+ if (err)
+ return err;
+
+ if (cpumask_empty(affinity))
+ pr_warn_once("%s: no CPUs associated with cache node\n",
+ dev_name(&msc->pdev->dev));
+
+ break;
+ case MPAM_CLASS_MEMORY:
+ get_cpumask_from_node_id(comp->comp_id, affinity);
+ /* affinity may be empty for CPU-less memory nodes */
+ break;
+ case MPAM_CLASS_UNKNOWN:
+ return 0;
+ }
+
+ cpumask_and(affinity, affinity, &msc->accessibility);
+
+ return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id,
+ int component_id, gfp_t gfp)
+{
+ int err;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ if (test_and_set_bit(ris_idx, msc->ris_idxs))
+ return -EBUSY;
+
+ ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
+ if (!ris)
+ return -ENOMEM;
+ init_garbage(ris);
+
+ class = mpam_class_get(class_id, type, true, gfp);
+ if (IS_ERR(class))
+ return PTR_ERR(class);
+
+ comp = mpam_component_get(class, component_id, true, gfp);
+ if (IS_ERR(comp)) {
+ if (list_empty(&class->components))
+ mpam_class_destroy(class);
+ return PTR_ERR(comp);
+ }
+
+ vmsc = mpam_vmsc_get(comp, msc, true, gfp);
+ if (IS_ERR(vmsc)) {
+ if (list_empty(&comp->vmsc))
+ mpam_comp_destroy(comp);
+ return PTR_ERR(vmsc);
+ }
+
+ err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+ if (err) {
+ if (list_empty(&vmsc->ris))
+ mpam_vmsc_destroy(vmsc);
+ return err;
+ }
+
+ ris->ris_idx = ris_idx;
+ INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+ ris->vmsc = vmsc;
+
+ cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+ cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+ list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+ return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id, int component_id)
+{
+ int err;
+
+ mutex_lock(&mpam_list_lock);
+ err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+ component_id, GFP_KERNEL);
+ mutex_unlock(&mpam_list_lock);
+ if (err)
+ mpam_free_garbage();
+
+ return err;
+}
+
static void mpam_discovery_complete(void)
{
pr_err("Discovered all MSC\n");
@@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
- if (of_device_is_compatible(parent, "memory")) {
+ if (of_device_is_compatible(parent, "cache")) {
+ err = get_cpumask_from_cache(parent,
+ &msc->accessibility);
+ } else if (of_device_is_compatible(parent, "memory")) {
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
@@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
mutex_lock(&mpam_list_lock);
mpam_num_msc--;
- platform_set_drvdata(pdev, NULL);
- list_del_rcu(&msc->glbl_list);
- synchronize_srcu(&mpam_srcu);
- devm_kfree(&pdev->dev, msc);
+ mpam_msc_destroy(msc);
mutex_unlock(&mpam_list_lock);
+
+ mpam_free_garbage();
}
static int mpam_msc_drv_probe(struct platform_device *pdev)
@@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
err = -ENOMEM;
break;
}
+ init_garbage(msc);
mutex_init(&msc->probe_lock);
mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 07e0f240eaca..d49bb884b433 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,10 +7,27 @@
#include <linux/arm_mpam.h>
#include <linux/cpumask.h>
#include <linux/io.h>
+#include <linux/llist.h>
#include <linux/mailbox_client.h>
#include <linux/mutex.h>
#include <linux/resctrl.h>
#include <linux/sizes.h>
+#include <linux/srcu.h>
+
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+ /* member of mpam_garbage */
+ struct llist_node llist;
+
+ void *to_free;
+ struct platform_device *pdev;
+};
struct mpam_msc {
/* member of mpam_all_msc */
@@ -57,6 +74,80 @@ struct mpam_msc {
void __iomem *mapped_hwpage;
size_t mapped_hwpage_sz;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_class {
+ /* mpam_components in this class */
+ struct list_head components;
+
+ cpumask_t affinity;
+
+ u8 level;
+ enum mpam_class_types type;
+
+ /* member of mpam_classes */
+ struct list_head classes_list;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_component {
+ u32 comp_id;
+
+ /* mpam_vmsc in this component */
+ struct list_head vmsc;
+
+ cpumask_t affinity;
+
+ /* member of mpam_class:components */
+ struct list_head class_list;
+
+ /* parent: */
+ struct mpam_class *class;
+
+ struct mpam_garbage garbage;
};
+struct mpam_vmsc {
+ /* member of mpam_component:vmsc_list */
+ struct list_head comp_list;
+
+ /* mpam_msc_ris in this vmsc */
+ struct list_head ris;
+
+ /* All RIS in this vMSC are members of this MSC */
+ struct mpam_msc *msc;
+
+ /* parent: */
+ struct mpam_component *comp;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_msc_ris {
+ u8 ris_idx;
+
+ cpumask_t affinity;
+
+ /* member of mpam_vmsc:ris */
+ struct list_head vmsc_list;
+
+ /* member of mpam_msc:ris */
+ struct list_head msc_list;
+
+ /* parent: */
+ struct mpam_vmsc *vmsc;
+
+ struct mpam_garbage garbage;
+};
+
+/* List of all classes - protected by srcu */
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+ cpumask_t *affinity);
+
#endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 0edefa6ba019..406a77be68cb 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
#endif
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
- enum mpam_class_types type, u8 class_id,
- int component_id)
-{
- return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id, int component_id);
#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (11 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-29 8:42 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
` (54 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offset within the page(s).
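A minimal sketch of how such an offset is used: a register is a 32-bit load at (page base + offset), which in the driver would be readl() against the mapped hardware page. Here a plain buffer stands in for the page, and its contents are invented for the demo:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two of the offsets defined by the patch. */
#define MPAMF_IDR	0x0000	/* features id register */
#define MPAMF_AIDR	0x0020	/* architectural id register */

/*
 * In the kernel this access is readl(msc->mapped_hwpage + offset).
 * The memcpy keeps the load alignment-safe; the returned value
 * assumes a little-endian host, matching what readl yields on arm64.
 */
static uint32_t msc_read(const uint8_t *page, uint32_t offset)
{
	uint32_t val;

	memcpy(&val, page + offset, sizeof(val));
	return val;
}
```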
Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
* Whitespace churn.
* Cite a more recent document.
* Removed some stale feature, fixed some names etc.
---
drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
1 file changed, 266 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d49bb884b433..6e0982a1a9ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1 0x10
+
+/* Memory mapped control pages: */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR 0x0000 /* features id register */
+#define MPAMF_MSMON_IDR 0x0080 /* performance monitoring features */
+#define MPAMF_IMPL_IDR 0x0028 /* imp-def partitioning */
+#define MPAMF_CPOR_IDR 0x0030 /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR 0x0038 /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR 0x0040 /* mem-bw partitioning */
+#define MPAMF_PRI_IDR 0x0048 /* priority partitioning */
+#define MPAMF_CSUMON_IDR 0x0088 /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR 0x0090 /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR 0x0050 /* partid-narrowing */
+#define MPAMF_IIDR 0x0018 /* implementer id register */
+#define MPAMF_AIDR 0x0020 /* architectural id register */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL 0x0100 /* partid to configure: */
+#define MPAMCFG_CPBM 0x1000 /* cache-portion config */
+#define MPAMCFG_CMAX 0x0108 /* cache-capacity config */
+#define MPAMCFG_CMIN 0x0110 /* cache-capacity config */
+#define MPAMCFG_MBW_MIN 0x0200 /* min mem-bw config */
+#define MPAMCFG_MBW_MAX 0x0208 /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD 0x0220 /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM 0x2000 /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI 0x0400 /* priority partitioning config */
+#define MPAMCFG_MBW_PROP 0x0500 /* mem-bw stride config */
+#define MPAMCFG_INTPARTID 0x0600 /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL 0x0800 /* monitor selector */
+#define MSMON_CFG_CSU_FLT 0x0810 /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL 0x0818 /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT 0x0820 /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL 0x0828 /* mem-bw monitor config */
+#define MSMON_CSU 0x0840 /* current cache-usage */
+#define MSMON_CSU_CAPTURE 0x0848 /* last cache-usage value captured */
+#define MSMON_MBWU 0x0860 /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE 0x0868 /* last mem-bw value captured */
+#define MSMON_MBWU_L 0x0880 /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L 0x0890 /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT 0x0808 /* signal a capture event */
+#define MPAMF_ESR 0x00F8 /* error status register */
+#define MPAMF_ECR 0x00F0 /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART BIT(27)
+#define MPAMF_IDR_EXT BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR BIT(29)
+#define MPAMF_IDR_HAS_MSMON BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW BIT(31)
+#define MPAMF_IDR_HAS_RIS BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR BIT(38)
+#define MPAMF_IDR_HAS_ESR BIT(39)
+#define MPAMF_IDR_RIS_MAX GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP BIT(13)
+#define MPAMF_MBW_IDR_WINDWR BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_PRODUCTID GENMASK(31, 20)
+#define MPAMF_IIDR_PRODUCTID_SHIFT 20
+#define MPAMF_IIDR_VARIANT GENMASK(19, 16)
+#define MPAMF_IIDR_VARIANT_SHIFT 16
+#define MPAMF_IIDR_REVISON GENMASK(15, 12)
+#define MPAMF_IIDR_REVISON_SHIFT 12
+#define MPAMF_IIDR_IMPLEMENTER GENMASK(11, 0)
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT 0
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MAJOR_REV GENMASK(7, 4)
+#define MPAMF_AIDR_ARCH_MINOR_REV GENMASK(3, 0)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL BIT(16)
+#define MPAMCFG_PART_SEL_RIS GENMASK(27, 24)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM BIT(31)
+#define MPAMCFG_CMAX_CMAX GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ * register
+ */
+#define MPAMCFG_MBW_MIN_MIN GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ * register
+ */
+#define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ * register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ * configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1 GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON GENMASK(15, 0)
+#define MPAMF_ESR_PMG GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR BIT(31)
+#define MPAMF_ESR_RIS GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE 0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE 1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE 2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE 3
+#define MPAM_ERRCODE_REQ_PMG_RANGE 4
+#define MPAM_ERRCODE_MONITOR_RANGE 5
+#define MPAM_ERRCODE_INTPARTID_RANGE 6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL 7
+
+/*
+ * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
+ * usage monitor filter register
+ */
+#define MSMON_CFG_CSU_FLT_PARTID GENMASK(15, 0)
+#define MSMON_CFG_CSU_FLT_PMG GENMASK(23, 16)
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ * usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ * bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG BIT(17)
+#define MSMON_CFG_x_CTL_SCLEN BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU 0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU 0
+
+/*
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ * bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_MBWU_FLT_PARTID GENMASK(15, 0)
+#define MSMON_CFG_MBWU_FLT_PMG GENMASK(23, 16)
+#define MSMON_CFG_MBWU_FLT_RWBW GENMASK(31, 30)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ * register
+ * MSMON_CSU_CAPTURE - Memory system performance monitor cache storage usage
+ * capture register
+ * MSMON_MBWU - Memory system performance monitor memory bandwidth usage
+ * monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ * capture register
+ */
+#define MSMON___VALUE GENMASK(30, 0)
+#define MSMON___NRDY BIT(31)
+#define MSMON___NRDY_L BIT(63)
+#define MSMON___L_VALUE GENMASK(43, 0)
+#define MSMON___LWD_VALUE GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ * generation register
+ */
+#define MSMON_CAPT_EVNT_NOW BIT(0)
+
#endif /* MPAM_INTERNAL_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (12 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-27 16:08 ` Rob Herring
2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
` (53 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled; walk all the MSCs and probe those we can
reach that haven't already been probed.
Later, once MPAM is enabled, this cpuhp callback will be replaced by
one that avoids the global list.
Enabling a static key will also take the cpuhp lock, so this can't be
done from the cpuhp callback. Instead, whenever a new MSC has been
probed, schedule work to test whether all the MSCs have now been probed.
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 144 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 8 +-
2 files changed, 147 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5baf2a8786fb..9d6516f98acf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
#include <linux/acpi.h>
+#include <linux/atomic.h>
#include <linux/arm_mpam.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
@@ -21,6 +22,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
+#include <linux/workqueue.h>
#include <acpi/pcc.h>
@@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
/* MPAM isn't available until all the MSC have been probed. */
static u32 mpam_num_msc;
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
/* List of all objects that can be free()d after synchronise_srcu() */
static LLIST_HEAD(mpam_garbage);
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+ WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+ lockdep_assert_held_once(&msc->part_sel_lock);
+ return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
static struct mpam_vmsc *
@@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
return err;
}
-static void mpam_discovery_complete(void)
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+ u64 idr;
+ int err;
+
+ lockdep_assert_held(&msc->probe_lock);
+
+ mutex_lock(&msc->part_sel_lock);
+ idr = mpam_read_partsel_reg(msc, AIDR);
+ if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+ pr_err_once("%s does not match MPAM architecture v1.x\n",
+ dev_name(&msc->pdev->dev));
+ err = -EIO;
+ } else {
+ msc->probed = true;
+ err = 0;
+ }
+ mutex_unlock(&msc->part_sel_lock);
+
+ return err;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
{
- pr_err("Discovered all MSC\n");
+ return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+ int err = 0;
+ struct mpam_msc *msc;
+ bool new_device_probed = false;
+
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ mutex_lock(&msc->probe_lock);
+ if (!msc->probed)
+ err = mpam_msc_hw_probe(msc);
+ mutex_unlock(&msc->probe_lock);
+
+ if (!err)
+ new_device_probed = true;
+ else
+ break; // mpam_broken
+ }
+ mutex_unlock(&mpam_list_lock);
+
+ if (new_device_probed && !err)
+ schedule_work(&mpam_enable_work);
+
+ return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+ return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+ int (*offline)(unsigned int offline))
+{
+ mutex_lock(&mpam_cpuhp_state_lock);
+ if (mpam_cpuhp_state) {
+ cpuhp_remove_state(mpam_cpuhp_state);
+ mpam_cpuhp_state = 0;
+ }
+
+ mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
+ online, offline);
+ if (mpam_cpuhp_state <= 0) {
+ pr_err("Failed to register cpuhp callbacks\n");
+ mpam_cpuhp_state = 0;
+ }
+ mutex_unlock(&mpam_cpuhp_state_lock);
}
static int mpam_dt_count_msc(void)
@@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
}
if (!err && fw_num_msc == mpam_num_msc)
- mpam_discovery_complete();
+ mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
if (err && msc)
mpam_msc_drv_remove(pdev);
@@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
.remove = mpam_msc_drv_remove,
};
+static void mpam_enable_once(void)
+{
+ mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
+
+ pr_info("MPAM enabled\n");
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+ static atomic_t once;
+ struct mpam_msc *msc;
+ bool all_devices_probed = true;
+
+ /* Have we probed all the hw devices? */
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+ mutex_lock(&msc->probe_lock);
+ if (!msc->probed)
+ all_devices_probed = false;
+ mutex_unlock(&msc->probe_lock);
+
+ if (!all_devices_probed)
+ break;
+ }
+ mutex_unlock(&mpam_list_lock);
+
+ if (all_devices_probed && !atomic_fetch_inc(&once))
+ mpam_enable_once();
+}
+
/*
* MSC that are hidden under caches are not created as platform devices
* as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6e0982a1a9ac..a98cca08a2ef 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -49,6 +49,7 @@ struct mpam_msc {
* properties become read-only and the lists are protected by SRCU.
*/
struct mutex probe_lock;
+ bool probed;
unsigned long ris_idxs[128 / BITS_PER_LONG];
u32 ris_max;
@@ -59,14 +60,14 @@ struct mpam_msc {
* part_sel_lock protects access to the MSC hardware registers that are
* affected by MPAMCFG_PART_SEL. (including the ID registers that vary
* by RIS).
- * If needed, take msc->lock first.
+ * If needed, take msc->probe_lock first.
*/
struct mutex part_sel_lock;
/*
* mon_sel_lock protects access to the MSC hardware registers that are
* affeted by MPAMCFG_MON_SEL.
- * If needed, take msc->lock first.
+ * If needed, take msc->probe_lock first.
*/
struct mutex outer_mon_sel_lock;
raw_spinlock_t inner_mon_sel_lock;
@@ -147,6 +148,9 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (13 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-28 13:12 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
` (52 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC to find the system-wide smallest values that can be used.
While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 158 ++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 6 ++
include/linux/arm_mpam.h | 14 +++
3 files changed, 171 insertions(+), 7 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9d6516f98acf..012e09e80300 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
#include <linux/acpi.h>
#include <linux/atomic.h>
#include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
@@ -44,6 +45,15 @@ static u32 mpam_num_msc;
static int mpam_cpuhp_state;
static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
/*
* mpam is enabled once all devices have been probed from CPU online callbacks,
* scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ lockdep_assert_held_once(&msc->part_sel_lock);
+ __mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+ u64 idr_high = 0, idr_low;
+
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ idr_low = mpam_read_partsel_reg(msc, IDR);
+ if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+ idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+ return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+ u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+ FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+ __mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+ int err = 0;
+
+ lockdep_assert_irqs_enabled();
+
+ spin_lock(&partid_max_lock);
+ if (!partid_max_init) {
+ mpam_partid_max = partid_max;
+ mpam_pmg_max = pmg_max;
+ partid_max_init = true;
+ } else if (!partid_max_published) {
+ mpam_partid_max = min(mpam_partid_max, partid_max);
+ mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+ } else {
+ /* New requestors can't lower the values */
+ if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+ err = -EBUSY;
+ }
+ spin_unlock(&partid_max_lock);
+
+ return err;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
static struct mpam_vmsc *
@@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+ list_add_rcu(&ris->msc_list, &msc->ris);
return 0;
}
@@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
return err;
}
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+ u8 ris_idx)
+{
+ int err;
+ struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ if (!test_bit(ris_idx, msc->ris_idxs)) {
+ err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+ 0, 0, GFP_ATOMIC);
+ if (err)
+ return ERR_PTR(err);
+ }
+
+ list_for_each_entry(ris, &msc->ris, msc_list) {
+ if (ris->ris_idx == ris_idx) {
+ found = ris;
+ break;
+ }
+ }
+
+ return found;
+}
+
static int mpam_msc_hw_probe(struct mpam_msc *msc)
{
u64 idr;
- int err;
+ u16 partid_max;
+ u8 ris_idx, pmg_max;
+ struct mpam_msc_ris *ris;
lockdep_assert_held(&msc->probe_lock);
@@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
pr_err_once("%s does not match MPAM architecture v1.x\n",
dev_name(&msc->pdev->dev));
- err = -EIO;
- } else {
- msc->probed = true;
- err = 0;
+ mutex_unlock(&msc->part_sel_lock);
+ return -EIO;
}
+
+ idr = mpam_msc_read_idr(msc);
mutex_unlock(&msc->part_sel_lock);
+ msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+ /* Use these values so partid/pmg always starts with a valid value */
+ msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+ msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+ for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris_idx, 0, msc);
+ idr = mpam_msc_read_idr(msc);
+ mutex_unlock(&msc->part_sel_lock);
+
+ partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+ pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+ msc->partid_max = min(msc->partid_max, partid_max);
+ msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+ ris = mpam_get_or_create_ris(msc, ris_idx);
+ if (IS_ERR(ris))
+ return PTR_ERR(ris);
+ }
- return err;
+ spin_lock(&partid_max_lock);
+ mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+ mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+ spin_unlock(&partid_max_lock);
+
+ msc->probed = true;
+
+ return 0;
}
static int mpam_cpu_online(unsigned int cpu)
@@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
static void mpam_enable_once(void)
{
+ /*
+ * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+ * longer change.
+ */
+ spin_lock(&partid_max_lock);
+ partid_max_published = true;
+ spin_unlock(&partid_max_lock);
+
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
- pr_info("MPAM enabled\n");
+ printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
+ mpam_partid_max + 1, mpam_pmg_max + 1);
}
/*
@@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
return platform_driver_register(&mpam_msc_driver);
}
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a98cca08a2ef..a623f405ddd8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -50,6 +50,8 @@ struct mpam_msc {
*/
struct mutex probe_lock;
bool probed;
+ u16 partid_max;
+ u8 pmg_max;
unsigned long ris_idxs[128 / BITS_PER_LONG];
u32 ris_max;
@@ -148,6 +150,10 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 406a77be68cb..8af93794c7a2 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
enum mpam_class_types type, u8 class_id, int component_id);
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max: The maximum PARTID value the requestor can generate.
+ * @pmg_max: The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestor's features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (14 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-28 17:07 ` Fenghua Yu
2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
` (51 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MSC MON_SEL register needs to be accessed from hardirq context by the
PMU drivers, making an irqsave spinlock the obvious lock to protect these
registers. But on systems with SCMI mailboxes, the accesses must be able
to sleep, meaning a mutex must be used.
Clearly these two can't exist at the same time.
Add helpers for the MON_SEL locking. The outer lock must be taken in a
preemptible context before the inner lock can be taken. On systems with
SCMI mailboxes, where the MON_SEL accesses must sleep, the inner lock
will fail to be 'taken' if the caller is unable to sleep. This allows
the PMU driver to fail without having to check the interface type of
each MSC.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a623f405ddd8..c6f087f9fa7d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -68,10 +68,19 @@ struct mpam_msc {
/*
* mon_sel_lock protects access to the MSC hardware registers that are
- * affeted by MPAMCFG_MON_SEL.
+ * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+ * Both the 'inner' and 'outer' locks must be taken.
+ * For real MMIO MSC, the outer lock is unnecessary, but keeps the
+ * code common with firmware backed MSC:
+ * Firmware backed MSC need to sleep when accessing the MSC, which
+ * means some code-paths will always fail. For these MSC the outer
+ * lock provides the protection, and the inner lock fails to
+ * be taken if the task is unable to sleep.
+ *
* If needed, take msc->probe_lock first.
*/
struct mutex outer_mon_sel_lock;
+ bool outer_lock_held;
raw_spinlock_t inner_mon_sel_lock;
unsigned long inner_mon_sel_flags;
@@ -81,6 +90,52 @@ struct mpam_msc {
struct mpam_garbage garbage;
};
+static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
+{
+ /*
+ * The outer lock may be taken by a CPU that then issues an IPI to run
+ * a helper that takes the inner lock. lockdep can't help us here.
+ */
+ WARN_ON_ONCE(!msc->outer_lock_held);
+
+ if (msc->iface == MPAM_IFACE_MMIO) {
+ raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+ return true;
+ }
+
+ /* Accesses must fail if we are not pre-emptible */
+ return !!preemptible();
+}
+
+static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
+{
+ WARN_ON_ONCE(!msc->outer_lock_held);
+
+ if (msc->iface == MPAM_IFACE_MMIO)
+ raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
+{
+ mutex_lock(&msc->outer_mon_sel_lock);
+ msc->outer_lock_held = true;
+}
+
+static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
+{
+ msc->outer_lock_held = false;
+ mutex_unlock(&msc->outer_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+ WARN_ON_ONCE(!msc->outer_lock_held);
+ if (msc->iface == MPAM_IFACE_MMIO)
+ lockdep_assert_held_once(&msc->inner_mon_sel_lock);
+ else
+ lockdep_assert_preemption_enabled();
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (15 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-28 13:44 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
` (50 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Made mpam_ris_hw_probe_hw_nrdy() more idiomatic C.
* Added static assert on features bitmap size.
---
drivers/resctrl/mpam_devices.c | 156 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 54 +++++++++++
2 files changed, 209 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 012e09e80300..290a04f8654f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
{
- WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
return readl_relaxed(msc->mapped_hwpage + reg);
@@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
}
#define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+ mpam_mon_sel_lock_held(msc);
+ return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ mpam_mon_sel_lock_held(msc);
+ __mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val) _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
static u64 mpam_msc_read_idr(struct mpam_msc *msc)
{
u64 idr_high = 0, idr_low;
@@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return found;
}
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try to see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris *ris, u32 mon_reg)
+{
+ u32 now;
+ u64 mon_sel;
+ bool can_set, can_clear;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ return false;
+
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ _mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+ _mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+ now = _mpam_read_monsel_reg(msc, mon_reg);
+ can_set = now & MSMON___NRDY;
+
+ _mpam_write_monsel_reg(msc, mon_reg, 0);
+ now = _mpam_read_monsel_reg(msc, mon_reg);
+ can_clear = !(now & MSMON___NRDY);
+ mpam_mon_sel_inner_unlock(msc);
+
+ return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg) \
+ _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+ int err;
+ struct mpam_msc *msc = ris->vmsc->msc;
+ struct mpam_props *props = &ris->props;
+
+ lockdep_assert_held(&msc->probe_lock);
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ /* Cache Portion partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+ u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+ props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+ if (props->cpbm_wd)
+ mpam_set_feature(mpam_feat_cpor_part, props);
+ }
+
+ /* Memory bandwidth partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+ u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+ /* portion bitmap resolution */
+ props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+ if (props->mbw_pbm_bits &&
+ FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_part, props);
+
+ props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_max, props);
+ }
+
+ /* Performance Monitoring */
+ if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+ u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+ /*
+ * If the firmware max-nrdy-us property is missing, the
+ * CSU counters can't be used. Should we wait forever?
+ */
+ err = device_property_read_u32(&msc->pdev->dev,
+ "arm,not-ready-us",
+ &msc->nrdy_usec);
+
+ if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+ u32 csumonidr;
+
+ csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+ props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+ if (props->num_csu_mon) {
+ bool hw_managed;
+
+ mpam_set_feature(mpam_feat_msmon_csu, props);
+
+ /* Is NRDY hardware managed? */
+ mpam_mon_sel_outer_lock(msc);
+ hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+ mpam_mon_sel_outer_unlock(msc);
+ if (hw_managed)
+ mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+ }
+
+ /*
+ * Accept the missing firmware property if NRDY appears
+ * un-implemented.
+ */
+ if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+ pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.\n");
+ }
+ if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+ bool hw_managed;
+ u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+ props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
+ if (props->num_mbwu_mon)
+ mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+ if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
+ mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
+ /* Is NRDY hardware managed? */
+ mpam_mon_sel_outer_lock(msc);
+ hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+ mpam_mon_sel_outer_unlock(msc);
+ if (hw_managed)
+ mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+ /*
+ * Don't warn about any missing firmware property for
+ * MBWU NRDY - it doesn't make any sense!
+ */
+ }
+ }
+}
+
static int mpam_msc_hw_probe(struct mpam_msc *msc)
{
u64 idr;
@@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
idr = mpam_msc_read_idr(msc);
mutex_unlock(&msc->part_sel_lock);
+
msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
/* Use these values so partid/pmg always starts with a valid value */
@@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
ris = mpam_get_or_create_ris(msc, ris_idx);
if (IS_ERR(ris))
return PTR_ERR(ris);
+ ris->idr = idr;
+
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris_idx, 0, msc);
+ mpam_ris_hw_probe(ris);
+ mutex_unlock(&msc->part_sel_lock);
}
spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c6f087f9fa7d..9f6cd4a68cce 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
lockdep_assert_preemption_enabled();
}
+/*
+ * When we compact the supported features, we don't care what they are.
+ * Storing them as a bitmap makes life easy.
+ */
+typedef u16 mpam_features_t;
+
+/* Bits for mpam_features_t */
+enum mpam_device_features {
+ mpam_feat_ccap_part = 0,
+ mpam_feat_cpor_part,
+ mpam_feat_mbw_part,
+ mpam_feat_mbw_min,
+ mpam_feat_mbw_max,
+ mpam_feat_mbw_prop,
+ mpam_feat_msmon,
+ mpam_feat_msmon_csu,
+ mpam_feat_msmon_csu_capture,
+ mpam_feat_msmon_csu_hw_nrdy,
+ mpam_feat_msmon_mbwu,
+ mpam_feat_msmon_mbwu_capture,
+ mpam_feat_msmon_mbwu_rwbw,
+ mpam_feat_msmon_mbwu_hw_nrdy,
+ mpam_feat_msmon_capt,
+ MPAM_FEATURE_LAST,
+};
+static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
+#define MPAM_ALL_FEATURES ((1 << MPAM_FEATURE_LAST) - 1)
+
+struct mpam_props {
+ mpam_features_t features;
+
+ u16 cpbm_wd;
+ u16 mbw_pbm_bits;
+ u16 bwa_wd;
+ u16 num_csu_mon;
+ u16 num_mbwu_mon;
+};
+
+static inline bool mpam_has_feature(enum mpam_device_features feat,
+ struct mpam_props *props)
+{
+ return (1 << feat) & props->features;
+}
+
+static inline void mpam_set_feature(enum mpam_device_features feat,
+ struct mpam_props *props)
+{
+ props->features |= (1 << feat);
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
@@ -175,6 +225,8 @@ struct mpam_vmsc {
/* mpam_msc_ris in this vmsc */
struct list_head ris;
+ struct mpam_props props;
+
/* All RIS in this vMSC are members of this MSC */
struct mpam_msc *msc;
@@ -186,6 +238,8 @@ struct mpam_vmsc {
struct mpam_msc_ris {
u8 ris_idx;
+ u64 idr;
+ struct mpam_props props;
cpumask_t affinity;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (16 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-22 15:29 ` James Morse
2025-08-29 13:54 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
` (49 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.
If bitmap properties are mismatched within a component, we
cannot support the mismatched feature.
Care has to be taken as a vMSC may hold mismatched RIS.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 215 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 8 ++
2 files changed, 223 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 290a04f8654f..bb62de6d3847 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1186,8 +1186,223 @@ static struct platform_driver mpam_msc_driver = {
.remove = mpam_msc_drv_remove,
};
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+ if (mpam_has_feature(mpam_feat_mbw_min, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_mbw_max, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_mbw_prop, props))
+ return true;
+ return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias) \
+ helper(parent) && \
+ ((helper(child) && (parent)->field != (child)->field) || \
+ (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias) \
+ mpam_has_feature((feat), (parent)) && \
+ ((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+ (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias) \
+ (alias) && !mpam_has_feature((feat), (parent)) && \
+ mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+ struct mpam_props *child, bool alias)
+{
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+ parent->cpbm_wd = child->cpbm_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+ cpbm_wd, alias)) {
+ pr_debug("%s cleared cpor_part\n", __func__);
+ mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
+ parent->cpbm_wd = 0;
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+ parent->mbw_pbm_bits = child->mbw_pbm_bits;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+ mbw_pbm_bits, alias)) {
+ pr_debug("%s cleared mbw_part\n", __func__);
+ mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
+ parent->mbw_pbm_bits = 0;
+ }
+
+ /* bwa_wd is a count of bits, fewer bits means less precision */
+ if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
+ parent->bwa_wd = child->bwa_wd;
+ } else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+ bwa_wd, alias)) {
+ pr_debug("%s took the min bwa_wd\n", __func__);
+ parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+ }
+
+ /* For num properties, take the minimum */
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+ parent->num_csu_mon = child->num_csu_mon;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+ num_csu_mon, alias)) {
+ pr_debug("%s took the min num_csu_mon\n", __func__);
+ parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+ parent->num_mbwu_mon = child->num_mbwu_mon;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+ num_mbwu_mon, alias)) {
+ pr_debug("%s took the min num_mbwu_mon\n", __func__);
+ parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
+ }
+
+ if (alias) {
+ /* Merge features for aliased resources */
+ parent->features |= child->features;
+ } else {
+ /* Clear missing features for non aliasing */
+ parent->features &= child->features;
+ }
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+ struct mpam_props *cprops = &class->props;
+ struct mpam_props *vprops = &vmsc->props;
+
+ lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+ pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
+ dev_name(&vmsc->msc->pdev->dev),
+ (long)cprops->features, (long)vprops->features);
+
+ /* Take the safe value for any common features */
+ __props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+ struct mpam_props *rprops = &ris->props;
+ struct mpam_props *vprops = &vmsc->props;
+
+ lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+ pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+ dev_name(&vmsc->msc->pdev->dev),
+ (long)vprops->features, (long)rprops->features);
+
+ /*
+ * Merge mismatched features - Copy any features that aren't common,
+ * but take the safe value for any common features.
+ */
+ __props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_component *comp;
+
+ comp = list_first_entry_or_null(&class->components,
+ struct mpam_component, class_list);
+ if (WARN_ON(!comp))
+ return;
+
+ vmsc = list_first_entry_or_null(&comp->vmsc,
+ struct mpam_vmsc, comp_list);
+ if (WARN_ON(!vmsc))
+ return;
+
+ class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct mpam_class *class = comp->class;
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+ __vmsc_props_mismatch(vmsc, ris);
+ class->nrdy_usec = max(class->nrdy_usec,
+ vmsc->msc->nrdy_usec);
+ }
+ }
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_class *class = comp->class;
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+ __class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together, this must be done first.
+ * Next the class features are the bitwise-and of all the vmsc features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, all_classes_list, classes_list) {
+ list_for_each_entry(comp, &class->components, class_list)
+ mpam_enable_merge_vmsc_features(comp);
+
+ mpam_enable_init_class_features(class);
+
+ list_for_each_entry(comp, &class->components, class_list)
+ mpam_enable_merge_class_features(comp);
+ }
+}
+
static void mpam_enable_once(void)
{
+ mutex_lock(&mpam_list_lock);
+ mpam_enable_merge_features(&mpam_classes);
+ mutex_unlock(&mpam_list_lock);
+
/*
* Once the cpuhp callbacks have been changed, mpam_partid_max can no
* longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f6cd4a68cce..a2b0ff411138 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -186,12 +186,20 @@ static inline void mpam_set_feature(enum mpam_device_features feat,
props->features |= (1 << feat);
}
+static inline void mpam_clear_feature(enum mpam_device_features feat,
+ mpam_features_t *supported)
+{
+ *supported &= ~(1 << feat);
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
cpumask_t affinity;
+ struct mpam_props props;
+ u32 nrdy_usec;
u8 level;
enum mpam_class_types type;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (17 preceding siblings ...)
2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-27 16:19 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
` (48 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Last bitmap write will always be non-zero.
* Dropped READ_ONCE() - the value can no longer change.
---
drivers/resctrl/mpam_devices.c | 121 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 8 +++
2 files changed, 129 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bb62de6d3847..c1f01dd748ad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
#include <linux/atomic.h>
#include <linux/arm_mpam.h>
#include <linux/bitfield.h>
+#include <linux/bitmap.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
@@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
return 0;
}
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+ u32 num_words, msb;
+ u32 bm = ~0;
+ int i;
+
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ if (wd == 0)
+ return;
+
+ /*
+ * Write all ~0 to all but the last 32bit-word, which may
+ * have fewer bits...
+ */
+ num_words = DIV_ROUND_UP(wd, 32);
+ for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+ __mpam_write_reg(msc, reg, bm);
+
+ /*
+ * ....and then the last (maybe) partial 32bit word. When wd is a
+ * multiple of 32, msb should be 31 to write a full 32bit word.
+ */
+ msb = (wd - 1) % 32;
+ bm = GENMASK(msb, 0);
+ __mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+ u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
+ struct mpam_msc *msc = ris->vmsc->msc;
+ struct mpam_props *rprops = &ris->props;
+
+ mpam_assert_srcu_read_lock_held();
+
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris->ris_idx, partid, msc);
+
+ if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+ mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+ if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+ mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+ if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+ mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+ if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+ mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+
+ if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+ mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+ mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+ u16 partid, partid_max;
+
+ mpam_assert_srcu_read_lock_held();
+
+ if (ris->in_reset_state)
+ return;
+
+ spin_lock(&partid_max_lock);
+ partid_max = mpam_partid_max;
+ spin_unlock(&partid_max_lock);
+ for (partid = 0; partid < partid_max; partid++)
+ mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+ int idx;
+ struct mpam_msc_ris *ris;
+
+ mpam_assert_srcu_read_lock_held();
+
+ mpam_mon_sel_outer_lock(msc);
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+ mpam_reset_ris(ris);
+
+ /*
+ * Set in_reset_state when coming online. The reset state
+ * for non-zero partid may be lost while the CPUs are offline.
+ */
+ ris->in_reset_state = online;
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+ mpam_mon_sel_outer_unlock(msc);
+}
+
static int mpam_cpu_online(unsigned int cpu)
{
+ int idx;
+ struct mpam_msc *msc;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ if (atomic_fetch_inc(&msc->online_refs) == 0)
+ mpam_reset_msc(msc, true);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
return 0;
}
@@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
static int mpam_cpu_offline(unsigned int cpu)
{
+ int idx;
+ struct mpam_msc *msc;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ if (atomic_dec_and_test(&msc->online_refs))
+ mpam_reset_msc(msc, false);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
return 0;
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a2b0ff411138..466d670a01eb 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
#define MPAM_INTERNAL_H
#include <linux/arm_mpam.h>
+#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/io.h>
#include <linux/llist.h>
@@ -43,6 +44,7 @@ struct mpam_msc {
struct pcc_mbox_chan *pcc_chan;
u32 nrdy_usec;
cpumask_t accessibility;
+ atomic_t online_refs;
/*
* probe_lock is only take during discovery. After discovery these
@@ -248,6 +250,7 @@ struct mpam_msc_ris {
u8 ris_idx;
u64 idr;
struct mpam_props props;
+ bool in_reset_state;
cpumask_t affinity;
@@ -267,6 +270,11 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+static inline void mpam_assert_srcu_read_lock_held(void)
+{
+ WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+}
+
/* System wide partid/pmg values */
extern u16 mpam_partid_max;
extern u8 mpam_pmg_max;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (18 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-28 16:13 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
` (47 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Resetting RIS entries from the cpuhp callback is easy, as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Prevent the cpuhp callbacks from changing the MSC state by taking the
cpuhp lock.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c1f01dd748ad..759244966736 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
mutex_unlock(&msc->part_sel_lock);
}
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
{
u16 partid, partid_max;
+ struct mpam_msc_ris *ris = arg;
mpam_assert_srcu_read_lock_held();
if (ris->in_reset_state)
- return;
+ return 0;
spin_lock(&partid_max_lock);
partid_max = mpam_partid_max;
spin_unlock(&partid_max_lock);
for (partid = 0; partid < partid_max; partid++)
mpam_reset_ris_partid(ris, partid);
+
+ return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated, it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+ int cpu = raw_smp_processor_id();
+
+ if (cpumask_test_cpu(cpu, &msc->accessibility))
+ return cpu;
+
+ return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+ lockdep_assert_irqs_enabled();
+ lockdep_assert_cpus_held();
+ mpam_assert_srcu_read_lock_held();
+
+ return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
}
static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
mpam_mon_sel_outer_lock(msc);
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
- mpam_reset_ris(ris);
+ mpam_touch_msc(msc, &mpam_reset_ris, ris);
/*
* Set in_reset_state when coming online. The reset state
--
2.20.1
* [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (19 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 14:30 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
` (46 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives, it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 62 +++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 1 +
2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 759244966736..3516cbe8623e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
u16 partid, partid_max;
struct mpam_msc_ris *ris = arg;
- mpam_assert_srcu_read_lock_held();
-
if (ris->in_reset_state)
return 0;
@@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
mpam_partid_max + 1, mpam_pmg_max + 1);
}
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+ int idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ might_sleep();
+ lockdep_assert_cpus_held();
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ if (!ris->in_reset_state)
+ mpam_touch_msc(msc, mpam_reset_ris, ris);
+ ris->in_reset_state = true;
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+ int idx;
+ struct mpam_component *comp;
+
+ lockdep_assert_cpus_held();
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(comp, &class->components, class_list)
+ mpam_reset_component_locked(comp);
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+ cpus_read_lock();
+ mpam_reset_class_locked(class);
+ cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAMs errors indicate a software bug, restore any modified
+ * controls to their reset values.
+ */
+void mpam_disable(void)
+{
+ int idx;
+ struct mpam_class *class;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu))
+ mpam_reset_class(class);
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
/*
* Enable mpam once all devices have been probed.
* Scheduled by mpam_discovery_cpu_online() once all devices have been created.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 466d670a01eb..b30fee2b7674 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
+void mpam_disable(void);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
* [PATCH 22/33] arm_mpam: Register and enable IRQs
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (20 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
` (45 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. an out-of-range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the ESR register, so no locking is needed.
The work to disable MPAM after an error needs to happen in process
context, so use a threaded interrupt.
There is no support for percpu threaded interrupts, so for now schedule
the work to be done from the irq handler.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Use guard macro when walking srcu list.
* Use INTEN macro for enabling interrupts.
* Move partid_max_published up earlier in mpam_enable_once().
---
drivers/resctrl/mpam_devices.c | 311 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 9 +-
2 files changed, 312 insertions(+), 8 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3516cbe8623e..210d64fad0b1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
#include <linux/list.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
@@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
*/
static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
return (idr_high << 32) | idr_low;
}
+static void mpam_msc_zero_esr(struct mpam_msc *msc)
+{
+ __mpam_write_reg(msc, MPAMF_ESR, 0);
+ if (msc->has_extd_esr)
+ __mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+ u64 esr_high = 0, esr_low;
+
+ esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+ if (msc->has_extd_esr)
+ esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+ return (esr_high << 32) | esr_low;
+}
+
static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
{
lockdep_assert_held(&msc->part_sel_lock);
@@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
lockdep_assert_held(&mpam_list_lock);
- list_del_rcu(&msc->glbl_list);
- platform_set_drvdata(pdev, NULL);
-
list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
mpam_ris_destroy(ris);
+ list_del_rcu(&msc->glbl_list);
+ platform_set_drvdata(pdev, NULL);
+
add_to_garbage(msc);
msc->garbage.pdev = pdev;
}
@@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
msc->partid_max = min(msc->partid_max, partid_max);
msc->pmg_max = min(msc->pmg_max, pmg_max);
+ msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
ris = mpam_get_or_create_ris(msc, ris_idx);
if (IS_ERR(ris))
@@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
mutex_unlock(&msc->part_sel_lock);
}
+ /* Clear any stale errors */
+ mpam_msc_zero_esr(msc);
+
spin_lock(&partid_max_lock);
mpam_partid_max = min(mpam_partid_max, msc->partid_max);
mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
mpam_mon_sel_outer_unlock(msc);
}
+static void _enable_percpu_irq(void *_irq)
+{
+ int *irq = _irq;
+
+ enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
static int mpam_cpu_online(unsigned int cpu)
{
int idx;
@@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
if (!cpumask_test_cpu(cpu, &msc->accessibility))
continue;
+ if (msc->reenable_error_ppi)
+ _enable_percpu_irq(&msc->reenable_error_ppi);
+
if (atomic_fetch_inc(&msc->online_refs) == 0)
mpam_reset_msc(msc, true);
}
@@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
if (!cpumask_test_cpu(cpu, &msc->accessibility))
continue;
+ if (msc->reenable_error_ppi)
+ disable_percpu_irq(msc->reenable_error_ppi);
+
if (atomic_dec_and_test(&msc->online_refs))
mpam_reset_msc(msc, false);
}
@@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
mutex_unlock(&mpam_cpuhp_state_lock);
}
+static int __setup_ppi(struct mpam_msc *msc)
+{
+ int cpu;
+
+ msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
+ if (!msc->error_dev_id)
+ return -ENOMEM;
+
+ for_each_cpu(cpu, &msc->accessibility) {
+ struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
+
+ if (empty) {
+ pr_err_once("%s shares PPI with %s!\n",
+ dev_name(&msc->pdev->dev),
+ dev_name(&empty->pdev->dev));
+ return -EBUSY;
+ }
+ *per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+ }
+
+ return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+ int irq;
+
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ return 0;
+
+ /* Allocate and initialise the percpu device pointer for PPI */
+ if (irq_is_percpu(irq))
+ return __setup_ppi(msc);
+
+ /* sanity check: shared interrupts can be routed anywhere? */
+ if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+ pr_err_once("msc:%u is a private resource with a shared error interrupt",
+ msc->id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int mpam_dt_count_msc(void)
{
int count = 0;
@@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
break;
}
+ err = mpam_msc_setup_error_irq(msc);
+ if (err)
+ break;
+
if (device_property_read_u32(&pdev->dev, "pcc-channel",
&msc->pcc_subspace_id))
msc->iface = MPAM_IFACE_MMIO;
@@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
}
}
+static char *mpam_errcode_names[16] = {
+ [0] = "No error",
+ [1] = "PARTID_SEL_Range",
+ [2] = "Req_PARTID_Range",
+ [3] = "MSMONCFG_ID_RANGE",
+ [4] = "Req_PMG_Range",
+ [5] = "Monitor_Range",
+ [6] = "intPARTID_Range",
+ [7] = "Unexpected_INTERNAL",
+ [8] = "Undefined_RIS_PART_SEL",
+ [9] = "RIS_No_Control",
+ [10] = "Undefined_RIS_MON_SEL",
+ [11] = "RIS_No_Monitor",
+ [12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+ struct mpam_msc *msc = _msc;
+
+ __mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+ return 0;
+}
+
+static int mpam_disable_msc_ecr(void *_msc)
+{
+ struct mpam_msc *msc = _msc;
+
+ __mpam_write_reg(msc, MPAMF_ECR, 0);
+
+ return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+ u64 reg;
+ u16 partid;
+ u8 errcode, pmg, ris;
+
+ if (WARN_ON_ONCE(!msc) ||
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+ &msc->accessibility)))
+ return IRQ_NONE;
+
+ reg = mpam_msc_read_esr(msc);
+
+ errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+ if (!errcode)
+ return IRQ_NONE;
+
+ /* Clear level triggered irq */
+ mpam_msc_zero_esr(msc);
+
+ partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+ pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+ ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+ pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+ msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
+
+ if (irq_is_percpu(irq)) {
+ mpam_disable_msc_ecr(msc);
+ schedule_work(&mpam_broken_work);
+ return IRQ_HANDLED;
+ }
+
+ return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+ struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+ return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+ struct mpam_msc *msc = dev_id;
+
+ return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
+
+static int mpam_register_irqs(void)
+{
+ int err, irq;
+ struct mpam_msc *msc;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ continue;
+
+ /* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+ /* We anticipate sharing the interrupt with other MSCs */
+ if (irq_is_percpu(irq)) {
+ err = request_percpu_irq(irq, &mpam_ppi_handler,
+ "mpam:msc:error",
+ msc->error_dev_id);
+ if (err)
+ return err;
+
+ msc->reenable_error_ppi = irq;
+ smp_call_function_many(&msc->accessibility,
+ &_enable_percpu_irq, &irq,
+ true);
+ } else {
+ err = devm_request_threaded_irq(&msc->pdev->dev, irq,
+ &mpam_spi_handler,
+ &mpam_disable_thread,
+ IRQF_SHARED,
+ "mpam:msc:error", msc);
+ if (err)
+ return err;
+ }
+
+ msc->error_irq_requested = true;
+ mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+ msc->error_irq_hw_enabled = true;
+ }
+
+ return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+ int irq, idx;
+ struct mpam_msc *msc;
+
+ cpus_read_lock();
+ /* take the lock as free_irq() can sleep */
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ continue;
+
+ if (msc->error_irq_hw_enabled) {
+ mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+ msc->error_irq_hw_enabled = false;
+ }
+
+ if (msc->error_irq_requested) {
+ if (irq_is_percpu(irq)) {
+ msc->reenable_error_ppi = 0;
+ free_percpu_irq(irq, msc->error_dev_id);
+ } else {
+ devm_free_irq(&msc->pdev->dev, irq, msc);
+ }
+ msc->error_irq_requested = false;
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+ cpus_read_unlock();
+}
+
static void mpam_enable_once(void)
{
- mutex_lock(&mpam_list_lock);
- mpam_enable_merge_features(&mpam_classes);
- mutex_unlock(&mpam_list_lock);
+ int err;
/*
* Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
partid_max_published = true;
spin_unlock(&partid_max_lock);
+ /*
+ * If all the MSC have been probed, enabling the IRQs happens next.
+ * That involves cross-calling to a CPU that can reach the MSC, and
+ * the locks must be taken in this order:
+ */
+ cpus_read_lock();
+ mutex_lock(&mpam_list_lock);
+ mpam_enable_merge_features(&mpam_classes);
+
+ err = mpam_register_irqs();
+ if (err)
+ pr_warn("Failed to register irqs: %d\n", err);
+
+ mutex_unlock(&mpam_list_lock);
+ cpus_read_unlock();
+
+ if (err) {
+ schedule_work(&mpam_broken_work);
+ return;
+ }
+
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
* All of MPAMs errors indicate a software bug, restore any modified
* controls to their reset values.
*/
-void mpam_disable(void)
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
{
int idx;
struct mpam_class *class;
+ struct mpam_msc *msc, *tmp;
+
+ mutex_lock(&mpam_cpuhp_state_lock);
+ if (mpam_cpuhp_state) {
+ cpuhp_remove_state(mpam_cpuhp_state);
+ mpam_cpuhp_state = 0;
+ }
+ mutex_unlock(&mpam_cpuhp_state_lock);
+
+ mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(class, &mpam_classes, classes_list,
srcu_read_lock_held(&mpam_srcu))
mpam_reset_class(class);
srcu_read_unlock(&mpam_srcu, idx);
+
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
+ mpam_msc_destroy(msc);
+ mutex_unlock(&mpam_list_lock);
+ mpam_free_garbage();
+
+ return IRQ_HANDLED;
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+ mpam_disable_thread(0, NULL);
}
/*
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b30fee2b7674..c9418c9cf9f2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -44,6 +44,11 @@ struct mpam_msc {
struct pcc_mbox_chan *pcc_chan;
u32 nrdy_usec;
cpumask_t accessibility;
+ bool has_extd_esr;
+
+ int reenable_error_ppi;
+ struct mpam_msc * __percpu *error_dev_id;
+
atomic_t online_refs;
/*
@@ -52,6 +57,8 @@ struct mpam_msc {
*/
struct mutex probe_lock;
bool probed;
+ bool error_irq_requested;
+ bool error_irq_hw_enabled;
u16 partid_max;
u8 pmg_max;
unsigned long ris_idxs[128 / BITS_PER_LONG];
@@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
-void mpam_disable(void);
+void mpam_disable(struct work_struct *work);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
* [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (21 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
` (44 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.
After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 8 ++++++++
drivers/resctrl/mpam_internal.h | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 210d64fad0b1..b424af666b1e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -33,6 +33,8 @@
#include "mpam_internal.h"
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* TODO: move to arch code */
+
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
@@ -1039,6 +1041,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
struct mpam_msc *msc;
bool new_device_probed = false;
+ if (mpam_is_enabled())
+ return 0;
+
mutex_lock(&mpam_list_lock);
list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
if (!cpumask_test_cpu(cpu, &msc->accessibility))
@@ -1835,6 +1840,7 @@ static void mpam_enable_once(void)
return;
}
+ static_branch_enable(&mpam_enabled);
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1902,6 +1908,8 @@ static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
}
mutex_unlock(&mpam_cpuhp_state_lock);
+ static_branch_disable(&mpam_enabled);
+
mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c9418c9cf9f2..3476ee97f8ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -8,6 +8,7 @@
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/io.h>
+#include <linux/jump_label.h>
#include <linux/llist.h>
#include <linux/mailbox_client.h>
#include <linux/mutex.h>
@@ -15,6 +16,13 @@
#include <linux/sizes.h>
#include <linux/srcu.h>
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+ return static_branch_likely(&mpam_enabled);
+}
+
/*
* Structures protected by SRCU may not be freed for a surprising amount of
* time (especially if perf is running). To ensure the MPAM error interrupt can
--
2.20.1
* [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (22 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-28 16:13 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
` (43 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
When CPUs come online the original configuration should be restored.
Once the maximum partid is known, allocate a configuration array for
each component, and reprogram each RIS configuration from this.
The MPAM spec describes how multiple controls can interact. To prevent
this happening by accident, always reset controls that don't have a
valid configuration. This allows the same helper to be used for
configuration and reset.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Added a comment about the ordering around max_partid.
* Allocate configurations after interrupts are registered to reduce churn.
* Added mpam_assert_partid_sizes_fixed();
---
drivers/resctrl/mpam_devices.c | 253 +++++++++++++++++++++++++++++---
drivers/resctrl/mpam_internal.h | 26 +++-
2 files changed, 251 insertions(+), 28 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b424af666b1e..8f6df2406c22 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
/* List of all objects that can be free()d after synchronise_srcu() */
static LLIST_HEAD(mpam_garbage);
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+ WARN_ON_ONCE(!partid_max_published);
+}
+
static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
{
WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
@@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
add_to_garbage(class);
}
+static void __destroy_component_cfg(struct mpam_component *comp);
+
static void mpam_comp_destroy(struct mpam_component *comp)
{
struct mpam_class *class = comp->class;
lockdep_assert_held(&mpam_list_lock);
+ __destroy_component_cfg(comp);
+
list_del_rcu(&comp->class_list);
add_to_garbage(comp);
@@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
__mpam_write_reg(msc, reg, bm);
}
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+ struct mpam_config *cfg)
{
u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *rprops = &ris->props;
- mpam_assert_srcu_read_lock_held();
-
mutex_lock(&msc->part_sel_lock);
__mpam_part_sel(ris->ris_idx, partid, msc);
- if (mpam_has_feature(mpam_feat_cpor_part, rprops))
- mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+ if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
+ if (mpam_has_feature(mpam_feat_cpor_part, cfg))
+ mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+ else
+ mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+ rprops->cpbm_wd);
+ }
- if (mpam_has_feature(mpam_feat_mbw_part, rprops))
- mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+ if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
+ if (mpam_has_feature(mpam_feat_mbw_part, cfg))
+ mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+ else
+ mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+ rprops->mbw_pbm_bits);
+ }
if (mpam_has_feature(mpam_feat_mbw_min, rprops))
mpam_write_partsel_reg(msc, MBW_MIN, 0);
- if (mpam_has_feature(mpam_feat_mbw_max, rprops))
- mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+ if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
+ if (mpam_has_feature(mpam_feat_mbw_max, cfg))
+ mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
+ else
+ mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+ }
if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
mutex_unlock(&msc->part_sel_lock);
}
+struct reprogram_ris {
+ struct mpam_msc_ris *ris;
+ struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+ u16 partid, partid_max;
+ struct reprogram_ris *arg = _arg;
+ struct mpam_msc_ris *ris = arg->ris;
+ struct mpam_config *cfg = arg->cfg;
+
+ if (ris->in_reset_state)
+ return 0;
+
+ spin_lock(&partid_max_lock);
+ partid_max = mpam_partid_max;
+ spin_unlock(&partid_max_lock);
+ for (partid = 0; partid <= partid_max; partid++)
+ mpam_reprogram_ris_partid(ris, partid, cfg);
+
+ return 0;
+}
+
/*
* Called via smp_call_on_cpu() to prevent migration, while still being
* pre-emptible.
*/
static int mpam_reset_ris(void *arg)
{
- u16 partid, partid_max;
struct mpam_msc_ris *ris = arg;
+ struct reprogram_ris reprogram_arg;
+ struct mpam_config empty_cfg = { 0 };
if (ris->in_reset_state)
return 0;
- spin_lock(&partid_max_lock);
- partid_max = mpam_partid_max;
- spin_unlock(&partid_max_lock);
- for (partid = 0; partid < partid_max; partid++)
- mpam_reset_ris_partid(ris, partid);
+ reprogram_arg.ris = ris;
+ reprogram_arg.cfg = &empty_cfg;
+
+ mpam_reprogram_ris(&reprogram_arg);
return 0;
}
@@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
static void mpam_reset_msc(struct mpam_msc *msc, bool online)
{
- int idx;
struct mpam_msc_ris *ris;
mpam_assert_srcu_read_lock_held();
mpam_mon_sel_outer_lock(msc);
- idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
mpam_touch_msc(msc, &mpam_reset_ris, ris);
@@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
*/
ris->in_reset_state = online;
}
- srcu_read_unlock(&mpam_srcu, idx);
mpam_mon_sel_outer_unlock(msc);
}
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+ u16 partid;
+ bool reset;
+ struct mpam_config *cfg;
+ struct mpam_msc_ris *ris;
+
+ /*
+ * No lock for mpam_partid_max as partid_max_published has been
+ * set by mpam_enabled(), so the values can no longer change.
+ */
+ mpam_assert_partid_sizes_fixed();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
+ if (!mpam_is_enabled() && !ris->in_reset_state) {
+ mpam_touch_msc(msc, &mpam_reset_ris, ris);
+ ris->in_reset_state = true;
+ continue;
+ }
+
+ reset = true;
+ for (partid = 0; partid <= mpam_partid_max; partid++) {
+ cfg = &ris->vmsc->comp->cfg[partid];
+ if (cfg->features)
+ reset = false;
+
+ mpam_reprogram_ris_partid(ris, partid, cfg);
+ }
+ ris->in_reset_state = reset;
+ }
+}
+
static void _enable_percpu_irq(void *_irq)
{
int *irq = _irq;
@@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
_enable_percpu_irq(&msc->reenable_error_ppi);
if (atomic_fetch_inc(&msc->online_refs) == 0)
- mpam_reset_msc(msc, true);
+ mpam_reprogram_msc(msc);
}
srcu_read_unlock(&mpam_srcu, idx);
@@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
cpus_read_unlock();
}
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+ add_to_garbage(comp->cfg);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+ mpam_assert_partid_sizes_fixed();
+
+ if (comp->cfg)
+ return 0;
+
+ comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+ if (!comp->cfg)
+ return -ENOMEM;
+ init_garbage(comp->cfg);
+
+ return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+ int err = 0;
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, &mpam_classes, classes_list) {
+ list_for_each_entry(comp, &class->components, class_list) {
+ err = __allocate_component_cfg(comp);
+ if (err)
+ return err;
+ }
+ }
+
+ return 0;
+}
+
static void mpam_enable_once(void)
{
int err;
@@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
*/
cpus_read_lock();
mutex_lock(&mpam_list_lock);
- mpam_enable_merge_features(&mpam_classes);
+ do {
+ mpam_enable_merge_features(&mpam_classes);
- err = mpam_register_irqs();
- if (err)
- pr_warn("Failed to register irqs: %d\n", err);
+ err = mpam_register_irqs();
+ if (err) {
+ pr_warn("Failed to register irqs: %d\n", err);
+ break;
+ }
+ err = mpam_allocate_config();
+ if (err) {
+ pr_err("Failed to allocate configuration arrays.\n");
+ break;
+ }
+ } while (0);
mutex_unlock(&mpam_list_lock);
cpus_read_unlock();
@@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
might_sleep();
lockdep_assert_cpus_held();
+ mpam_assert_partid_sizes_fixed();
+
+ memset(comp->cfg, 0, (mpam_partid_max + 1) * sizeof(*comp->cfg));
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
@@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
mpam_enable_once();
}
+struct mpam_write_config_arg {
+ struct mpam_msc_ris *ris;
+ struct mpam_component *comp;
+ u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+ struct mpam_write_config_arg *c = arg;
+
+ mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+ return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+ if (mpam_has_feature(feature, newcfg) && \
+ (newcfg)->member != (cfg)->member) { \
+ (cfg)->member = (newcfg)->member; \
+ cfg->features |= (1 << feature); \
+ \
+ (changes) |= (1 << feature); \
+ } \
+} while (0)
+
+static mpam_features_t mpam_update_config(struct mpam_config *cfg,
+ const struct mpam_config *newcfg)
+{
+ mpam_features_t changes = 0;
+
+ maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
+ maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
+ maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
+
+ return changes;
+}
+
+/* TODO: split into write_config/sync_config */
+/* TODO: add config_dirty bitmap to drive sync_config */
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+ struct mpam_config *cfg)
+{
+ struct mpam_write_config_arg arg;
+ struct mpam_msc_ris *ris;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc *msc;
+ int idx;
+
+ lockdep_assert_cpus_held();
+
+ /* Don't pass in the current config! */
+ WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+ if (!mpam_update_config(&comp->cfg[partid], cfg))
+ return 0;
+
+ arg.comp = comp;
+ arg.partid = partid;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ arg.ris = ris;
+ mpam_touch_msc(msc, __write_config, &arg);
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
+ return 0;
+}
+
/*
* MSC that are hidden under caches are not created as platform devices
* as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3476ee97f8ac..70cba9f22746 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -191,11 +191,7 @@ struct mpam_props {
u16 num_mbwu_mon;
};
-static inline bool mpam_has_feature(enum mpam_device_features feat,
- struct mpam_props *props)
-{
- return (1 << feat) & props->features;
-}
+#define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
static inline void mpam_set_feature(enum mpam_device_features feat,
struct mpam_props *props)
@@ -226,6 +222,17 @@ struct mpam_class {
struct mpam_garbage garbage;
};
+struct mpam_config {
+ /* Which configuration values are valid. 0 is used for reset */
+ mpam_features_t features;
+
+ u32 cpbm;
+ u32 mbw_pbm;
+ u16 mbw_max;
+
+ struct mpam_garbage garbage;
+};
+
struct mpam_component {
u32 comp_id;
@@ -234,6 +241,12 @@ struct mpam_component {
cpumask_t affinity;
+ /*
+ * Array of configuration values, indexed by partid.
+ * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+ */
+ struct mpam_config *cfg;
+
/* member of mpam_class:components */
struct list_head class_list;
@@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
void mpam_enable(struct work_struct *work);
void mpam_disable(struct work_struct *work);
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+ struct mpam_config *cfg);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (23 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-28 10:11 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
` (42 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew,
Zeng Heng, Dave Martin
MPAM supports more features than are going to be exposed to resctrl.
For PARTIDs other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows an MSC/RIS to support less configuration space
than is usable. If this feature is found on a class of device we are
likely to use, reduce partid_max so that it fits within the narrowed
space. This allows each PARTID to be mapped to itself.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 175 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 16 ++-
2 files changed, 189 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8f6df2406c22..aedd743d6827 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
__mpam_part_sel_raw(partsel, msc);
}
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+ u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+ FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+ MPAMCFG_PART_SEL_INTERNAL;
+
+ __mpam_part_sel_raw(partsel, msc);
+}
+
int mpam_register_requestor(u16 partid_max, u8 pmg_max)
{
int err = 0;
@@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
int err;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *props = &ris->props;
+ struct mpam_class *class = ris->vmsc->comp->class;
lockdep_assert_held(&msc->probe_lock);
lockdep_assert_held(&msc->part_sel_lock);
+ /* Cache Capacity Partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+ u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+ props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+ if (props->cmax_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+ if (props->cmax_wd &&
+ !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+ if (props->cmax_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+ props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+
+ if (props->cassoc_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cassoc, props);
+ }
+
/* Cache Portion partitioning */
if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
mpam_set_feature(mpam_feat_mbw_max, props);
+
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_min, props);
+
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_prop, props);
+ }
+
+ /* Priority partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+ u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+ props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+ if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+ mpam_set_feature(mpam_feat_intpri_part, props);
+ if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+ mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+ }
+
+ props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+ if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+ mpam_set_feature(mpam_feat_dspri_part, props);
+ if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+ mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+ }
}
/* Performance Monitoring */
@@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
*/
}
}
+
+ /*
+ * RIS with PARTID narrowing don't have enough storage for one
+ * configuration per PARTID. If these are in a class we could use,
+ * reduce the supported partid_max to match the number of intpartid.
+ * If the class is unknown, just ignore it.
+ */
+ if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+ class->type != MPAM_CLASS_UNKNOWN) {
+ u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+ u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+ mpam_set_feature(mpam_feat_partid_nrw, props);
+ msc->partid_max = min(msc->partid_max, partid_max);
+ }
}
static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
struct mpam_config *cfg)
{
+ u32 pri_val = 0;
+ u16 cmax = MPAMCFG_CMAX_CMAX;
u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *rprops = &ris->props;
+ u16 dspri = GENMASK(rprops->dspri_wd, 0);
+ u16 intpri = GENMASK(rprops->intpri_wd, 0);
mutex_lock(&msc->part_sel_lock);
__mpam_part_sel(ris->ris_idx, partid, msc);
+ if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+ /* Update the intpartid mapping */
+ mpam_write_partsel_reg(msc, INTPARTID,
+ MPAMCFG_INTPARTID_INTERNAL | partid);
+
+ /*
+ * Then switch to the 'internal' partid to update the
+ * configuration.
+ */
+ __mpam_intpart_sel(ris->ris_idx, partid, msc);
+ }
+
if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
if (mpam_has_feature(mpam_feat_cpor_part, cfg))
mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
@@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+
+ if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+ mpam_write_partsel_reg(msc, CMAX, cmax);
+
+ if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+ mpam_write_partsel_reg(msc, CMIN, 0);
+
+ if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+ mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+ /* aces high? */
+ if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+ intpri = 0;
+ if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+ dspri = 0;
+
+ if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+ pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+ if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+ pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+ mpam_write_partsel_reg(msc, PRI, pri_val);
+ }
+
mutex_unlock(&msc->part_sel_lock);
}
@@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
return false;
}
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+ if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+ return true;
+ return false;
+}
+
#define MISMATCHED_HELPER(parent, child, helper, field, alias) \
helper(parent) && \
((helper(child) && (parent)->field != (child)->field) || \
@@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
}
+ if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+ parent->cmax_wd = child->cmax_wd;
+ } else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+ cmax_wd, alias)) {
+ pr_debug("%s took the min cmax_wd\n", __func__);
+ parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+ parent->cassoc_wd = child->cassoc_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+ cassoc_wd, alias)) {
+ pr_debug("%s cleared cassoc_wd\n", __func__);
+ mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
+ parent->cassoc_wd = 0;
+ }
+
/* For num properties, take the minimum */
if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
parent->num_csu_mon = child->num_csu_mon;
@@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
}
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+ parent->intpri_wd = child->intpri_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+ intpri_wd, alias)) {
+ pr_debug("%s took the min intpri_wd\n", __func__);
+ parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+ parent->dspri_wd = child->dspri_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+ dspri_wd, alias)) {
+ pr_debug("%s took the min dspri_wd\n", __func__);
+ parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+ }
+
+ /* TODO: alias support for these two */
+ /* {int,ds}pri may not have differing 0-low behaviour */
+ if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+ (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+ mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+ mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+ pr_debug("%s cleared intpri_part\n", __func__);
+ mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
+ mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
+ }
+ if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+ (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+ mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+ mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+ pr_debug("%s cleared dspri_part\n", __func__);
+ mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
+ mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
+ }
+
if (alias) {
/* Merge features for aliased resources */
parent->features |= child->features;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 70cba9f22746..23445aedbabd 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
* When we compact the supported features, we don't care what they are.
* Storing them as a bitmap makes life easy.
*/
-typedef u16 mpam_features_t;
+typedef u32 mpam_features_t;
/* Bits for mpam_features_t */
enum mpam_device_features {
- mpam_feat_ccap_part = 0,
+ mpam_feat_cmax_softlim,
+ mpam_feat_cmax_cmax,
+ mpam_feat_cmax_cmin,
+ mpam_feat_cmax_cassoc,
mpam_feat_cpor_part,
mpam_feat_mbw_part,
mpam_feat_mbw_min,
mpam_feat_mbw_max,
mpam_feat_mbw_prop,
+ mpam_feat_intpri_part,
+ mpam_feat_intpri_part_0_low,
+ mpam_feat_dspri_part,
+ mpam_feat_dspri_part_0_low,
mpam_feat_msmon,
mpam_feat_msmon_csu,
mpam_feat_msmon_csu_capture,
@@ -176,6 +183,7 @@ enum mpam_device_features {
mpam_feat_msmon_mbwu_rwbw,
mpam_feat_msmon_mbwu_hw_nrdy,
mpam_feat_msmon_capt,
+ mpam_feat_partid_nrw,
MPAM_FEATURE_LAST,
};
static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
@@ -187,6 +195,10 @@ struct mpam_props {
u16 cpbm_wd;
u16 mbw_pbm_bits;
u16 bwa_wd;
+ u16 cmax_wd;
+ u16 cassoc_wd;
+ u16 intpri_wd;
+ u16 dspri_wd;
u16 num_csu_mon;
u16 num_mbwu_mon;
};
--
2.20.1
* [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (24 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 15:47 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
` (41 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM's MSCs support a number of monitors, each of which provides
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor first needs to be configured. Add helpers to
allocate and free CSU or MBWU monitors.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 2 ++
drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index aedd743d6827..e7e00c632512 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -348,6 +348,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
class->level = level_idx;
class->type = type;
INIT_LIST_HEAD_RCU(&class->classes_list);
+ ida_init(&class->ida_csu_mon);
+ ida_init(&class->ida_mbwu_mon);
list_add_rcu(&class->classes_list, &mpam_classes);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 23445aedbabd..4981de120869 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -231,6 +231,9 @@ struct mpam_class {
/* member of mpam_classes */
struct list_head classes_list;
+ struct ida ida_csu_mon;
+ struct ida ida_mbwu_mon;
+
struct mpam_garbage garbage;
};
@@ -306,6 +309,38 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return -EOPNOTSUPP;
+
+ return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
+ GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+ ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+ return -EOPNOTSUPP;
+
+ return ida_alloc_range(&class->ida_mbwu_mon, 0,
+ cprops->num_mbwu_mon - 1, GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+ ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
/* List of all classes - protected by srcu */
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
--
2.20.1
* [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (25 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 15:55 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
` (40 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSC may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.
Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.
CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 222 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 18 +++
2 files changed, 240 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7e00c632512..9ce771aaf671 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
return 0;
}
+struct mon_read {
+ struct mpam_msc_ris *ris;
+ struct mon_cfg *ctx;
+ enum mpam_device_features type;
+ u64 *val;
+ int err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+ u32 *flt_val)
+{
+ struct mon_cfg *ctx = m->ctx;
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ *ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
+ break;
+ case mpam_feat_msmon_mbwu:
+ *ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+ break;
+ default:
+ return;
+ }
+
+ /*
+ * For CSU counters its implementation-defined what happens when not
+ * filtering by partid.
+ */
+ *ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
+
+ *flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+ if (m->ctx->match_pmg) {
+ *ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
+ }
+
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+ u32 *flt_val)
+{
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ *ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+ *flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+ break;
+ case mpam_feat_msmon_mbwu:
+ *ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+ *flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+ break;
+ default:
+ return;
+ }
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+ *cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+ u32 flt_val)
+{
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+
+ /*
+ * Write the ctl_val with the enable bit cleared, reset the counter,
+ * then enable counter.
+ */
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+ mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+ mpam_write_monsel_reg(msc, CSU, 0);
+ mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+ break;
+ case mpam_feat_msmon_mbwu:
+ mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+ break;
+ default:
+ return;
+ }
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+ u64 now;
+ bool nrdy = false;
+ struct mon_read *m = arg;
+ struct mon_cfg *ctx = m->ctx;
+ struct mpam_msc_ris *ris = m->ris;
+ struct mpam_props *rprops = &ris->props;
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+ u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+ if (!mpam_mon_sel_inner_lock(msc)) {
+ m->err = -EIO;
+ return;
+ }
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+ /*
+ * Read the existing configuration to avoid re-writing the same values.
+ * This saves waiting for 'nrdy' on subsequent reads.
+ */
+ read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+ clean_msmon_ctl_val(&cur_ctl);
+ gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+ if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+ write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ now = mpam_read_monsel_reg(msc, CSU);
+ if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ break;
+ case mpam_feat_msmon_mbwu:
+ now = mpam_read_monsel_reg(msc, MBWU);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ break;
+ default:
+ m->err = -EINVAL;
+ break;
+ }
+ mpam_mon_sel_inner_unlock(msc);
+
+ if (nrdy) {
+ m->err = -EBUSY;
+ return;
+ }
+
+ now = FIELD_GET(MSMON___VALUE, now);
+ *m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+ int err, idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ arg->ris = ris;
+
+ err = smp_call_function_any(&msc->accessibility,
+ __ris_msmon_read, arg,
+ true);
+ if (!err && arg->err)
+ err = arg->err;
+ if (err)
+ break;
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ if (err)
+ break;
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
+ return err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+ enum mpam_device_features type, u64 *val)
+{
+ int err;
+ struct mon_read arg;
+ u64 wait_jiffies = 0;
+ struct mpam_props *cprops = &comp->class->props;
+
+ might_sleep();
+
+ if (!mpam_is_enabled())
+ return -EIO;
+
+ if (!mpam_has_feature(type, cprops))
+ return -EOPNOTSUPP;
+
+ memset(&arg, 0, sizeof(arg));
+ arg.ctx = ctx;
+ arg.type = type;
+ arg.val = val;
+ *val = 0;
+
+ err = _msmon_read(comp, &arg);
+ if (err == -EBUSY && comp->class->nrdy_usec)
+ wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+ while (wait_jiffies)
+ wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+ if (err == -EBUSY) {
+ memset(&arg, 0, sizeof(arg));
+ arg.ctx = ctx;
+ arg.type = type;
+ arg.val = val;
+ *val = 0;
+
+ err = _msmon_read(comp, &arg);
+ }
+
+ return err;
+}
+
static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
{
u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4981de120869..76e406a2b0d1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -309,6 +309,21 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+ COUNT_BOTH = 0,
+ COUNT_WRITE = 1,
+ COUNT_READ = 2,
+};
+
+struct mon_cfg {
+ u16 mon;
+ u8 pmg;
+ bool match_pmg;
+ u32 partid;
+ enum mon_filter_options opts;
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
int mpam_apply_config(struct mpam_component *comp, u16 partid,
struct mpam_config *cfg);
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+ enum mpam_device_features, u64 *val);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
* [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (26 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 16:09 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
` (39 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.
The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.
Add struct mbwu_state to track the bandwidth counter to allow overflow
and power management to be handled.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 163 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 54 ++++++++---
2 files changed, 200 insertions(+), 17 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9ce771aaf671..11be34b54643 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
if (m->ctx->match_pmg) {
*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
@@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
u32 flt_val)
{
+ struct msmon_mbwu_state *mbwu_state;
struct mpam_msc *msc = m->ris->vmsc->msc;
/*
@@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
mpam_write_monsel_reg(msc, MBWU, 0);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+ mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+ if (mbwu_state)
+ mbwu_state->prev_val = 0;
+
break;
default:
return;
}
}
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+ /* TODO: scaling, and long counters */
+ return GENMASK_ULL(30, 0);
+}
+
/* Call with MSC lock held */
static void __ris_msmon_read(void *arg)
{
- u64 now;
bool nrdy = false;
struct mon_read *m = arg;
+ u64 now, overflow_val = 0;
struct mon_cfg *ctx = m->ctx;
struct mpam_msc_ris *ris = m->ris;
+ struct msmon_mbwu_state *mbwu_state;
struct mpam_props *rprops = &ris->props;
struct mpam_msc *msc = m->ris->vmsc->msc;
u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
now = mpam_read_monsel_reg(msc, CSU);
if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
break;
case mpam_feat_msmon_mbwu:
now = mpam_read_monsel_reg(msc, MBWU);
if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
+
+ if (nrdy)
+ break;
+
+ mbwu_state = &ris->mbwu_state[ctx->mon];
+ if (!mbwu_state)
+ break;
+
+ /* Add any pre-overflow remainder to mbwu_state->correction */
+ if (mbwu_state->prev_val > now)
+ overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+ mbwu_state->prev_val = now;
+ mbwu_state->correction += overflow_val;
+
+ /* Include bandwidth consumed before the last hardware reset */
+ now += mbwu_state->correction;
break;
default:
m->err = -EINVAL;
@@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
return;
}
- now = FIELD_GET(MSMON___VALUE, now);
*m->val += now;
}
@@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
return 0;
}
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+ int i;
+ struct mon_read mwbu_arg;
+ struct mpam_msc_ris *ris = _ris;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+
+ for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+ if (ris->mbwu_state[i].enabled) {
+ mwbu_arg.ris = ris;
+ mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+ mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+ __ris_msmon_read(&mwbu_arg);
+ }
+ }
+
+ mpam_mon_sel_outer_unlock(msc);
+
+ return 0;
+}
+
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+ int i;
+ u64 val;
+ struct mon_cfg *cfg;
+ u32 cur_flt, cur_ctl, mon_sel;
+ struct mpam_msc_ris *ris = arg;
+ struct msmon_mbwu_state *mbwu_state;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+ mbwu_state = &ris->mbwu_state[i];
+ cfg = &mbwu_state->cfg;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ return -EIO;
+
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+ cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+ cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+ val = mpam_read_monsel_reg(msc, MBWU);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+
+ cfg->mon = i;
+ cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
+ cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+ cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
+ mbwu_state->correction += val;
+ mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+ mpam_mon_sel_inner_unlock(msc);
+ }
+
+ return 0;
+}
+
/*
* Called via smp_call_on_cpu() to prevent migration, while still being
* pre-emptible.
@@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
* for non-zero partid may be lost while the CPUs are offline.
*/
ris->in_reset_state = online;
+
+ if (mpam_is_enabled() && !online)
+ mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
}
mpam_mon_sel_outer_unlock(msc);
}
@@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
mpam_reprogram_ris_partid(ris, partid, cfg);
}
ris->in_reset_state = reset;
+
+ if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+ mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
}
}
@@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
static void __destroy_component_cfg(struct mpam_component *comp)
{
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ lockdep_assert_held(&mpam_list_lock);
+
add_to_garbage(comp->cfg);
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+ if (mpam_mon_sel_inner_lock(msc)) {
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+ add_to_garbage(ris->mbwu_state);
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ }
}
static int __allocate_component_cfg(struct mpam_component *comp)
{
+ int err = 0;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct msmon_mbwu_state *mbwu_state;
+
+ lockdep_assert_held(&mpam_list_lock);
mpam_assert_partid_sizes_fixed();
if (comp->cfg)
@@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
return -ENOMEM;
init_garbage(comp->cfg);
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ if (!vmsc->props.num_mbwu_mon)
+ continue;
+
+ msc = vmsc->msc;
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+ if (!ris->props.num_mbwu_mon)
+ continue;
+
+ mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+ sizeof(*ris->mbwu_state),
+ GFP_KERNEL);
+ if (!mbwu_state) {
+ __destroy_component_cfg(comp);
+ err = -ENOMEM;
+ break;
+ }
+
+ if (mpam_mon_sel_inner_lock(msc)) {
+ init_garbage(mbwu_state);
+ ris->mbwu_state = mbwu_state;
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ }
+ mpam_mon_sel_outer_unlock(msc);
+
+ if (err)
+ break;
+ }
+
return 0;
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 76e406a2b0d1..9a50a5432f4a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -271,6 +271,42 @@ struct mpam_component {
struct mpam_garbage garbage;
};
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+ COUNT_BOTH = 0,
+ COUNT_WRITE = 1,
+ COUNT_READ = 2,
+};
+
+struct mon_cfg {
+ /* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
+ u32 mon;
+ u8 pmg;
+ bool match_pmg;
+ u32 partid;
+ enum mon_filter_options opts;
+};
+
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+ bool enabled;
+ struct mon_cfg cfg;
+
+ /* The value last read from the hardware. Used to detect overflow. */
+ u64 prev_val;
+
+ /*
+ * The value to add to the new reading to account for power management,
+ * and shifts to trigger the overflow interrupt.
+ */
+ u64 correction;
+
+ struct mpam_garbage garbage;
+};
+
struct mpam_vmsc {
/* member of mpam_component:vmsc_list */
struct list_head comp_list;
@@ -306,22 +342,10 @@ struct mpam_msc_ris {
/* parent: */
struct mpam_vmsc *vmsc;
- struct mpam_garbage garbage;
-};
+ /* msmon mbwu configuration is preserved over reset */
+ struct msmon_mbwu_state *mbwu_state;
-/* The values for MSMON_CFG_MBWU_FLT.RWBW */
-enum mon_filter_options {
- COUNT_BOTH = 0,
- COUNT_WRITE = 1,
- COUNT_READ = 2,
-};
-
-struct mon_cfg {
- u16 mon;
- u8 pmg;
- bool match_pmg;
- u32 partid;
- enum mon_filter_options opts;
+ struct mpam_garbage garbage;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (27 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-28 16:14 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
` (38 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters: a 44-bit counter indicated by the
HAS_LONG field (bit 30), and a 63-bit counter indicated by the LWD field
(bit 29), either of which may optionally be implemented. Probe for these
counters and set the corresponding feature bits if either is present.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 23 ++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 8 ++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 11be34b54643..2ab7f127baaa 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
}
if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
- bool hw_managed;
+ bool has_long, hw_managed;
u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
@@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+ /*
+ * Treat long counter and its extension, lwd as mutually
+ * exclusive feature bits. Though these are dependent
+ * fields at the implementation level, there would never
+ * be a need for mpam_feat_msmon_mbwu_44counter (long
+ * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
+ * bits to be set together.
+ *
+ * mpam_feat_msmon_mbwu isn't treated as an exclusive
+ * bit as this feature bit would be used as the "front
+ * facing feature bit" for any checks related to mbwu
+ * monitors.
+ */
+ has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
+ if (props->num_mbwu_mon && has_long) {
+ if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
+ mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+ else
+ mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+ }
+
/* Is NRDY hardware managed? */
mpam_mon_sel_outer_lock(msc);
hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9a50a5432f4a..9f627b5f72a1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -178,7 +178,15 @@ enum mpam_device_features {
mpam_feat_msmon_csu,
mpam_feat_msmon_csu_capture,
mpam_feat_msmon_csu_hw_nrdy,
+
+ /*
+ * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
+ * counter would be used. The exact counter used also depends on the
+ * status of mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter.
+ */
mpam_feat_msmon_mbwu,
+ mpam_feat_msmon_mbwu_44counter,
+ mpam_feat_msmon_mbwu_63counter,
mpam_feat_msmon_mbwu_capture,
mpam_feat_msmon_mbwu_rwbw,
mpam_feat_msmon_mbwu_hw_nrdy,
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (28 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 16:39 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
` (37 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
the RIS, use long/LWD counter instead of the regular 31 bit mbwu
counter.
The spec only requires 32-bit accesses to the MSC to be supported, but
these registers are 64 bits. The lower half may overflow into the upper
half between two 32-bit reads. To catch this, use a helper that re-reads
the top half until it is stable.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit]
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Commit message wrangling.
* Refer to 31 bit counters as opposed to 32 bit (registers).
---
drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
1 file changed, 80 insertions(+), 9 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2ab7f127baaa..8fbcf6eb946a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1002,6 +1002,48 @@ struct mon_read {
int err;
};
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+ return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+ mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+ int retry = 3;
+ u32 mbwu_l_low;
+ u64 mbwu_l_high1, mbwu_l_high2;
+
+ mpam_mon_sel_lock_held(msc);
+
+ WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+ do {
+ mbwu_l_high1 = mbwu_l_high2;
+ mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+ mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+ retry--;
+ } while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+ if (mbwu_l_high1 == mbwu_l_high2)
+ return (mbwu_l_high1 << 32) | mbwu_l_low;
+ return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+ mpam_mon_sel_lock_held(msc);
+
+ WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ __mpam_write_reg(msc, MSMON_MBWU_L, 0);
+ __mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
u32 *flt_val)
{
@@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
static void clean_msmon_ctl_val(u32 *cur_ctl)
{
*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+ *cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
}
static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
case mpam_feat_msmon_mbwu:
mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
- mpam_write_monsel_reg(msc, MBWU, 0);
+ if (mpam_ris_has_mbwu_long_counter(m->ris))
+ mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+ else
+ mpam_write_monsel_reg(msc, MBWU, 0);
+
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
{
- /* TODO: scaling, and long counters */
- return GENMASK_ULL(30, 0);
+ /* TODO: implement scaling counters */
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
+ return GENMASK_ULL(62, 0);
+ else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
+ return GENMASK_ULL(43, 0);
+ else
+ return GENMASK_ULL(30, 0);
}
/* Call with MSC lock held */
@@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
now = FIELD_GET(MSMON___VALUE, now);
break;
case mpam_feat_msmon_mbwu:
- now = mpam_read_monsel_reg(msc, MBWU);
- if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
- nrdy = now & MSMON___NRDY;
- now = FIELD_GET(MSMON___VALUE, now);
+ /*
+ * If long or lwd counters are supported, use them, else revert
+ * to the 31 bit counter.
+ */
+ if (mpam_ris_has_mbwu_long_counter(ris)) {
+ now = mpam_msc_read_mbwu_l(msc);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY_L;
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
+ now = FIELD_GET(MSMON___LWD_VALUE, now);
+ else
+ now = FIELD_GET(MSMON___L_VALUE, now);
+ } else {
+ now = mpam_read_monsel_reg(msc, MBWU);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
+ }
if (nrdy)
break;
@@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
- val = mpam_read_monsel_reg(msc, MBWU);
- mpam_write_monsel_reg(msc, MBWU, 0);
+ if (mpam_ris_has_mbwu_long_counter(ris)) {
+ val = mpam_msc_read_mbwu_l(msc);
+ mpam_msc_zero_mbwu_l(msc);
+ } else {
+ val = mpam_read_monsel_reg(msc, MBWU);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+ }
cfg->mon = i;
cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (29 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
` (36 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 49 +++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 5 +++-
2 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8fbcf6eb946a..65c30ebfe001 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1155,9 +1155,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
static void __ris_msmon_read(void *arg)
{
bool nrdy = false;
+ bool config_mismatch;
struct mon_read *m = arg;
u64 now, overflow_val = 0;
struct mon_cfg *ctx = m->ctx;
+ bool reset_on_next_read = false;
struct mpam_msc_ris *ris = m->ris;
struct msmon_mbwu_state *mbwu_state;
struct mpam_props *rprops = &ris->props;
@@ -1172,6 +1174,14 @@ static void __ris_msmon_read(void *arg)
FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+ if (m->type == mpam_feat_msmon_mbwu) {
+ mbwu_state = &ris->mbwu_state[ctx->mon];
+ if (mbwu_state) {
+ reset_on_next_read = mbwu_state->reset_on_next_read;
+ mbwu_state->reset_on_next_read = false;
+ }
+ }
+
/*
* Read the existing configuration to avoid re-writing the same values.
* This saves waiting for 'nrdy' on subsequent reads.
@@ -1179,7 +1189,10 @@ static void __ris_msmon_read(void *arg)
read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
clean_msmon_ctl_val(&cur_ctl);
gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
- if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+ config_mismatch = cur_flt != flt_val ||
+ cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+ if (config_mismatch || reset_on_next_read)
write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
switch (m->type) {
@@ -1212,7 +1225,6 @@ static void __ris_msmon_read(void *arg)
if (nrdy)
break;
- mbwu_state = &ris->mbwu_state[ctx->mon];
if (!mbwu_state)
break;
@@ -1314,6 +1326,39 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
return err;
}
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+ int idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ if (!mpam_is_enabled())
+ return;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+ continue;
+
+ msc = vmsc->msc;
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+ continue;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ continue;
+
+ ris->mbwu_state[ctx->mon].correction = 0;
+ ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
{
u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f627b5f72a1..bbf0306abc82 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -297,10 +297,12 @@ struct mon_cfg {
/*
* Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * Changes to reset_on_next_read, prev_val and correction are protected by the
+ * msc's mon_sel_lock.
*/
struct msmon_mbwu_state {
bool enabled;
+ bool reset_on_next_read;
struct mon_cfg cfg;
/* The value last read from the hardware. Used to detect overflow. */
@@ -410,6 +412,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (30 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 16:56 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
` (35 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich,
Jonathan Cameron
The bitmap reset code has been a source of bugs. Add a unit test.
This currently has to be built in, as the rest of the driver is
builtin.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/Kconfig | 13 ++++++
drivers/resctrl/mpam_devices.c | 4 ++
drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
3 files changed, 85 insertions(+)
create mode 100644 drivers/resctrl/test_mpam_devices.c
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index dff7b87280ab..f5e0609975e4 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
bool "MPAM driver for System IP, e.g. caches and memory controllers"
depends on ARM64_MPAM && EXPERT
+menu "ARM64 MPAM driver options"
+
config ARM64_MPAM_DRIVER_DEBUG
bool "Enable debug messages from the MPAM driver."
depends on ARM64_MPAM_DRIVER
help
Say yes here to enable debug messages from the MPAM driver.
+
+config MPAM_KUNIT_TEST
+ bool "KUnit tests for MPAM driver" if !KUNIT_ALL_TESTS
+ depends on KUNIT=y
+ default KUNIT_ALL_TESTS
+ help
+ Enable this option to run tests in the MPAM driver.
+
+ If unsure, say N.
+
+endmenu
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 65c30ebfe001..4cf5aae88c53 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
}
/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..8e9d6c88171c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2024 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+ char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+ struct mpam_msc fake_msc;
+ u32 *test_result;
+
+ if (!buf)
+ return;
+
+ fake_msc.mapped_hwpage = buf;
+ fake_msc.mapped_hwpage_sz = SZ_16K;
+ cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+ mutex_init(&fake_msc.part_sel_lock);
+ mutex_lock(&fake_msc.part_sel_lock);
+
+ test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+ KUNIT_EXPECT_EQ(test, test_result[0], 1);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 1);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+ KUNIT_CASE(test_mpam_reset_msc_bitmap),
+ {}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+ .name = "mpam_devices_test_suite",
+ .test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (31 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 17:11 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (34 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of thing that is expected to
work, and the sort that must be removed.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_internal.h | 8 +-
drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
2 files changed, 329 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index bbf0306abc82..6e973be095f8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -18,6 +18,12 @@
DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
static inline bool mpam_is_enabled(void)
{
return static_branch_likely(&mpam_enabled);
@@ -209,7 +215,7 @@ struct mpam_props {
u16 dspri_wd;
u16 num_csu_mon;
u16 num_mbwu_mon;
-};
+} PACKED_FOR_KUNIT;
#define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 8e9d6c88171c..ef39696e7ff8 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,326 @@
#include <kunit/test.h>
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+ struct mpam_props parent = { 0 };
+ struct mpam_props child;
+
+ memset(&child, 0xff, sizeof(child));
+ __props_mismatch(&parent, &child, false);
+
+ memset(&child, 0, sizeof(child));
+ KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+ memset(&child, 0xff, sizeof(child));
+ __props_mismatch(&parent, &child, true);
+
+ KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+ /* o/` How deep is your stack? o/` */
+ struct list_head fake_classes_list;
+ struct mpam_class fake_class = { 0 };
+ struct mpam_component fake_comp1 = { 0 };
+ struct mpam_component fake_comp2 = { 0 };
+ struct mpam_vmsc fake_vmsc1 = { 0 };
+ struct mpam_vmsc fake_vmsc2 = { 0 };
+ struct mpam_msc fake_msc1 = { 0 };
+ struct mpam_msc fake_msc2 = { 0 };
+ struct mpam_msc_ris fake_ris1 = { 0 };
+ struct mpam_msc_ris fake_ris2 = { 0 };
+ struct platform_device fake_pdev = { 0 };
+
+#define RESET_FAKE_HIEARCHY() do { \
+ INIT_LIST_HEAD(&fake_classes_list); \
+ \
+ memset(&fake_class, 0, sizeof(fake_class)); \
+ fake_class.level = 3; \
+ fake_class.type = MPAM_CLASS_CACHE; \
+ INIT_LIST_HEAD_RCU(&fake_class.components); \
+ INIT_LIST_HEAD(&fake_class.classes_list); \
+ \
+ memset(&fake_comp1, 0, sizeof(fake_comp1)); \
+ memset(&fake_comp2, 0, sizeof(fake_comp2)); \
+ fake_comp1.comp_id = 1; \
+ fake_comp2.comp_id = 2; \
+ INIT_LIST_HEAD(&fake_comp1.vmsc); \
+ INIT_LIST_HEAD(&fake_comp1.class_list); \
+ INIT_LIST_HEAD(&fake_comp2.vmsc); \
+ INIT_LIST_HEAD(&fake_comp2.class_list); \
+ \
+ memset(&fake_vmsc1, 0, sizeof(fake_vmsc1)); \
+ memset(&fake_vmsc2, 0, sizeof(fake_vmsc2)); \
+ INIT_LIST_HEAD(&fake_vmsc1.ris); \
+ INIT_LIST_HEAD(&fake_vmsc1.comp_list); \
+ fake_vmsc1.msc = &fake_msc1; \
+ INIT_LIST_HEAD(&fake_vmsc2.ris); \
+ INIT_LIST_HEAD(&fake_vmsc2.comp_list); \
+ fake_vmsc2.msc = &fake_msc2; \
+ \
+ memset(&fake_ris1, 0, sizeof(fake_ris1)); \
+ memset(&fake_ris2, 0, sizeof(fake_ris2)); \
+ fake_ris1.ris_idx = 1; \
+ INIT_LIST_HEAD(&fake_ris1.msc_list); \
+ fake_ris2.ris_idx = 2; \
+ INIT_LIST_HEAD(&fake_ris2.msc_list); \
+ \
+ fake_msc1.pdev = &fake_pdev; \
+ fake_msc2.pdev = &fake_pdev; \
+ \
+ list_add(&fake_class.classes_list, &fake_classes_list); \
+} while (0)
+
+ RESET_FAKE_HIEARCHY();
+
+ mutex_lock(&mpam_list_lock);
+
+ /* One Class+Comp, two RIS in one vMSC with common features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = NULL;
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc1;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = NULL;
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc1;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /* Multiple RIS within one MSC controlling the same resource can be mismatched */
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+ KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple RIS in different MSCs controlling the same resource can't
+ * have mismatched features - these can not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with incompatible overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 5;
+ fake_ris2.props.cpbm_wd = 3;
+ fake_ris1.props.mbw_pbm_bits = 5;
+ fake_ris2.props.mbw_pbm_bits = 3;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple RIS in different MSCs controlling the same resource can't
+ * have mismatched features - these can not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with overlapping features that need tweaking */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+ mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+ fake_ris1.props.bwa_wd = 5;
+ fake_ris2.props.bwa_wd = 3;
+ fake_ris1.props.cmax_wd = 5;
+ fake_ris2.props.cmax_wd = 3;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Where the MSCs support the same feature with different widths, the
+ * class is merged down to the smallest width.
+ */
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class Two Comp with overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = &fake_class;
+ list_add(&fake_comp2.class_list, &fake_class.components);
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp2;
+ list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class Two Comp with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = &fake_class;
+ list_add(&fake_comp2.class_list, &fake_class.components);
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp2;
+ list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple components can't control the same resource, so mismatched
+ * features can not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+ mutex_unlock(&mpam_list_lock);
+
+#undef RESET_FAKE_HIEARCHY
+}
+
static void test_mpam_reset_msc_bitmap(struct kunit *test)
{
char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
static struct kunit_case mpam_devices_test_cases[] = {
KUNIT_CASE(test_mpam_reset_msc_bitmap),
+ KUNIT_CASE(test_mpam_enable_merge_features),
+ KUNIT_CASE(test__props_mismatch),
{}
};
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 00/33] arm_mpam: Add basic mpam driver
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (32 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
` (33 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Hello,
This is just enough MPAM driver for the ACPI and DT pre-requisites.
It doesn't contain any of the resctrl code, meaning you can't actually drive it
from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.
This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.
The locking looks very strange - but is influenced by the 'mpam-fb' firmware
interface specification that is still alpha. That thing needs to wait for an
interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.
The aim is to not have to make invasive changes to the locking to support the
firmware interface, hence it looks strange from day-1.
I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.
The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just arm moving the documentation around.
This document has the best overview)
The expectation is this will go via the arm64 tree.
This series is based on v6.17-rc2, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/rv1
The rest of the driver can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.17-rc2
What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/
This series was previously posted here:
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com
Bugs welcome,
Thanks,
James Morse (29):
cacheinfo: Expose the code to generate a cache-id from a device_node
drivers: base: cacheinfo: Add helper to find the cache size from
cpu+level
ACPI / PPTT: Add a helper to fill a cpumask from a processor container
ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
levels
ACPI / PPTT: Find cache level by cache-id
ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
arm64: kconfig: Add Kconfig entry for MPAM
ACPI / MPAM: Parse the MPAM table
arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
arm_mpam: Add the class and component structures for ris firmware
described
arm_mpam: Add MPAM MSC register layout definitions
arm_mpam: Add cpuhp callbacks to probe MSC hardware
arm_mpam: Probe MSCs to find the supported partid/pmg values
arm_mpam: Add helpers for managing the locking around the mon_sel
registers
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Merge supported features during mpam_enable() into
mpam_class
arm_mpam: Reset MSC controls from cpu hp callbacks
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Register and enable IRQs
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Allow configuration to be applied and restored during cpu
online
arm_mpam: Probe and reset the rest of the features
arm_mpam: Add helpers to allocate monitors
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Track bandwidth counter state for overflow and power
management
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add kunit tests for props_mismatch()
Rob Herring (1):
dt-bindings: arm: Add MPAM MSC binding
Rohit Mathew (2):
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Use long MBWU counters if supported
Shanker Donthineni (1):
arm_mpam: Add support for memory controller MSC on DT platforms
.../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++
arch/arm64/Kconfig | 19 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/acpi/arm64/Kconfig | 3 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/mpam.c | 331 ++
drivers/acpi/pptt.c | 230 +-
drivers/acpi/tables.c | 2 +-
drivers/base/cacheinfo.c | 19 +-
drivers/resctrl/Kconfig | 24 +
drivers/resctrl/Makefile | 4 +
drivers/resctrl/mpam_devices.c | 2909 +++++++++++++++++
drivers/resctrl/mpam_internal.h | 692 ++++
drivers/resctrl/test_mpam_devices.c | 390 +++
include/linux/acpi.h | 26 +
include/linux/arm_mpam.h | 56 +
include/linux/cacheinfo.h | 16 +
18 files changed, 4911 insertions(+), 14 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
create mode 100644 drivers/acpi/arm64/mpam.c
create mode 100644 drivers/resctrl/Kconfig
create mode 100644 drivers/resctrl/Makefile
create mode 100644 drivers/resctrl/mpam_devices.c
create mode 100644 drivers/resctrl/mpam_internal.h
create mode 100644 drivers/resctrl/test_mpam_devices.c
create mode 100644 include/linux/arm_mpam.h
--
2.20.1
* [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (33 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
` (32 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM driver identifies caches by id for use with resctrl. It
needs to know the cache-id when probing, but the value isn't set
in cacheinfo until device_initcall().
Expose the code that generates the cache-id. The parts of the MPAM
driver that run early can use this to set up the resctrl structures
before cacheinfo is ready in device_initcall().
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
* Renamed cache_of_get_id() cache_of_calculate_id().
---
drivers/base/cacheinfo.c | 19 +++++++++++++------
include/linux/cacheinfo.h | 1 +
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 613410705a47..f6289d142ba9 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
#define arch_compact_of_hwid(_x) (_x)
#endif
-static void cache_of_set_id(struct cacheinfo *this_leaf,
- struct device_node *cache_node)
+unsigned long cache_of_calculate_id(struct device_node *cache_node)
{
struct device_node *cpu;
- u32 min_id = ~0;
+ unsigned long min_id = ~0UL;
for_each_of_cpu_node(cpu) {
u64 id = of_get_cpu_hwid(cpu, 0);
@@ -219,15 +218,23 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
id = arch_compact_of_hwid(id);
if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
of_node_put(cpu);
- return;
+ return ~0UL;
}
if (match_cache_node(cpu, cache_node))
min_id = min(min_id, id);
}
- if (min_id != ~0) {
- this_leaf->id = min_id;
+ return min_id;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf,
+ struct device_node *cache_node)
+{
+ unsigned long id = cache_of_calculate_id(cache_node);
+
+ if (id != ~0UL) {
+ this_leaf->id = id;
this_leaf->attributes |= CACHE_ID;
}
}
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index c8f4f0a0b874..2dcbb69139e9 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
#endif
const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
+unsigned long cache_of_calculate_id(struct device_node *np);
/*
* Get the cacheinfo structure for the cache associated with @cpu at
--
2.20.1
* [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (34 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
` (31 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM needs to know the size of a cache associated with a particular CPU.
The DT/ACPI agnostic way of doing this is to ask cacheinfo.
Add a helper to do this.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
* Converted to kdoc.
* Simplified helper to use get_cpu_cacheinfo_level().
---
include/linux/cacheinfo.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2dcbb69139e9..e12d6f2c6a57 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
return ci ? ci->id : -1;
}
+/**
+ * get_cpu_cacheinfo_size() - Get the size of the cache.
+ * @cpu: The cpu that is associated with the cache.
+ * @level: The level of the cache as seen by @cpu.
+ *
+ * Callers must hold the cpuhp lock.
+ * Returns the cache-size on success, or 0 for an error.
+ */
+static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
+{
+ struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
+
+ return ci ? ci->size : 0;
+}
+
#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
#define use_arch_cache_info() (true)
#else
--
2.20.1
* [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (35 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
` (30 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The PPTT describes CPUs and caches, as well as processor containers.
The ACPI table for MPAM describes the set of CPUs that can access an MSC
with the UID of a processor container.
Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
* Added missing : in kernel-doc
* Made helper return void as this never actually returns an error.
---
drivers/acpi/pptt.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 3 ++
2 files changed, 89 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..4791ca2bdfac 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
return NULL;
}
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr: A reference to the PPTT table.
+ * @parent_node: A pointer to the processor node in the @table_hdr.
+ * @cpus: A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+ struct acpi_pptt_processor *parent_node,
+ cpumask_t *cpus)
+{
+ struct acpi_pptt_processor *cpu_node;
+ u32 acpi_id;
+ int cpu;
+
+ cpumask_clear(cpus);
+
+ for_each_possible_cpu(cpu) {
+ acpi_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+ while (cpu_node) {
+ if (cpu_node == parent_node) {
+ cpumask_set_cpu(cpu, cpus);
+ break;
+ }
+ cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+ }
+ }
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ * processor containers
+ * @acpi_cpu_id: The UID of the processor container.
+ * @cpus: The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not every 'Processor' entry in the PPTT is a CPU or a Processor
+ * Container - some exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf that has the
+ * 'ACPI Processor ID valid' flag set.
+ *
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table_hdr;
+ struct acpi_subtable_header *entry;
+ unsigned long table_end;
+ acpi_status status;
+ bool leaf_flag;
+ u32 proc_sz;
+
+ cpumask_clear(cpus);
+
+ status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
+ if (ACPI_FAILURE(status))
+ return;
+
+ table_end = (unsigned long)table_hdr + table_hdr->length;
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+ sizeof(struct acpi_table_pptt));
+ proc_sz = sizeof(struct acpi_pptt_processor);
+ while ((unsigned long)entry + proc_sz <= table_end) {
+ cpu_node = (struct acpi_pptt_processor *)entry;
+ if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
+ cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
+ leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
+ if (!leaf_flag) {
+ if (cpu_node->acpi_processor_id == acpi_cpu_id)
+ acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+ }
+ }
+ entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+ entry->length);
+ }
+
+ acpi_put_table(table_hdr);
+}
+
static u8 acpi_cache_type(enum cache_type type)
{
switch (type) {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1c5bb1e887cd..f97a9ff678cc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
int find_acpi_cpu_topology_cluster(unsigned int cpu);
int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
{
return -EINVAL;
}
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+ cpumask_t *cpus) { }
#endif
void acpi_arch_init(void);
--
2.20.1
* [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (36 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
` (29 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
acpi_count_levels() passes the number of levels back via a pointer argument.
It also passes this to acpi_find_cache_level() as the starting_level, and
preserves this value as it walks up the cpu_node tree counting the levels.
This means the caller must initialise 'levels' due to acpi_count_levels()
internals. The only caller acpi_get_cache_info() happens to have already
initialised levels to zero, which acpi_count_levels() depends on to get the
correct result.
Two results are passed back from acpi_count_levels(); unlike split_levels,
levels is not optional.
Split these two results up. The mandatory 'levels' is always returned,
which hides the internal details from the caller, and avoids having
duplicated initialisation in all callers. split_levels remains an
optional argument passed back.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Made acpi_count_levels() return the levels value.
---
drivers/acpi/pptt.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 4791ca2bdfac..8f9b9508acba 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
* levels and split cache levels (data/instruction).
* @table_hdr: Pointer to the head of the PPTT table
* @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
* @split_levels: Number of split cache levels (data/instruction) if
- * success. Can by NULL.
+ * success. Can be NULL.
*
+ * Return: number of levels.
* Given a processor node containing a processing unit, walk into it and count
* how many levels exist solely for it, and then walk up each level until we hit
* the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
* split cache levels (data/instruction) that exist at each level on the way
* up.
*/
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
- struct acpi_pptt_processor *cpu_node,
- unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+ struct acpi_pptt_processor *cpu_node,
+ unsigned int *split_levels)
{
+ int starting_level = 0;
+
do {
- acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+ acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
} while (cpu_node);
+
+ return starting_level;
}
/**
@@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
if (!cpu_node)
return -ENOENT;
- acpi_count_levels(table, cpu_node, levels, split_levels);
+ *levels = acpi_count_levels(table, cpu_node, split_levels);
pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
*levels, split_levels ? *split_levels : -1);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (37 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
` (28 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.
Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.
Add a helper that walks every possible cache until it finds the one
identified by cache-id, then returns the level.
Add a cleanup-based freeing mechanism for acpi_get_table().
CC: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* acpi_count_levels() now returns a value.
* Converted the table-get stuff to use Jonathan's cleanup helper.
* Dropped Sudeep's Review tag due to the cleanup change.
---
drivers/acpi/pptt.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 17 ++++++++++++
2 files changed, 81 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 8f9b9508acba..660457644a5b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
ACPI_PPTT_ACPI_IDENTICAL);
}
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+ u32 acpi_cpu_id;
+ int level, cpu, num_levels;
+ struct acpi_pptt_cache *cache;
+ struct acpi_pptt_cache_v1 *cache_v1;
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+ if (IS_ERR(table))
+ return PTR_ERR(table);
+
+ if (table->revision < 3)
+ return -ENOENT;
+
+ /*
+ * If we found the cache first, we'd still need to walk from each CPU
+ * to find the level...
+ */
+ for_each_possible_cpu(cpu) {
+ acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+ if (!cpu_node)
+ return -ENOENT;
+ num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+ /* Start at 1 for L1 */
+ for (level = 1; level <= num_levels; level++) {
+ cache = acpi_find_cache_node(table, acpi_cpu_id,
+ ACPI_PPTT_CACHE_TYPE_UNIFIED,
+ level, &cpu_node);
+ if (!cache)
+ continue;
+
+ cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+ cache,
+ sizeof(struct acpi_pptt_cache));
+
+ if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+ cache_v1->cache_id == cache_id)
+ return level;
+ }
+ }
+
+ return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..30c10b1dcdb2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
#ifndef _LINUX_ACPI_H
#define _LINUX_ACPI_H
+#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/ioport.h> /* for struct resource */
#include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
void acpi_table_init_complete (void);
int acpi_table_init (void);
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+ struct acpi_table_header *table;
+ int status = acpi_get_table(signature, instance, &table);
+
+ if (ACPI_FAILURE(status))
+ return ERR_PTR(-ENOENT);
+ return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
int __init_or_acpilib acpi_table_parse_entries(char *id,
unsigned long table_size, int entry_id,
@@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
}
static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+ return -EINVAL;
+}
#endif
void acpi_arch_init(void);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (38 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
` (27 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
MPAM identifies CPUs by the cache_id in the PPTT cache structure.
The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.
Add a helper to pull this information out of the PPTT.
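The shape of that walk, as a hedged userspace sketch (the struct here is a stand-in for the PPTT lookups, not the ACPI layout): every possible CPU is visited, and a bit is set for each one whose unified cache matches the requested id:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct fake_cpu {
	uint32_t cache_id;	/* id of the unified cache above this CPU */
};

static uint64_t mask_from_cache_id(const struct fake_cpu *cpus, size_t n,
				   uint32_t cache_id)
{
	uint64_t mask = 0;

	/* walk every possible CPU, not just the online ones */
	for (size_t cpu = 0; cpu < n; cpu++)
		if (cpus[cpu].cache_id == cache_id)
			mask |= UINT64_C(1) << cpu;
	return mask;
}
```

Because the source of truth is the firmware table rather than cacheinfo, the result is stable regardless of which CPUs user-space has brought online.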
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
Changes since RFC:
* acpi_count_levels() now returns a value.
* Converted the table-get stuff to use Jonathan's cleanup helper.
* Dropped Sudeep's Review tag due to the cleanup change.
---
drivers/acpi/pptt.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 6 +++++
2 files changed, 68 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 660457644a5b..cb93a9a7f9b6 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
return -ENOENT;
}
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ * specified cache
+ * @cache_id: The id field of the unified cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+ u32 acpi_cpu_id;
+ int level, cpu, num_levels;
+ struct acpi_pptt_cache *cache;
+ struct acpi_pptt_cache_v1 *cache_v1;
+ struct acpi_pptt_processor *cpu_node;
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+ cpumask_clear(cpus);
+
+ if (IS_ERR(table))
+ return -ENOENT;
+
+ if (table->revision < 3)
+ return -ENOENT;
+
+ /*
+ * If we found the cache first, we'd still need to walk from each cpu.
+ */
+ for_each_possible_cpu(cpu) {
+ acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+ cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+ if (!cpu_node)
+ return 0;
+ num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+ /* Start at 1 for L1 */
+ for (level = 1; level <= num_levels; level++) {
+ cache = acpi_find_cache_node(table, acpi_cpu_id,
+ ACPI_PPTT_CACHE_TYPE_UNIFIED,
+ level, &cpu_node);
+ if (!cache)
+ continue;
+
+ cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+ cache,
+ sizeof(struct acpi_pptt_cache));
+
+ if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+ cache_v1->cache_id == cache_id)
+ cpumask_set_cpu(cpu, cpus);
+ }
+ }
+
+ return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30c10b1dcdb2..4ad08f5f1d83 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
#else
static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
{
@@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
{
return -EINVAL;
}
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+ cpumask_t *cpus)
+{
+ return -EINVAL;
+}
#endif
void acpi_arch_init(void);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (39 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
` (26 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it; as MPAM is only found on arm64
platforms, that is where the Kconfig option makes the most sense.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register the CPUs'
properties with the MPAM driver.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
arch/arm64/Kconfig | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..658e47fc0c5a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses.
+config ARM64_MPAM
+ bool "Enable support for MPAM"
+ help
+ Memory Partitioning and Monitoring is an optional extension
+ that allows the CPUs to mark load and store transactions with
+ labels for partition-id and performance-monitoring-group.
+ System components, such as the caches, can use the partition-id
+ to apply a performance policy. MPAM monitors can use the
+ partition-id and performance-monitoring-group to measure the
+ cache occupancy or data throughput.
+
+ Use of this extension requires CPU support, support in the
+ memory system components (MSC), and a description from firmware
+ of where the MSC are in the address space.
+
+ MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (40 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
` (25 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
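The MPAM table is a fixed header followed by variable-length MSC records packed back to back, each record carrying its own length. A minimal sketch of that walk (the struct layout is illustrative, not the ACPI definition); note that the offset must advance even for records that are skipped, and a record shorter than its own header is treated as malformed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for struct acpi_mpam_msc_node */
struct fake_msc {
	uint16_t length;	/* total size of this record */
	uint16_t mmio_size;	/* zero-sized MSCs are skipped */
} __attribute__((packed));

static int count_msc(const uint8_t *table, size_t table_len)
{
	const uint8_t *end = table + table_len;
	int count = 0;

	while (table < end) {
		const struct fake_msc *msc = (const void *)table;

		if (msc->length < sizeof(*msc))
			return -1;	/* malformed entry */
		if (msc->mmio_size)
			count++;
		table += msc->length;	/* always advance, even when skipping */
	}
	return count;
}
```

Validating the length before using it is what keeps a corrupt record from wedging the walk in place.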
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Used DEFINE_RES_IRQ_NAMED() and friends macros.
* Additional error handling.
* Check for zero sized MSC.
* Allow table revisions greater than 1. (no spec for revision 0!)
* Use cleanup helpers to retrieve ACPI tables, which allows some functions
to be folded together.
---
arch/arm64/Kconfig | 1 +
drivers/acpi/arm64/Kconfig | 3 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/mpam.c | 331 ++++++++++++++++++++++++++++++++++++
drivers/acpi/tables.c | 2 +-
include/linux/arm_mpam.h | 46 +++++
6 files changed, 383 insertions(+), 1 deletion(-)
create mode 100644 drivers/acpi/arm64/mpam.c
create mode 100644 include/linux/arm_mpam.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 658e47fc0c5a..e51ccf1da102 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
+ select ACPI_MPAM if ACPI
help
Memory Partitioning and Monitoring is an optional extension
that allows the CPUs to mark load and store transactions with
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
config ACPI_APMT
bool
+
+config ACPI_MPAM
+ bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) += apmt.o
obj-$(CONFIG_ACPI_FFH) += ffh.o
obj-$(CONFIG_ACPI_GTDT) += gtdt.o
obj-$(CONFIG_ACPI_IORT) += iort.o
+obj-$(CONFIG_ACPI_MPAM) += mpam.o
obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
obj-$(CONFIG_ARM_AMBA) += amba.o
obj-y += dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..e55fc2729ac5
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE_MASK BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED 0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID BIT(4)
+
+static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
+ int *irq, u32 processor_container_uid)
+{
+ int sense;
+
+ if (!intid)
+ return false;
+
+ if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
+ ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+ return false;
+
+ sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
+
+ /*
+ * If the GSI is in the GIC's PPI range, try and create a partitioned
+ * percpu interrupt.
+ */
+ if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
+ pr_err_once("Partitioned interrupts not supported\n");
+ return false;
+ }
+
+ *irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+ if (*irq <= 0) {
+ pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+ intid);
+ return false;
+ }
+
+ return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+ struct acpi_mpam_msc_node *tbl_msc,
+ struct resource *res, int *res_idx)
+{
+ u32 flags, aff;
+ int irq;
+
+ flags = tbl_msc->overflow_interrupt_flags;
+ if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+ flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+ aff = tbl_msc->overflow_interrupt_affinity;
+ else
+ aff = ~0;
+ if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
+ res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+ flags = tbl_msc->error_interrupt_flags;
+ if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+ flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+ aff = tbl_msc->error_interrupt_affinity;
+ else
+ aff = ~0;
+ if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
+ res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+ struct acpi_mpam_resource_node *res)
+{
+ int level, nid;
+ u32 cache_id;
+
+ switch (res->locator_type) {
+ case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+ cache_id = res->locator.cache_locator.cache_reference;
+ level = find_acpi_cache_level_from_id(cache_id);
+ if (level <= 0) {
+ pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
+ return -EINVAL;
+ }
+ return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+ level, cache_id);
+ case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+ nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+ if (nid == NUMA_NO_NODE)
+ nid = 0;
+ return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+ 255, nid);
+ default:
+ /* These get discovered later and treated as unknown */
+ return 0;
+ }
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc)
+{
+ int i, err;
+ struct acpi_mpam_resource_node *resources;
+
+ resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
+ for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+ err = acpi_mpam_parse_resource(msc, &resources[i]);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+ struct platform_device *pdev,
+ u32 *acpi_id)
+{
+ bool acpi_id_valid = false;
+ struct acpi_device *buddy;
+ char hid[16], uid[16];
+ int err;
+
+ memset(&hid, 0, sizeof(hid));
+ memcpy(hid, &tbl_msc->hardware_id_linked_device,
+ sizeof(tbl_msc->hardware_id_linked_device));
+
+ if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+ *acpi_id = tbl_msc->instance_id_linked_device;
+ acpi_id_valid = true;
+ }
+
+ err = snprintf(uid, sizeof(uid), "%u",
+ tbl_msc->instance_id_linked_device);
+ if (err >= sizeof(uid))
+ return acpi_id_valid;
+
+ buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+ if (buddy)
+ device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+ return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+ enum mpam_msc_iface *iface)
+{
+ switch (tbl_msc->interface_type) {
+ case 0:
+ *iface = MPAM_IFACE_MMIO;
+ return 0;
+ case 0xa:
+ *iface = MPAM_IFACE_PCC;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int __init acpi_mpam_parse(void)
+{
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+ char *table_end, *table_offset = (char *)(table + 1);
+ struct property_entry props[4]; /* needs a sentinel */
+ struct acpi_mpam_msc_node *tbl_msc;
+ int next_res, next_prop, err = 0;
+ struct acpi_device *companion;
+ struct platform_device *pdev;
+ enum mpam_msc_iface iface;
+ struct resource res[3];
+ char uid[16];
+ u32 acpi_id;
+
+ if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
+ return 0;
+
+ if (table->revision < 1)
+ return 0;
+
+ table_end = (char *)table + table->length;
+
+ while (table_offset < table_end) {
+ tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+ table_offset += tbl_msc->length;
+
+ /*
+ * If any of the reserved fields are set, make no attempt to
+ * parse the msc structure. This will prevent the driver from
+ * probing all the MSC, meaning it can't discover the system
+ * wide supported partid and pmg ranges. This avoids whatever device
+ * this MSC controls truncating the partids and generating a screaming
+ * error interrupt.
+ */
+ if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
+ continue;
+
+ if (!tbl_msc->mmio_size)
+ continue;
+
+ if (decode_interface_type(tbl_msc, &iface))
+ continue;
+
+ next_res = 0;
+ next_prop = 0;
+ memset(res, 0, sizeof(res));
+ memset(props, 0, sizeof(props));
+
+ pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+ if (!pdev) {
+ err = -ENOMEM;
+ break;
+ }
+
+ if (tbl_msc->length < sizeof(*tbl_msc)) {
+ err = -EINVAL;
+ break;
+ }
+
+ /* Some power management is described in the namespace: */
+ err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+ if (err > 0 && err < sizeof(uid)) {
+ companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+ if (companion)
+ ACPI_COMPANION_SET(&pdev->dev, companion);
+ }
+
+ if (iface == MPAM_IFACE_MMIO) {
+ res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+ tbl_msc->mmio_size,
+ "MPAM:MSC");
+ } else if (iface == MPAM_IFACE_PCC) {
+ props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+ tbl_msc->base_address);
+ }
+
+ acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+ err = platform_device_add_resources(pdev, res, next_res);
+ if (err)
+ break;
+
+ props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+ tbl_msc->max_nrdy_usec);
+
+ /*
+ * The MSC's CPU affinity is described via its linked power
+ * management device, but only if it points at a Processor or
+ * Processor Container.
+ */
+ if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
+ props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
+ acpi_id);
+ }
+
+ err = device_create_managed_software_node(&pdev->dev, props,
+ NULL);
+ if (err)
+ break;
+
+ /* Come back later if you want the RIS too */
+ err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+ if (err)
+ break;
+
+ err = platform_device_add(pdev);
+ if (err)
+ break;
+ }
+
+ if (err)
+ platform_device_put(pdev);
+
+ return err;
+}
+
+int acpi_mpam_count_msc(void)
+{
+ struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+ char *table_end, *table_offset = (char *)(table + 1);
+ struct acpi_mpam_msc_node *tbl_msc;
+ int count = 0;
+
+ if (IS_ERR(table))
+ return 0;
+
+ if (table->revision < 1)
+ return 0;
+
+ tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+ table_end = (char *)table + table->length;
+
+ while (table_offset < table_end) {
+ if (tbl_msc->length < sizeof(*tbl_msc))
+ return -EINVAL;
+
+ if (tbl_msc->mmio_size)
+ count++;
+
+ table_offset += tbl_msc->length;
+ tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+ }
+
+ return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index fa9bb8c8ce95..835e3795ede3 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
- ACPI_SIG_NBFT };
+ ACPI_SIG_NBFT, ACPI_SIG_MPAM };
#define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..0edefa6ba019
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+ MPAM_IFACE_MMIO, /* a real MPAM MSC */
+ MPAM_IFACE_PCC, /* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+ MPAM_CLASS_CACHE, /* Well known caches, e.g. L2 */
+ MPAM_CLASS_MEMORY, /* Main memory */
+ MPAM_CLASS_UNKNOWN, /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+ struct acpi_mpam_msc_node *tbl_msc)
+{
+ return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id,
+ int component_id)
+{
+ return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (41 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
` (24 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rob Herring <robh@kernel.org>
The binding is designed around the assumption that an MSC will be a
sub-block of something else such as a memory controller, cache controller,
or IOMMU. However, it's certainly possible a design does not have that
association or has a mixture of both, so the binding illustrates how we can
support that with RIS child nodes.
A key part of MPAM is we need to know about all of the MSCs in the system
before it can be enabled. This drives the need for the genericish
'arm,mpam-msc' compatible, though we can't assume an MSC is accessible
until a h/w specific driver has potentially enabled the h/w.
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Syntax(?) corrections supplied by Rob.
* Culled some context in the example.
---
.../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++++++++++++++++++
1 file changed, 200 insertions(+)
create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
new file mode 100644
index 000000000000..d984817b3385
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/arm,mpam-msc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
+
+description: |
+ The Arm MPAM specification can be found here:
+
+ https://developer.arm.com/documentation/ddi0598/latest
+
+maintainers:
+ - Rob Herring <robh@kernel.org>
+
+properties:
+ compatible:
+ items:
+ - const: arm,mpam-msc # Further details are discoverable
+ - const: arm,mpam-memory-controller-msc
+
+ reg:
+ maxItems: 1
+ description: A memory region containing registers as defined in the MPAM
+ specification.
+
+ interrupts:
+ minItems: 1
+ items:
+ - description: error (optional)
+ - description: overflow (optional, only for monitoring)
+
+ interrupt-names:
+ oneOf:
+ - items:
+ - enum: [ error, overflow ]
+ - items:
+ - const: error
+ - const: overflow
+
+ arm,not-ready-us:
+ description: The maximum time in microseconds for monitoring data to be
+ accurate after a settings change. For more information, see the
+ Not-Ready (NRDY) bit description in the MPAM specification.
+
+ numa-node-id: true # see NUMA binding
+
+ '#address-cells':
+ const: 1
+
+ '#size-cells':
+ const: 0
+
+patternProperties:
+ '^ris@[0-9a-f]$':
+ type: object
+ additionalProperties: false
+ description:
+ RIS nodes for each RIS in an MSC. These nodes are required for each RIS
+ implementing known MPAM controls.
+
+ properties:
+ compatible:
+ enum:
+ # Bulk storage for cache
+ - arm,mpam-cache
+ # Memory bandwidth
+ - arm,mpam-memory
+
+ reg:
+ minimum: 0
+ maximum: 0xf
+
+ cpus:
+ description:
+ Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
+ device's affinity is used.
+
+ arm,mpam-device:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ By default, the MPAM enabled device associated with a RIS is the MSC's
+ parent node. It is possible for each RIS to be associated with different
+ devices in which case 'arm,mpam-device' should be used.
+
+ required:
+ - compatible
+ - reg
+
+required:
+ - compatible
+ - reg
+
+dependencies:
+ interrupts: [ interrupt-names ]
+
+additionalProperties: false
+
+examples:
+ - |
+ L3: cache-controller@30000000 {
+ compatible = "arm,dsu-l3-cache", "cache";
+ cache-level = <3>;
+ cache-unified;
+
+ ranges = <0x0 0x30000000 0x800000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@10000 {
+ compatible = "arm,mpam-msc";
+
+ /* CPU affinity implied by the parent cache node */
+ reg = <0x10000 0x2000>;
+ interrupts = <1>, <2>;
+ interrupt-names = "error", "overflow";
+ arm,not-ready-us = <1>;
+ };
+ };
+
+ mem: memory-controller@20000 {
+ compatible = "foo,a-memory-controller";
+ reg = <0x20000 0x1000>;
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ msc@21000 {
+ compatible = "arm,mpam-memory-controller-msc", "arm,mpam-msc";
+ reg = <0x21000 0x1000>;
+ interrupts = <3>;
+ interrupt-names = "error";
+ arm,not-ready-us = <1>;
+ numa-node-id = <1>;
+ };
+ };
+
+ iommu@40000 {
+ reg = <0x40000 0x1000>;
+
+ ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@41000 {
+ compatible = "arm,mpam-msc";
+ reg = <0 0x1000>;
+ interrupts = <5>, <6>;
+ interrupt-names = "error", "overflow";
+ arm,not-ready-us = <1>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ris@2 {
+ compatible = "arm,mpam-cache";
+ reg = <0>;
+ // TODO: How to map to device(s)?
+ };
+ };
+ };
+
+ msc@80000 {
+ compatible = "foo,a-standalone-msc";
+ reg = <0x80000 0x1000>;
+
+ clocks = <&clks 123>;
+
+ ranges;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ msc@10000 {
+ compatible = "arm,mpam-msc";
+
+ reg = <0x10000 0x2000>;
+ interrupts = <7>;
+ interrupt-names = "overflow";
+ arm,not-ready-us = <1>;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ris@0 {
+ compatible = "arm,mpam-cache";
+ reg = <0>;
+ arm,mpam-device = <&L2_0>;
+ };
+
+ ris@1 {
+ compatible = "arm,mpam-memory";
+ reg = <1>;
+ arm,mpam-device = <&mem>;
+ };
+ };
+ };
+
+...
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (42 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
` (23 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
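The system-wide common values mentioned above are the intersection of what every MSC supports, which is why touching the hardware before all MSCs are discovered is pointless. A minimal user-space sketch of that reduction (illustrative names, not the driver's API):

```c
#include <assert.h>

/*
 * Sketch: the usable PARTID/PMG space is the minimum supported by
 * every MSC in the system, so it can only be computed once all MSCs
 * have been discovered. Field and function names are illustrative.
 */
struct model_msc_props {
	unsigned int num_partid;
	unsigned int num_pmg;
};

/* Reduce per-MSC limits to the system-wide common values. */
static void model_common_props(const struct model_msc_props *msc, int n,
			       struct model_msc_props *out)
{
	out->num_partid = ~0U;
	out->num_pmg = ~0U;

	for (int i = 0; i < n; i++) {
		if (msc[i].num_partid < out->num_partid)
			out->num_partid = msc[i].num_partid;
		if (msc[i].num_pmg < out->num_pmg)
			out->num_pmg = msc[i].num_pmg;
	}
}
```

A configuration using more PARTIDs than the smallest MSC supports could not be applied everywhere, hence the minimum.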
CC: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Check for status=broken DT devices.
* Moved all the files around.
* Made Kconfig symbols depend on EXPERT
---
arch/arm64/Kconfig | 1 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/resctrl/Kconfig | 11 ++
drivers/resctrl/Makefile | 4 +
drivers/resctrl/mpam_devices.c | 336 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 62 ++++++
7 files changed, 417 insertions(+)
create mode 100644 drivers/resctrl/Kconfig
create mode 100644 drivers/resctrl/Makefile
create mode 100644 drivers/resctrl/mpam_devices.c
create mode 100644 drivers/resctrl/mpam_internal.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e51ccf1da102..ea3c54e04275 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
+ select ARM64_MPAM_DRIVER
select ACPI_MPAM if ACPI
help
Memory Partitioning and Monitoring is an optional extension
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
source "drivers/cdx/Kconfig"
+source "drivers/resctrl/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index b5749cf67044..f41cf4eddeba 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,5 +194,6 @@ obj-$(CONFIG_HTE) += hte/
obj-$(CONFIG_DRM_ACCEL) += accel/
obj-$(CONFIG_CDX_BUS) += cdx/
obj-$(CONFIG_DPLL) += dpll/
+obj-y += resctrl/
obj-$(CONFIG_S390) += s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..dff7b87280ab
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,11 @@
+# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
+# CPU resources, not containers or cgroups etc.
+config ARM64_MPAM_DRIVER
+ bool "MPAM driver for System IP, e.g. caches and memory controllers"
+ depends on ARM64_MPAM && EXPERT
+
+config ARM64_MPAM_DRIVER_DEBUG
+ bool "Enable debug messages from the MPAM driver"
+ depends on ARM64_MPAM_DRIVER
+ help
+ Say yes here to enable debug messages from the MPAM driver.
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..92b48fa20108
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
+mpam-y += mpam_devices.o
+
+cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..a0d9a699a6e7
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include <acpi/pcc.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/* MPAM isn't available until all the MSC have been probed. */
+static u32 mpam_num_msc;
+
+static void mpam_discovery_complete(void)
+{
+ pr_err("Discovered all MSC\n");
+}
+
+static int mpam_dt_count_msc(void)
+{
+ int count = 0;
+ struct device_node *np;
+
+ for_each_compatible_node(np, NULL, "arm,mpam-msc") {
+ if (of_device_is_available(np))
+ count++;
+ }
+
+ return count;
+}
+
+static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
+ u32 ris_idx)
+{
+ int err = 0;
+ u32 level = 0;
+ unsigned long cache_id;
+ struct device_node *cache;
+
+ do {
+ if (of_device_is_compatible(np, "arm,mpam-cache")) {
+ cache = of_parse_phandle(np, "arm,mpam-device", 0);
+ if (!cache) {
+ pr_err("Failed to read phandle\n");
+ break;
+ }
+ } else if (of_device_is_compatible(np->parent, "cache")) {
+ cache = of_node_get(np->parent);
+ } else {
+ /* For now, only caches are supported */
+ cache = NULL;
+ break;
+ }
+
+ err = of_property_read_u32(cache, "cache-level", &level);
+ if (err) {
+ pr_err("Failed to read cache-level\n");
+ break;
+ }
+
+ cache_id = cache_of_calculate_id(cache);
+ if (cache_id == ~0UL) {
+ err = -ENOENT;
+ break;
+ }
+
+ err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
+ cache_id);
+ } while (0);
+ of_node_put(cache);
+
+ return err;
+}
+
+static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
+{
+ int err, num_ris = 0;
+ const u32 *ris_idx_p;
+ struct device_node *iter, *np;
+
+ np = msc->pdev->dev.of_node;
+ for_each_child_of_node(np, iter) {
+ ris_idx_p = of_get_property(iter, "reg", NULL);
+ if (ris_idx_p) {
+ num_ris++;
+ err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
+ if (err) {
+ of_node_put(iter);
+ return err;
+ }
+ }
+ }
+
+ if (!num_ris)
+ err = mpam_dt_parse_resource(msc, np, 0);
+
+ return err;
+}
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+ struct device_node *parent;
+ u32 affinity_id;
+ int err;
+
+ if (!acpi_disabled) {
+ err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+ &affinity_id);
+ if (err)
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ else
+ acpi_pptt_get_cpus_from_container(affinity_id,
+ &msc->accessibility);
+
+ return 0;
+ }
+
+ /* This depends on the path to of_node */
+ parent = of_get_parent(msc->pdev->dev.of_node);
+ if (parent == of_root) {
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ err = 0;
+ } else {
+ err = -EINVAL;
+ pr_err("Cannot determine accessibility of MSC: %s\n",
+ dev_name(&msc->pdev->dev));
+ }
+ of_node_put(parent);
+
+ return err;
+}
+
+static int fw_num_msc;
+
+static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
+{
+ /* TODO: wake up tasks blocked on this MSC's PCC channel */
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+ struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+ if (!msc)
+ return;
+
+ mutex_lock(&mpam_list_lock);
+ mpam_num_msc--;
+ platform_set_drvdata(pdev, NULL);
+ list_del_rcu(&msc->glbl_list);
+ synchronize_srcu(&mpam_srcu);
+ devm_kfree(&pdev->dev, msc);
+ mutex_unlock(&mpam_list_lock);
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+ int err;
+ struct mpam_msc *msc;
+ struct resource *msc_res;
+ void *plat_data = pdev->dev.platform_data;
+
+ mutex_lock(&mpam_list_lock);
+ do {
+ msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+ if (!msc) {
+ err = -ENOMEM;
+ break;
+ }
+
+ mutex_init(&msc->probe_lock);
+ mutex_init(&msc->part_sel_lock);
+ mutex_init(&msc->outer_mon_sel_lock);
+ raw_spin_lock_init(&msc->inner_mon_sel_lock);
+ msc->id = mpam_num_msc++;
+ msc->pdev = pdev;
+ INIT_LIST_HEAD_RCU(&msc->glbl_list);
+ INIT_LIST_HEAD_RCU(&msc->ris);
+
+ err = update_msc_accessibility(msc);
+ if (err)
+ break;
+ if (cpumask_empty(&msc->accessibility)) {
+ pr_err_once("msc:%u is not accessible from any CPU!\n",
+ msc->id);
+ err = -EINVAL;
+ break;
+ }
+
+ if (device_property_read_u32(&pdev->dev, "pcc-channel",
+ &msc->pcc_subspace_id))
+ msc->iface = MPAM_IFACE_MMIO;
+ else
+ msc->iface = MPAM_IFACE_PCC;
+
+ if (msc->iface == MPAM_IFACE_MMIO) {
+ void __iomem *io;
+
+ io = devm_platform_get_and_ioremap_resource(pdev, 0,
+ &msc_res);
+ if (IS_ERR(io)) {
+ pr_err("Failed to map MSC base address\n");
+ err = PTR_ERR(io);
+ break;
+ }
+ msc->mapped_hwpage_sz = resource_size(msc_res);
+ msc->mapped_hwpage = io;
+ } else if (msc->iface == MPAM_IFACE_PCC) {
+ msc->pcc_cl.dev = &pdev->dev;
+ msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
+ msc->pcc_cl.tx_block = false;
+ msc->pcc_cl.tx_tout = 1000; /* 1s */
+ msc->pcc_cl.knows_txdone = false;
+
+ msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
+ msc->pcc_subspace_id);
+ if (IS_ERR(msc->pcc_chan)) {
+ pr_err("Failed to request MSC PCC channel\n");
+ err = PTR_ERR(msc->pcc_chan);
+ break;
+ }
+ }
+
+ list_add_rcu(&msc->glbl_list, &mpam_all_msc);
+ platform_set_drvdata(pdev, msc);
+ } while (0);
+ mutex_unlock(&mpam_list_lock);
+
+ if (!err) {
+ /* Create RIS entries described by firmware */
+ if (!acpi_disabled)
+ err = acpi_mpam_parse_resources(msc, plat_data);
+ else
+ err = mpam_dt_parse_resources(msc, plat_data);
+ }
+
+ if (!err && fw_num_msc == mpam_num_msc)
+ mpam_discovery_complete();
+
+ if (err && msc)
+ mpam_msc_drv_remove(pdev);
+
+ return err;
+}
+
+static const struct of_device_id mpam_of_match[] = {
+ { .compatible = "arm,mpam-msc", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, mpam_of_match);
+
+static struct platform_driver mpam_msc_driver = {
+ .driver = {
+ .name = "mpam_msc",
+ .of_match_table = of_match_ptr(mpam_of_match),
+ },
+ .probe = mpam_msc_drv_probe,
+ .remove = mpam_msc_drv_remove,
+};
+
+/*
+ * MSC that are hidden under caches are not created as platform devices
+ * as there is no cache driver. Caches are also special-cased in
+ * update_msc_accessibility().
+ */
+static void mpam_dt_create_foundling_msc(void)
+{
+ int err;
+ struct device_node *cache;
+
+ for_each_compatible_node(cache, NULL, "cache") {
+ err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
+ if (err)
+ pr_err("Failed to create MSC devices under caches\n");
+ }
+}
+
+static int __init mpam_msc_driver_init(void)
+{
+ if (!system_supports_mpam())
+ return -EOPNOTSUPP;
+
+ init_srcu_struct(&mpam_srcu);
+
+ if (!acpi_disabled)
+ fw_num_msc = acpi_mpam_count_msc();
+ else
+ fw_num_msc = mpam_dt_count_msc();
+
+ if (fw_num_msc <= 0) {
+ pr_err("No MSC devices found in firmware\n");
+ return -EINVAL;
+ }
+
+ if (acpi_disabled)
+ mpam_dt_create_foundling_msc();
+
+ return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..07e0f240eaca
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2024 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/resctrl.h>
+#include <linux/sizes.h>
+
+struct mpam_msc {
+ /* member of mpam_all_msc */
+ struct list_head glbl_list;
+
+ int id;
+ struct platform_device *pdev;
+
+ /* Not modified after mpam_is_enabled() becomes true */
+ enum mpam_msc_iface iface;
+ u32 pcc_subspace_id;
+ struct mbox_client pcc_cl;
+ struct pcc_mbox_chan *pcc_chan;
+ u32 nrdy_usec;
+ cpumask_t accessibility;
+
+ /*
+ * probe_lock is only taken during discovery. After discovery these
+ * properties become read-only and the lists are protected by SRCU.
+ */
+ struct mutex probe_lock;
+ unsigned long ris_idxs[128 / BITS_PER_LONG];
+ u32 ris_max;
+
+ /* mpam_msc_ris of this component */
+ struct list_head ris;
+
+ /*
+ * part_sel_lock protects access to the MSC hardware registers that are
+ * affected by MPAMCFG_PART_SEL (including the ID registers that vary
+ * by RIS).
+ * If needed, take msc->lock first.
+ */
+ struct mutex part_sel_lock;
+
+ /*
+ * mon_sel_lock protects access to the MSC hardware registers that are
+ * affected by MPAMCFG_MON_SEL.
+ * If needed, take msc->lock first.
+ */
+ struct mutex outer_mon_sel_lock;
+ raw_spinlock_t inner_mon_sel_lock;
+ unsigned long inner_mon_sel_flags;
+
+ void __iomem *mapped_hwpage;
+ size_t mapped_hwpage_sz;
+};
+
+#endif /* MPAM_INTERNAL_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (43 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
` (22 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Shanker Donthineni <sdonthineni@nvidia.com>
The device-tree binding has two examples for MSC associated with
memory controllers. Add the support to discover the component_id
from the device-tree and create 'memory' RIS.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: split out of a bigger patch, added affinity piece ]
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 67 ++++++++++++++++++++++++----------
1 file changed, 47 insertions(+), 20 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a0d9a699a6e7..71a1fb1a9c75 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
u32 ris_idx)
{
int err = 0;
- u32 level = 0;
- unsigned long cache_id;
- struct device_node *cache;
+ u32 class_id = 0, component_id = 0;
+ struct device_node *cache = NULL, *memory = NULL;
+ enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
do {
+ /* What kind of MSC is this? */
if (of_device_is_compatible(np, "arm,mpam-cache")) {
cache = of_parse_phandle(np, "arm,mpam-device", 0);
if (!cache) {
pr_err("Failed to read phandle\n");
break;
}
+ type = MPAM_CLASS_CACHE;
} else if (of_device_is_compatible(np->parent, "cache")) {
cache = of_node_get(np->parent);
+ type = MPAM_CLASS_CACHE;
+ } else if (of_device_is_compatible(np, "arm,mpam-memory")) {
+ memory = of_parse_phandle(np, "arm,mpam-device", 0);
+ if (!memory) {
+ pr_err("Failed to read phandle\n");
+ break;
+ }
+ type = MPAM_CLASS_MEMORY;
+ } else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
+ memory = of_node_get(np->parent);
+ type = MPAM_CLASS_MEMORY;
} else {
- /* For now, only caches are supported */
- cache = NULL;
+ /*
+ * For now, only caches and memory controllers are
+ * supported.
+ */
break;
}
- err = of_property_read_u32(cache, "cache-level", &level);
- if (err) {
- pr_err("Failed to read cache-level\n");
- break;
- }
-
- cache_id = cache_of_calculate_id(cache);
- if (cache_id == ~0UL) {
- err = -ENOENT;
- break;
+ /* Determine the class and component ids, based on type. */
+ if (type == MPAM_CLASS_CACHE) {
+ err = of_property_read_u32(cache, "cache-level", &class_id);
+ if (err) {
+ pr_err("Failed to read cache-level\n");
+ break;
+ }
+ component_id = cache_of_calculate_id(cache);
+ if (component_id == ~0UL) {
+ err = -ENOENT;
+ break;
+ }
+ } else if (type == MPAM_CLASS_MEMORY) {
+ err = of_node_to_nid(np);
+ component_id = (err == NUMA_NO_NODE) ? 0 : err;
+ class_id = 255;
}
- err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
- cache_id);
+ err = mpam_ris_create(msc, ris_idx, type, class_id,
+ component_id);
} while (0);
of_node_put(cache);
+ of_node_put(memory);
return err;
}
@@ -157,9 +179,14 @@ static int update_msc_accessibility(struct mpam_msc *msc)
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
- err = -EINVAL;
- pr_err("Cannot determine accessibility of MSC: %s\n",
- dev_name(&msc->pdev->dev));
+ if (of_device_is_compatible(parent, "memory")) {
+ cpumask_copy(&msc->accessibility, cpu_possible_mask);
+ err = 0;
+ } else {
+ err = -EINVAL;
+ pr_err("Cannot determine accessibility of MSC: %s\n",
+ dev_name(&msc->pdev->dev));
+ }
}
of_node_put(parent);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (44 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-29 12:41 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
` (21 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping. A class is a set of components, which
are visible to user-space, as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).
A struct mpam_component is a set of struct mpam_vmsc. A vMSC groups the
RIS in an MSC that control the same logical piece of hardware (e.g. L2).
This is to allow hardware implementations where two controls are presented
as different RIS. Re-combining these RIS allows their feature bits to
be or-ed. This structure is not visible outside mpam_devices.c
A struct mpam_vmsc is in turn a set of struct mpam_msc_ris, which are not
visible to user-space, as each L2 cache may be composed of individual slices which need
to be configured the same as the hardware is not able to distribute the
configuration.
Add support for creating and destroying these structures.
A gfp is passed as the structures may need creating when a new RIS entry
is discovered while probing the MSC.
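The class/component/vmsc structures are all built with the same find-or-create pattern. A user-space sketch of that pattern, simplified to one level and with no locking or RCU (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Sketch of the find-or-create pattern used when building the
 * class -> component -> vmsc hierarchy: walk a list looking for a
 * match, otherwise allocate and link a new node. One level only;
 * the real code holds mpam_list_lock and uses RCU-safe lists.
 */
struct model_class {
	int type;
	int level;
	struct model_class *next;
};

static struct model_class *model_class_get(struct model_class **head,
					   int type, int level, bool alloc)
{
	struct model_class *c;

	/* Return an existing class if one matches. */
	for (c = *head; c; c = c->next)
		if (c->type == type && c->level == level)
			return c;

	if (!alloc)
		return NULL;

	/* Otherwise create and link a new one. */
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->type = type;
	c->level = level;
	c->next = *head;
	*head = c;
	return c;
}
```

Repeated lookups for the same (type, level) return the same node, so callers discovering RIS entries in any order converge on one shared hierarchy.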
CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* removed a pr_err() debug message that crept in.
---
drivers/resctrl/mpam_devices.c | 488 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 91 ++++++
include/linux/arm_mpam.h | 8 +-
3 files changed, 574 insertions(+), 13 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 71a1fb1a9c75..5baf2a8786fb 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -20,7 +20,6 @@
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
-#include <linux/srcu.h>
#include <linux/types.h>
#include <acpi/pcc.h>
@@ -35,11 +34,483 @@
static DEFINE_MUTEX(mpam_list_lock);
static LIST_HEAD(mpam_all_msc);
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
/* MPAM isn't available until all the MSC have been probed. */
static u32 mpam_num_msc;
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronize_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
+{
+ struct mpam_vmsc *vmsc;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ vmsc = kzalloc(sizeof(*vmsc), gfp);
+ if (!vmsc)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(vmsc);
+
+ INIT_LIST_HEAD_RCU(&vmsc->ris);
+ INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+ vmsc->comp = comp;
+ vmsc->msc = msc;
+
+ list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+ return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
+ struct mpam_msc *msc, bool alloc,
+ gfp_t gfp)
+{
+ struct mpam_vmsc *vmsc;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ if (vmsc->msc->id == msc->id)
+ return vmsc;
+ }
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_vmsc_alloc(comp, msc, gfp);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
+{
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ comp = kzalloc(sizeof(*comp), gfp);
+ if (!comp)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(comp);
+
+ comp->comp_id = id;
+ INIT_LIST_HEAD_RCU(&comp->vmsc);
+ /* affinity is updated when ris are added */
+ INIT_LIST_HEAD_RCU(&comp->class_list);
+ comp->class = class;
+
+ list_add_rcu(&comp->class_list, &class->components);
+
+ return comp;
+}
+
+static struct mpam_component *
+mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
+{
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(comp, &class->components, class_list) {
+ if (comp->comp_id == id)
+ return comp;
+ }
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_component_alloc(class, id, gfp);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
+{
+ struct mpam_class *class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ class = kzalloc(sizeof(*class), gfp);
+ if (!class)
+ return ERR_PTR(-ENOMEM);
+ init_garbage(class);
+
+ INIT_LIST_HEAD_RCU(&class->components);
+ /* affinity is updated when ris are added */
+ class->level = level_idx;
+ class->type = type;
+ INIT_LIST_HEAD_RCU(&class->classes_list);
+
+ list_add_rcu(&class->classes_list, &mpam_classes);
+
+ return class;
+}
+
+static struct mpam_class *
+mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
+{
+ bool found = false;
+ struct mpam_class *class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, &mpam_classes, classes_list) {
+ if (class->type == type && class->level == level_idx) {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ return class;
+
+ if (!alloc)
+ return ERR_PTR(-ENOENT);
+
+ return mpam_class_alloc(level_idx, type, gfp);
+}
+
+#define add_to_garbage(x) \
+do { \
+ __typeof__(x) _x = x; \
+ (_x)->garbage.to_free = (_x); \
+ llist_add(&(_x)->garbage.llist, &mpam_garbage); \
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&class->classes_list);
+ add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+ struct mpam_class *class = comp->class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&comp->class_list);
+ add_to_garbage(comp);
+
+ if (list_empty(&class->components))
+ mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+ struct mpam_component *comp = vmsc->comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&vmsc->comp_list);
+ add_to_garbage(vmsc);
+
+ if (list_empty(&comp->vmsc))
+ mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+ struct mpam_vmsc *vmsc = ris->vmsc;
+ struct mpam_msc *msc = vmsc->msc;
+ struct platform_device *pdev = msc->pdev;
+ struct mpam_component *comp = vmsc->comp;
+ struct mpam_class *class = comp->class;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+ cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+ clear_bit(ris->ris_idx, msc->ris_idxs);
+ list_del_rcu(&ris->vmsc_list);
+ list_del_rcu(&ris->msc_list);
+ add_to_garbage(ris);
+ ris->garbage.pdev = pdev;
+
+ if (list_empty(&vmsc->ris))
+ mpam_vmsc_destroy(vmsc);
+}
+
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+ struct platform_device *pdev = msc->pdev;
+ struct mpam_msc_ris *ris, *tmp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_del_rcu(&msc->glbl_list);
+ platform_set_drvdata(pdev, NULL);
+
+ list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+ mpam_ris_destroy(ris);
+
+ add_to_garbage(msc);
+ msc->garbage.pdev = pdev;
+}
+
+static void mpam_free_garbage(void)
+{
+ struct mpam_garbage *iter, *tmp;
+ struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+ if (!to_free)
+ return;
+
+ synchronize_srcu(&mpam_srcu);
+
+ llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+ if (iter->pdev)
+ devm_kfree(&iter->pdev->dev, iter->to_free);
+ else
+ kfree(iter->to_free);
+ }
+}
+
+/* Called recursively to walk the list of caches from a particular CPU */
+static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
+ unsigned long cache_id,
+ u32 cache_level,
+ cpumask_t *affinity)
+{
+ int err;
+ u32 iter_level;
+ unsigned long iter_cache_id;
+ struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
+
+ if (!iter_node)
+ return;
+
+ err = of_property_read_u32(iter_node, "cache-level", &iter_level);
+ if (err)
+ return;
+
+ /*
+ * get_cpu_cacheinfo_id() isn't ready until sometime
+ * during device_initcall(). Use cache_of_calculate_id().
+ */
+ iter_cache_id = cache_of_calculate_id(iter_node);
+ if (iter_cache_id == ~0UL)
+ return;
+
+ if (iter_level == cache_level && iter_cache_id == cache_id)
+ cpumask_set_cpu(cpu, affinity);
+
+ __mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
+ affinity);
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper walks the device tree to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+ cpumask_t *affinity)
+{
+ int cpu;
+
+ if (!acpi_disabled)
+ return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+
+ for_each_possible_cpu(cpu) {
+ struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
+ if (!cpu_node) {
+ pr_err("Failed to find cpu%d device node\n", cpu);
+ return -ENOENT;
+ }
+
+ __mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
+ cache_level, affinity);
+ }
+
+ return 0;
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (node_id == cpu_to_node(cpu))
+ cpumask_set_cpu(cpu, affinity);
+ }
+}
+
+static int get_cpumask_from_cache(struct device_node *cache,
+ cpumask_t *affinity)
+{
+ int err;
+ u32 cache_level;
+ unsigned long cache_id;
+
+ err = of_property_read_u32(cache, "cache-level", &cache_level);
+ if (err) {
+ pr_err("Failed to read cache-level from cache node\n");
+ return -ENOENT;
+ }
+
+ cache_id = cache_of_calculate_id(cache);
+ if (cache_id == ~0UL) {
+ pr_err("Failed to calculate cache-id from cache node\n");
+ return -ENOENT;
+ }
+
+ return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+ enum mpam_class_types type,
+ struct mpam_class *class,
+ struct mpam_component *comp)
+{
+ int err;
+
+ switch (type) {
+ case MPAM_CLASS_CACHE:
+ err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+ affinity);
+ if (err)
+ return err;
+
+ if (cpumask_empty(affinity))
+ pr_warn_once("%s no CPUs associated with cache node\n",
+ dev_name(&msc->pdev->dev));
+
+ break;
+ case MPAM_CLASS_MEMORY:
+ get_cpumask_from_node_id(comp->comp_id, affinity);
+ /* affinity may be empty for CPU-less memory nodes */
+ break;
+ case MPAM_CLASS_UNKNOWN:
+ return 0;
+ }
+
+ cpumask_and(affinity, affinity, &msc->accessibility);
+
+ return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id,
+ int component_id, gfp_t gfp)
+{
+ int err;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ if (test_and_set_bit(ris_idx, msc->ris_idxs))
+ return -EBUSY;
+
+ ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
+ if (!ris)
+ return -ENOMEM;
+ init_garbage(ris);
+
+ class = mpam_class_get(class_id, type, true, gfp);
+ if (IS_ERR(class))
+ return PTR_ERR(class);
+
+ comp = mpam_component_get(class, component_id, true, gfp);
+ if (IS_ERR(comp)) {
+ if (list_empty(&class->components))
+ mpam_class_destroy(class);
+ return PTR_ERR(comp);
+ }
+
+ vmsc = mpam_vmsc_get(comp, msc, true, gfp);
+ if (IS_ERR(vmsc)) {
+ if (list_empty(&comp->vmsc))
+ mpam_comp_destroy(comp);
+ return PTR_ERR(vmsc);
+ }
+
+ err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+ if (err) {
+ if (list_empty(&vmsc->ris))
+ mpam_vmsc_destroy(vmsc);
+ return err;
+ }
+
+ ris->ris_idx = ris_idx;
+ INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+ ris->vmsc = vmsc;
+
+ cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+ cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+ list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+ return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id, int component_id)
+{
+ int err;
+
+ mutex_lock(&mpam_list_lock);
+ err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+ component_id, GFP_KERNEL);
+ mutex_unlock(&mpam_list_lock);
+ if (err)
+ mpam_free_garbage();
+
+ return err;
+}
+
static void mpam_discovery_complete(void)
{
pr_err("Discovered all MSC\n");
@@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
- if (of_device_is_compatible(parent, "memory")) {
+ if (of_device_is_compatible(parent, "cache")) {
+ err = get_cpumask_from_cache(parent,
+ &msc->accessibility);
+ } else if (of_device_is_compatible(parent, "memory")) {
cpumask_copy(&msc->accessibility, cpu_possible_mask);
err = 0;
} else {
@@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
mutex_lock(&mpam_list_lock);
mpam_num_msc--;
- platform_set_drvdata(pdev, NULL);
- list_del_rcu(&msc->glbl_list);
- synchronize_srcu(&mpam_srcu);
- devm_kfree(&pdev->dev, msc);
+ mpam_msc_destroy(msc);
mutex_unlock(&mpam_list_lock);
+
+ mpam_free_garbage();
}
static int mpam_msc_drv_probe(struct platform_device *pdev)
@@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
err = -ENOMEM;
break;
}
+ init_garbage(msc);
mutex_init(&msc->probe_lock);
mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 07e0f240eaca..d49bb884b433 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,10 +7,27 @@
#include <linux/arm_mpam.h>
#include <linux/cpumask.h>
#include <linux/io.h>
+#include <linux/llist.h>
#include <linux/mailbox_client.h>
#include <linux/mutex.h>
#include <linux/resctrl.h>
#include <linux/sizes.h>
+#include <linux/srcu.h>
+
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+ /* member of mpam_garbage */
+ struct llist_node llist;
+
+ void *to_free;
+ struct platform_device *pdev;
+};
struct mpam_msc {
/* member of mpam_all_msc */
@@ -57,6 +74,80 @@ struct mpam_msc {
void __iomem *mapped_hwpage;
size_t mapped_hwpage_sz;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_class {
+ /* mpam_components in this class */
+ struct list_head components;
+
+ cpumask_t affinity;
+
+ u8 level;
+ enum mpam_class_types type;
+
+ /* member of mpam_classes */
+ struct list_head classes_list;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_component {
+ u32 comp_id;
+
+ /* mpam_vmsc in this component */
+ struct list_head vmsc;
+
+ cpumask_t affinity;
+
+ /* member of mpam_class:components */
+ struct list_head class_list;
+
+ /* parent: */
+ struct mpam_class *class;
+
+ struct mpam_garbage garbage;
};
+struct mpam_vmsc {
+ /* member of mpam_component:vmsc_list */
+ struct list_head comp_list;
+
+ /* mpam_msc_ris in this vmsc */
+ struct list_head ris;
+
+ /* All RIS in this vMSC are members of this MSC */
+ struct mpam_msc *msc;
+
+ /* parent: */
+ struct mpam_component *comp;
+
+ struct mpam_garbage garbage;
+};
+
+struct mpam_msc_ris {
+ u8 ris_idx;
+
+ cpumask_t affinity;
+
+ /* member of mpam_vmsc:ris */
+ struct list_head vmsc_list;
+
+ /* member of mpam_msc:ris */
+ struct list_head msc_list;
+
+ /* parent: */
+ struct mpam_vmsc *vmsc;
+
+ struct mpam_garbage garbage;
+};
+
+/* List of all classes - protected by SRCU */
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+ cpumask_t *affinity);
+
#endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 0edefa6ba019..406a77be68cb 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
#endif
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
- enum mpam_class_types type, u8 class_id,
- int component_id)
-{
- return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+ enum mpam_class_types type, u8 class_id, int component_id);
#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (45 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
` (20 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offset within the page(s).
Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
* Whitespace churn.
* Cite a more recent document.
* Removed some stale features, fixed some names, etc.
---
drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
1 file changed, 266 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d49bb884b433..6e0982a1a9ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1 0x10
+
+/* Memory mapped control pages: */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR 0x0000 /* features id register */
+#define MPAMF_MSMON_IDR 0x0080 /* performance monitoring features */
+#define MPAMF_IMPL_IDR 0x0028 /* imp-def partitioning */
+#define MPAMF_CPOR_IDR 0x0030 /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR 0x0038 /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR 0x0040 /* mem-bw partitioning */
+#define MPAMF_PRI_IDR 0x0048 /* priority partitioning */
+#define MPAMF_CSUMON_IDR 0x0088 /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR 0x0090 /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR 0x0050 /* partid-narrowing */
+#define MPAMF_IIDR 0x0018 /* implementer id register */
+#define MPAMF_AIDR 0x0020 /* architectural id register */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL 0x0100 /* partid to configure: */
+#define MPAMCFG_CPBM 0x1000 /* cache-portion config */
+#define MPAMCFG_CMAX 0x0108 /* cache-capacity config */
+#define MPAMCFG_CMIN 0x0110 /* cache-capacity config */
+#define MPAMCFG_MBW_MIN 0x0200 /* min mem-bw config */
+#define MPAMCFG_MBW_MAX 0x0208 /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD 0x0220 /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM 0x2000 /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI 0x0400 /* priority partitioning config */
+#define MPAMCFG_MBW_PROP 0x0500 /* mem-bw stride config */
+#define MPAMCFG_INTPARTID 0x0600 /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL 0x0800 /* monitor selector */
+#define MSMON_CFG_CSU_FLT 0x0810 /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL 0x0818 /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT 0x0820 /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL 0x0828 /* mem-bw monitor config */
+#define MSMON_CSU 0x0840 /* current cache-usage */
+#define MSMON_CSU_CAPTURE 0x0848 /* last cache-usage value captured */
+#define MSMON_MBWU 0x0860 /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE 0x0868 /* last mem-bw value captured */
+#define MSMON_MBWU_L 0x0880 /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L 0x0890 /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT 0x0808 /* signal a capture event */
+#define MPAMF_ESR 0x00F8 /* error status register */
+#define MPAMF_ECR 0x00F0 /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART BIT(27)
+#define MPAMF_IDR_EXT BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR BIT(29)
+#define MPAMF_IDR_HAS_MSMON BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW BIT(31)
+#define MPAMF_IDR_HAS_RIS BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR BIT(38)
+#define MPAMF_IDR_HAS_ESR BIT(39)
+#define MPAMF_IDR_RIS_MAX GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP BIT(13)
+#define MPAMF_MBW_IDR_WINDWR BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_PRODUCTID GENMASK(31, 20)
+#define MPAMF_IIDR_PRODUCTID_SHIFT 20
+#define MPAMF_IIDR_VARIANT GENMASK(19, 16)
+#define MPAMF_IIDR_VARIANT_SHIFT 16
+#define MPAMF_IIDR_REVISON GENMASK(15, 12)
+#define MPAMF_IIDR_REVISON_SHIFT 12
+#define MPAMF_IIDR_IMPLEMENTER GENMASK(11, 0)
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT 0
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MAJOR_REV GENMASK(7, 4)
+#define MPAMF_AIDR_ARCH_MINOR_REV GENMASK(3, 0)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL BIT(16)
+#define MPAMCFG_PART_SEL_RIS GENMASK(27, 24)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM BIT(31)
+#define MPAMCFG_CMAX_CMAX GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ * register
+ */
+#define MPAMCFG_MBW_MIN_MIN GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ * register
+ */
+#define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ * register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ * configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1 GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON GENMASK(15, 0)
+#define MPAMF_ESR_PMG GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR BIT(31)
+#define MPAMF_ESR_RIS GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE 0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE 1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE 2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE 3
+#define MPAM_ERRCODE_REQ_PMG_RANGE 4
+#define MPAM_ERRCODE_MONITOR_RANGE 5
+#define MPAM_ERRCODE_INTPARTID_RANGE 6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL 7
+
+/*
+ * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
+ * usage monitor filter register
+ */
+#define MSMON_CFG_CSU_FLT_PARTID GENMASK(15, 0)
+#define MSMON_CFG_CSU_FLT_PMG GENMASK(23, 16)
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ * usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ * bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG BIT(17)
+#define MSMON_CFG_x_CTL_SCLEN BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU 0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU 0
+
+/*
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ * bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_MBWU_FLT_PARTID GENMASK(15, 0)
+#define MSMON_CFG_MBWU_FLT_PMG GENMASK(23, 16)
+#define MSMON_CFG_MBWU_FLT_RWBW GENMASK(31, 30)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ * register
+ * MSMON_CSU_CAPTURE - Memory system performance monitor cache storage usage
+ * capture register
+ * MSMON_MBWU - Memory system performance monitor memory bandwidth usage
+ * monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ * capture register
+ */
+#define MSMON___VALUE GENMASK(30, 0)
+#define MSMON___NRDY BIT(31)
+#define MSMON___NRDY_L BIT(63)
+#define MSMON___L_VALUE GENMASK(43, 0)
+#define MSMON___LWD_VALUE GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ * generation register
+ */
+#define MSMON_CAPT_EVNT_NOW BIT(0)
+
#endif /* MPAM_INTERNAL_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (46 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
` (19 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled; walk all the MSCs and probe those we can
reach that haven't already been probed.
Later once MPAM is enabled, this cpuhp callback will be replaced by
one that avoids the global list.
Enabling a static key will also take the cpuhp lock, so this can't be
done from the cpuhp callback. Whenever a new MSC has been probed,
schedule work to test if all the MSCs have now been probed.
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 144 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 8 +-
2 files changed, 147 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5baf2a8786fb..9d6516f98acf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
#include <linux/acpi.h>
+#include <linux/atomic.h>
#include <linux/arm_mpam.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
@@ -21,6 +22,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
+#include <linux/workqueue.h>
#include <acpi/pcc.h>
@@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
/* MPAM isn't available until all the MSC have been probed. */
static u32 mpam_num_msc;
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
/* List of all objects that can be free()d after synchronize_srcu() */
static LLIST_HEAD(mpam_garbage);
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+ WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+ lockdep_assert_held_once(&msc->part_sel_lock);
+ return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
static struct mpam_vmsc *
@@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
return err;
}
-static void mpam_discovery_complete(void)
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+ u64 idr;
+ int err;
+
+ lockdep_assert_held(&msc->probe_lock);
+
+ mutex_lock(&msc->part_sel_lock);
+ idr = mpam_read_partsel_reg(msc, AIDR);
+ if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+ pr_err_once("%s does not match MPAM architecture v1.x\n",
+ dev_name(&msc->pdev->dev));
+ err = -EIO;
+ } else {
+ msc->probed = true;
+ err = 0;
+ }
+ mutex_unlock(&msc->part_sel_lock);
+
+ return err;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
{
- pr_err("Discovered all MSC\n");
+ return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+ int err = 0;
+ struct mpam_msc *msc;
+ bool new_device_probed = false;
+
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ mutex_lock(&msc->probe_lock);
+ if (!msc->probed)
+ err = mpam_msc_hw_probe(msc);
+ mutex_unlock(&msc->probe_lock);
+
+ if (!err)
+ new_device_probed = true;
+ else
+ break; // mpam_broken
+ }
+ mutex_unlock(&mpam_list_lock);
+
+ if (new_device_probed && !err)
+ schedule_work(&mpam_enable_work);
+
+ return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+ return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+ int (*offline)(unsigned int offline))
+{
+ mutex_lock(&mpam_cpuhp_state_lock);
+ if (mpam_cpuhp_state) {
+ cpuhp_remove_state(mpam_cpuhp_state);
+ mpam_cpuhp_state = 0;
+ }
+
+ mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
+ online, offline);
+ if (mpam_cpuhp_state <= 0) {
+ pr_err("Failed to register cpuhp callbacks\n");
+ mpam_cpuhp_state = 0;
+ }
+ mutex_unlock(&mpam_cpuhp_state_lock);
}
static int mpam_dt_count_msc(void)
@@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
}
if (!err && fw_num_msc == mpam_num_msc)
- mpam_discovery_complete();
+ mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
if (err && msc)
mpam_msc_drv_remove(pdev);
@@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
.remove = mpam_msc_drv_remove,
};
+static void mpam_enable_once(void)
+{
+ mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
+
+ pr_info("MPAM enabled\n");
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+ static atomic_t once;
+ struct mpam_msc *msc;
+ bool all_devices_probed = true;
+
+ /* Have we probed all the hw devices? */
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+ mutex_lock(&msc->probe_lock);
+ if (!msc->probed)
+ all_devices_probed = false;
+ mutex_unlock(&msc->probe_lock);
+
+ if (!all_devices_probed)
+ break;
+ }
+ mutex_unlock(&mpam_list_lock);
+
+ if (all_devices_probed && !atomic_fetch_inc(&once))
+ mpam_enable_once();
+}
+
/*
* MSC that are hidden under caches are not created as platform devices
* as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6e0982a1a9ac..a98cca08a2ef 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -49,6 +49,7 @@ struct mpam_msc {
* properties become read-only and the lists are protected by SRCU.
*/
struct mutex probe_lock;
+ bool probed;
unsigned long ris_idxs[128 / BITS_PER_LONG];
u32 ris_max;
@@ -59,14 +60,14 @@ struct mpam_msc {
* part_sel_lock protects access to the MSC hardware registers that are
* affected by MPAMCFG_PART_SEL. (including the ID registers that vary
* by RIS).
- * If needed, take msc->lock first.
+ * If needed, take msc->probe_lock first.
*/
struct mutex part_sel_lock;
/*
* mon_sel_lock protects access to the MSC hardware registers that are
* affected by MSMON_CFG_MON_SEL.
- * If needed, take msc->lock first.
+ * If needed, take msc->probe_lock first.
*/
struct mutex outer_mon_sel_lock;
raw_spinlock_t inner_mon_sel_lock;
@@ -147,6 +148,9 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (47 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
` (18 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 158 ++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 6 ++
include/linux/arm_mpam.h | 14 +++
3 files changed, 171 insertions(+), 7 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9d6516f98acf..012e09e80300 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
#include <linux/acpi.h>
#include <linux/atomic.h>
#include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
@@ -44,6 +45,15 @@ static u32 mpam_num_msc;
static int mpam_cpuhp_state;
static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
/*
* mpam is enabled once all devices have been probed from CPU online callbacks,
* scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ lockdep_assert_held_once(&msc->part_sel_lock);
+ __mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+ u64 idr_high = 0, idr_low;
+
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ idr_low = mpam_read_partsel_reg(msc, IDR);
+ if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+ idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+ return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+ u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+ FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+ __mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+ int err = 0;
+
+ lockdep_assert_irqs_enabled();
+
+ spin_lock(&partid_max_lock);
+ if (!partid_max_init) {
+ mpam_partid_max = partid_max;
+ mpam_pmg_max = pmg_max;
+ partid_max_init = true;
+ } else if (!partid_max_published) {
+ mpam_partid_max = min(mpam_partid_max, partid_max);
+ mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+ } else {
+ /* New requestors can't lower the values */
+ if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+ err = -EBUSY;
+ }
+ spin_unlock(&partid_max_lock);
+
+ return err;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
static struct mpam_vmsc *
@@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+ list_add_rcu(&ris->msc_list, &msc->ris);
return 0;
}
@@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
return err;
}
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+ u8 ris_idx)
+{
+ int err;
+ struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ if (!test_bit(ris_idx, msc->ris_idxs)) {
+ err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+ 0, 0, GFP_ATOMIC);
+ if (err)
+ return ERR_PTR(err);
+ }
+
+ list_for_each_entry(ris, &msc->ris, msc_list) {
+ if (ris->ris_idx == ris_idx) {
+ found = ris;
+ break;
+ }
+ }
+
+ return found;
+}
+
static int mpam_msc_hw_probe(struct mpam_msc *msc)
{
u64 idr;
- int err;
+ u16 partid_max;
+ u8 ris_idx, pmg_max;
+ struct mpam_msc_ris *ris;
lockdep_assert_held(&msc->probe_lock);
@@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
pr_err_once("%s does not match MPAM architecture v1.x\n",
dev_name(&msc->pdev->dev));
- err = -EIO;
- } else {
- msc->probed = true;
- err = 0;
+ mutex_unlock(&msc->part_sel_lock);
+ return -EIO;
}
+
+ idr = mpam_msc_read_idr(msc);
mutex_unlock(&msc->part_sel_lock);
+ msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+ /* Use these values so partid/pmg always starts with a valid value */
+ msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+ msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+ for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris_idx, 0, msc);
+ idr = mpam_msc_read_idr(msc);
+ mutex_unlock(&msc->part_sel_lock);
+
+ partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+ pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+ msc->partid_max = min(msc->partid_max, partid_max);
+ msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+ ris = mpam_get_or_create_ris(msc, ris_idx);
+ if (IS_ERR(ris))
+ return PTR_ERR(ris);
+ }
- return err;
+ spin_lock(&partid_max_lock);
+ mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+ mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+ spin_unlock(&partid_max_lock);
+
+ msc->probed = true;
+
+ return 0;
}
static int mpam_cpu_online(unsigned int cpu)
@@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
static void mpam_enable_once(void)
{
+ /*
+ * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+ * longer change.
+ */
+ spin_lock(&partid_max_lock);
+ partid_max_published = true;
+ spin_unlock(&partid_max_lock);
+
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
- pr_info("MPAM enabled\n");
+ printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
+ mpam_partid_max + 1, mpam_pmg_max + 1);
}
/*
@@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
return platform_driver_register(&mpam_msc_driver);
}
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a98cca08a2ef..a623f405ddd8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -50,6 +50,8 @@ struct mpam_msc {
*/
struct mutex probe_lock;
bool probed;
+ u16 partid_max;
+ u8 pmg_max;
unsigned long ris_idxs[128 / BITS_PER_LONG];
u32 ris_max;
@@ -148,6 +150,10 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 406a77be68cb..8af93794c7a2 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
enum mpam_class_types type, u8 class_id, int component_id);
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max: The maximum PARTID value the requestor can generate.
+ * @pmg_max: The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestor's features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
#endif /* __LINUX_ARM_MPAM_H */
--
2.20.1
* [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (48 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
` (17 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MSC MON_SEL register needs to be accessed from hardirq context by the
PMU drivers, making an irqsave spinlock the obvious lock to protect these
registers. On systems with SCMI mailboxes it must be able to sleep, meaning
a mutex must be used.
Clearly these two can't exist at the same time.
Add helpers for the MON_SEL locking. The outer lock must be taken in a
pre-emptible context before the inner lock can be taken. On systems with
SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
will fail to be 'taken' if the caller is unable to sleep. This will allow
the PMU driver to fail without having to check the interface type of
each MSC.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a623f405ddd8..c6f087f9fa7d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -68,10 +68,19 @@ struct mpam_msc {
/*
* mon_sel_lock protects access to the MSC hardware registers that are
- * affeted by MPAMCFG_MON_SEL.
+ * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+ * Both the 'inner' and 'outer' must be taken.
+ * For real MMIO MSC, the outer lock is unnecessary - but keeps the
+ * code common with:
+ * Firmware backed MSC need to sleep when accessing the MSC, which
+ * means some code-paths will always fail. For these MSC the outer
+ * lock is providing the protection, and the inner lock fails to
+ * be taken if the task is unable to sleep.
+ *
* If needed, take msc->probe_lock first.
*/
struct mutex outer_mon_sel_lock;
+ bool outer_lock_held;
raw_spinlock_t inner_mon_sel_lock;
unsigned long inner_mon_sel_flags;
@@ -81,6 +90,52 @@ struct mpam_msc {
struct mpam_garbage garbage;
};
+static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
+{
+ /*
+ * The outer lock may be taken by a CPU that then issues an IPI to run
+ * a helper that takes the inner lock. lockdep can't help us here.
+ */
+ WARN_ON_ONCE(!msc->outer_lock_held);
+
+ if (msc->iface == MPAM_IFACE_MMIO) {
+ raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+ return true;
+ }
+
+ /* Accesses must fail if we are not pre-emptible */
+ return !!preemptible();
+}
+
+static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
+{
+ WARN_ON_ONCE(!msc->outer_lock_held);
+
+ if (msc->iface == MPAM_IFACE_MMIO)
+ raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
+{
+ mutex_lock(&msc->outer_mon_sel_lock);
+ msc->outer_lock_held = true;
+}
+
+static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
+{
+ msc->outer_lock_held = false;
+ mutex_unlock(&msc->outer_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+ WARN_ON_ONCE(!msc->outer_lock_held);
+ if (msc->iface == MPAM_IFACE_MMIO)
+ lockdep_assert_held_once(&msc->inner_mon_sel_lock);
+ else
+ lockdep_assert_preemption_enabled();
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
--
2.20.1
* [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (49 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
` (16 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Made mpam_ris_hw_probe_hw_nrdy() more idiomatic C.
* Added static assert on features bitmap size.
---
drivers/resctrl/mpam_devices.c | 156 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 54 +++++++++++
2 files changed, 209 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 012e09e80300..290a04f8654f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
{
- WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
return readl_relaxed(msc->mapped_hwpage + reg);
@@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
}
#define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+ mpam_mon_sel_lock_held(msc);
+ return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+ mpam_mon_sel_lock_held(msc);
+ __mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val) _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
static u64 mpam_msc_read_idr(struct mpam_msc *msc)
{
u64 idr_high = 0, idr_low;
@@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return found;
}
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris * ris, u32 mon_reg)
+{
+ u32 now;
+ u64 mon_sel;
+ bool can_set, can_clear;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ return false;
+
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ _mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+ _mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+ now = _mpam_read_monsel_reg(msc, mon_reg);
+ can_set = now & MSMON___NRDY;
+
+ _mpam_write_monsel_reg(msc, mon_reg, 0);
+ now = _mpam_read_monsel_reg(msc, mon_reg);
+ can_clear = !(now & MSMON___NRDY);
+ mpam_mon_sel_inner_unlock(msc);
+
+ return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg) \
+ _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+ int err;
+ struct mpam_msc *msc = ris->vmsc->msc;
+ struct mpam_props *props = &ris->props;
+
+ lockdep_assert_held(&msc->probe_lock);
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ /* Cache Portion partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+ u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+ props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+ if (props->cpbm_wd)
+ mpam_set_feature(mpam_feat_cpor_part, props);
+ }
+
+ /* Memory bandwidth partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+ u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+ /* portion bitmap resolution */
+ props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+ if (props->mbw_pbm_bits &&
+ FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_part, props);
+
+ props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_max, props);
+ }
+
+ /* Performance Monitoring */
+ if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+ u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+ /*
+ * If the firmware max-nrdy-us property is missing, the
+ * CSU counters can't be used. Should we wait forever?
+ */
+ err = device_property_read_u32(&msc->pdev->dev,
+ "arm,not-ready-us",
+ &msc->nrdy_usec);
+
+ if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+ u32 csumonidr;
+
+ csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+ props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+ if (props->num_csu_mon) {
+ bool hw_managed;
+
+ mpam_set_feature(mpam_feat_msmon_csu, props);
+
+ /* Is NRDY hardware managed? */
+ mpam_mon_sel_outer_lock(msc);
+ hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+ mpam_mon_sel_outer_unlock(msc);
+ if (hw_managed)
+ mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+ }
+
+ /*
+ * Accept the missing firmware property if NRDY appears
+ * un-implemented.
+ */
+ if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+ pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
+ }
+ if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+ bool hw_managed;
+ u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+ props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
+ if (props->num_mbwu_mon)
+ mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+ if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
+ mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
+ /* Is NRDY hardware managed? */
+ mpam_mon_sel_outer_lock(msc);
+ hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+ mpam_mon_sel_outer_unlock(msc);
+ if (hw_managed)
+ mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+ /*
+ * Don't warn about any missing firmware property for
+ * MBWU NRDY - it doesn't make any sense!
+ */
+ }
+ }
+}
+
static int mpam_msc_hw_probe(struct mpam_msc *msc)
{
u64 idr;
@@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
idr = mpam_msc_read_idr(msc);
mutex_unlock(&msc->part_sel_lock);
+
msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
/* Use these values so partid/pmg always starts with a valid value */
@@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
ris = mpam_get_or_create_ris(msc, ris_idx);
if (IS_ERR(ris))
return PTR_ERR(ris);
+ ris->idr = idr;
+
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris_idx, 0, msc);
+ mpam_ris_hw_probe(ris);
+ mutex_unlock(&msc->part_sel_lock);
}
spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c6f087f9fa7d..9f6cd4a68cce 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
lockdep_assert_preemption_enabled();
}
+/*
+ * When we compact the supported features, we don't care what they are.
+ * Storing them as a bitmap makes life easy.
+ */
+typedef u16 mpam_features_t;
+
+/* Bits for mpam_features_t */
+enum mpam_device_features {
+ mpam_feat_ccap_part = 0,
+ mpam_feat_cpor_part,
+ mpam_feat_mbw_part,
+ mpam_feat_mbw_min,
+ mpam_feat_mbw_max,
+ mpam_feat_mbw_prop,
+ mpam_feat_msmon,
+ mpam_feat_msmon_csu,
+ mpam_feat_msmon_csu_capture,
+ mpam_feat_msmon_csu_hw_nrdy,
+ mpam_feat_msmon_mbwu,
+ mpam_feat_msmon_mbwu_capture,
+ mpam_feat_msmon_mbwu_rwbw,
+ mpam_feat_msmon_mbwu_hw_nrdy,
+ mpam_feat_msmon_capt,
+ MPAM_FEATURE_LAST,
+};
+static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
+#define MPAM_ALL_FEATURES ((1 << MPAM_FEATURE_LAST) - 1)
+
+struct mpam_props {
+ mpam_features_t features;
+
+ u16 cpbm_wd;
+ u16 mbw_pbm_bits;
+ u16 bwa_wd;
+ u16 num_csu_mon;
+ u16 num_mbwu_mon;
+};
+
+static inline bool mpam_has_feature(enum mpam_device_features feat,
+ struct mpam_props *props)
+{
+ return (1 << feat) & props->features;
+}
+
+static inline void mpam_set_feature(enum mpam_device_features feat,
+ struct mpam_props *props)
+{
+ props->features |= (1 << feat);
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
@@ -175,6 +225,8 @@ struct mpam_vmsc {
/* mpam_msc_ris in this vmsc */
struct list_head ris;
+ struct mpam_props props;
+
/* All RIS in this vMSC are members of this MSC */
struct mpam_msc *msc;
@@ -186,6 +238,8 @@ struct mpam_vmsc {
struct mpam_msc_ris {
u8 ris_idx;
+ u64 idr;
+ struct mpam_props props;
cpumask_t affinity;
--
2.20.1
* [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (50 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
` (15 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some of the
MSCs that make up a component or class.
If bitmap properties are mismatched within a component we
cannot support the mismatched feature.
Care has to be taken as vMSC may hold mismatched RIS.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 215 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 8 ++
2 files changed, 223 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 290a04f8654f..bb62de6d3847 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1186,8 +1186,223 @@ static struct platform_driver mpam_msc_driver = {
.remove = mpam_msc_drv_remove,
};
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+ if (mpam_has_feature(mpam_feat_mbw_min, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_mbw_max, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_mbw_prop, props))
+ return true;
+ return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias) \
+ helper(parent) && \
+ ((helper(child) && (parent)->field != (child)->field) || \
+ (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias) \
+ mpam_has_feature((feat), (parent)) && \
+ ((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+ (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias) \
+ (alias) && !mpam_has_feature((feat), (parent)) && \
+ mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+ struct mpam_props *child, bool alias)
+{
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+ parent->cpbm_wd = child->cpbm_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+ cpbm_wd, alias)) {
+ pr_debug("%s cleared cpor_part\n", __func__);
+ mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
+ parent->cpbm_wd = 0;
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+ parent->mbw_pbm_bits = child->mbw_pbm_bits;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+ mbw_pbm_bits, alias)) {
+ pr_debug("%s cleared mbw_part\n", __func__);
+ mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
+ parent->mbw_pbm_bits = 0;
+ }
+
+ /* bwa_wd is a count of bits, fewer bits means less precision */
+ if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
+ parent->bwa_wd = child->bwa_wd;
+ } else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+ bwa_wd, alias)) {
+ pr_debug("%s took the min bwa_wd\n", __func__);
+ parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+ }
+
+ /* For num properties, take the minimum */
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+ parent->num_csu_mon = child->num_csu_mon;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+ num_csu_mon, alias)) {
+ pr_debug("%s took the min num_csu_mon\n", __func__);
+ parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+ parent->num_mbwu_mon = child->num_mbwu_mon;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+ num_mbwu_mon, alias)) {
+ pr_debug("%s took the min num_mbwu_mon\n", __func__);
+ parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
+ }
+
+ if (alias) {
+ /* Merge features for aliased resources */
+ parent->features |= child->features;
+ } else {
+ /* Clear missing features for non aliasing */
+ parent->features &= child->features;
+ }
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+ struct mpam_props *cprops = &class->props;
+ struct mpam_props *vprops = &vmsc->props;
+
+ lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+ pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
+ dev_name(&vmsc->msc->pdev->dev),
+ (long)cprops->features, (long)vprops->features);
+
+ /* Take the safe value for any common features */
+ __props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+ struct mpam_props *rprops = &ris->props;
+ struct mpam_props *vprops = &vmsc->props;
+
+ lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+ pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+ dev_name(&vmsc->msc->pdev->dev),
+ (long)vprops->features, (long)rprops->features);
+
+ /*
+ * Merge mismatched features - Copy any features that aren't common,
+ * but take the safe value for any common features.
+ */
+ __props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_component *comp;
+
+ comp = list_first_entry_or_null(&class->components,
+ struct mpam_component, class_list);
+ if (WARN_ON(!comp))
+ return;
+
+ vmsc = list_first_entry_or_null(&comp->vmsc,
+ struct mpam_vmsc, comp_list);
+ if (WARN_ON(!vmsc))
+ return;
+
+ class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct mpam_class *class = comp->class;
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+ __vmsc_props_mismatch(vmsc, ris);
+ class->nrdy_usec = max(class->nrdy_usec,
+ vmsc->msc->nrdy_usec);
+ }
+ }
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+ struct mpam_vmsc *vmsc;
+ struct mpam_class *class = comp->class;
+
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+ __class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together, this must be done first.
+ * Next the class features are the bitwise-and of all the vmsc features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, all_classes_list, classes_list) {
+ list_for_each_entry(comp, &class->components, class_list)
+ mpam_enable_merge_vmsc_features(comp);
+
+ mpam_enable_init_class_features(class);
+
+ list_for_each_entry(comp, &class->components, class_list)
+ mpam_enable_merge_class_features(comp);
+ }
+}
+
static void mpam_enable_once(void)
{
+ mutex_lock(&mpam_list_lock);
+ mpam_enable_merge_features(&mpam_classes);
+ mutex_unlock(&mpam_list_lock);
+
/*
* Once the cpuhp callbacks have been changed, mpam_partid_max can no
* longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f6cd4a68cce..a2b0ff411138 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -186,12 +186,20 @@ static inline void mpam_set_feature(enum mpam_device_features feat,
props->features |= (1 << feat);
}
+static inline void mpam_clear_feature(enum mpam_device_features feat,
+ mpam_features_t *supported)
+{
+ *supported &= ~(1 << feat);
+}
+
struct mpam_class {
/* mpam_components in this class */
struct list_head components;
cpumask_t affinity;
+ struct mpam_props props;
+ u32 nrdy_usec;
u8 level;
enum mpam_class_types type;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (51 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
` (14 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. over kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSCs are also reset when CPUs are taken offline, to cover cases where
firmware doesn't reset the MSC over a UEFI reboot, or over kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Last bitmap write will always be non-zero.
* Dropped READ_ONCE() - the value can no longer change.
---
drivers/resctrl/mpam_devices.c | 121 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 8 +++
2 files changed, 129 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bb62de6d3847..c1f01dd748ad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
#include <linux/atomic.h>
#include <linux/arm_mpam.h>
#include <linux/bitfield.h>
+#include <linux/bitmap.h>
#include <linux/cacheinfo.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
@@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
return 0;
}
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+ u32 num_words, msb;
+ u32 bm = ~0;
+ int i;
+
+ lockdep_assert_held(&msc->part_sel_lock);
+
+ if (wd == 0)
+ return;
+
+ /*
+ * Write all ~0 to all but the last 32bit-word, which may
+ * have fewer bits...
+ */
+ num_words = DIV_ROUND_UP(wd, 32);
+ for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+ __mpam_write_reg(msc, reg, bm);
+
+ /*
+ * ....and then the last (maybe) partial 32bit word. When wd is a
+ * multiple of 32, msb should be 31 to write a full 32bit word.
+ */
+ msb = (wd - 1) % 32;
+ bm = GENMASK(msb, 0);
+ __mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+ u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
+ struct mpam_msc *msc = ris->vmsc->msc;
+ struct mpam_props *rprops = &ris->props;
+
+ mpam_assert_srcu_read_lock_held();
+
+ mutex_lock(&msc->part_sel_lock);
+ __mpam_part_sel(ris->ris_idx, partid, msc);
+
+ if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+ mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+ if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+ mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+ if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+ mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+ if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+ mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+
+ if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+ mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+ mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+ u16 partid, partid_max;
+
+ mpam_assert_srcu_read_lock_held();
+
+ if (ris->in_reset_state)
+ return;
+
+ spin_lock(&partid_max_lock);
+ partid_max = mpam_partid_max;
+ spin_unlock(&partid_max_lock);
+ for (partid = 0; partid < partid_max; partid++)
+ mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+ int idx;
+ struct mpam_msc_ris *ris;
+
+ mpam_assert_srcu_read_lock_held();
+
+ mpam_mon_sel_outer_lock(msc);
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+ mpam_reset_ris(ris);
+
+ /*
+ * Set in_reset_state when coming online. The reset state
+ * for non-zero partid may be lost while the CPUs are offline.
+ */
+ ris->in_reset_state = online;
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+ mpam_mon_sel_outer_unlock(msc);
+}
+
static int mpam_cpu_online(unsigned int cpu)
{
+ int idx;
+ struct mpam_msc *msc;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ if (atomic_fetch_inc(&msc->online_refs) == 0)
+ mpam_reset_msc(msc, true);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
return 0;
}
@@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
static int mpam_cpu_offline(unsigned int cpu)
{
+ int idx;
+ struct mpam_msc *msc;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ if (!cpumask_test_cpu(cpu, &msc->accessibility))
+ continue;
+
+ if (atomic_dec_and_test(&msc->online_refs))
+ mpam_reset_msc(msc, false);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
return 0;
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a2b0ff411138..466d670a01eb 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
#define MPAM_INTERNAL_H
#include <linux/arm_mpam.h>
+#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/io.h>
#include <linux/llist.h>
@@ -43,6 +44,7 @@ struct mpam_msc {
struct pcc_mbox_chan *pcc_chan;
u32 nrdy_usec;
cpumask_t accessibility;
+ atomic_t online_refs;
/*
* probe_lock is only take during discovery. After discovery these
@@ -248,6 +250,7 @@ struct mpam_msc_ris {
u8 ris_idx;
u64 idr;
struct mpam_props props;
+ bool in_reset_state;
cpumask_t affinity;
@@ -267,6 +270,11 @@ struct mpam_msc_ris {
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
+static inline void mpam_assert_srcu_read_lock_held(void)
+{
+ WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+}
+
/* System wide partid/pmg values */
extern u16 mpam_partid_max;
extern u8 mpam_pmg_max;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (52 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
` (13 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Prevent the cpuhp callbacks from changing the MSC state by taking the
cpuhp lock.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c1f01dd748ad..759244966736 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
mutex_unlock(&msc->part_sel_lock);
}
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
{
u16 partid, partid_max;
+ struct mpam_msc_ris *ris = arg;
mpam_assert_srcu_read_lock_held();
if (ris->in_reset_state)
- return;
+ return 0;
spin_lock(&partid_max_lock);
partid_max = mpam_partid_max;
spin_unlock(&partid_max_lock);
for (partid = 0; partid < partid_max; partid++)
mpam_reset_ris_partid(ris, partid);
+
+ return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated, it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+ int cpu = raw_smp_processor_id();
+
+ if (cpumask_test_cpu(cpu, &msc->accessibility))
+ return cpu;
+
+ return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+ lockdep_assert_irqs_enabled();
+ lockdep_assert_cpus_held();
+ mpam_assert_srcu_read_lock_held();
+
+ return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
}
static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
mpam_mon_sel_outer_lock(msc);
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
- mpam_reset_ris(ris);
+ mpam_touch_msc(msc, &mpam_reset_ris, ris);
/*
* Set in_reset_state when coming online. The reset state
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (53 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
` (12 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 62 +++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 1 +
2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 759244966736..3516cbe8623e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
u16 partid, partid_max;
struct mpam_msc_ris *ris = arg;
- mpam_assert_srcu_read_lock_held();
-
if (ris->in_reset_state)
return 0;
@@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
mpam_partid_max + 1, mpam_pmg_max + 1);
}
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+ int idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ might_sleep();
+ lockdep_assert_cpus_held();
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ if (!ris->in_reset_state)
+ mpam_touch_msc(msc, mpam_reset_ris, ris);
+ ris->in_reset_state = true;
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+ int idx;
+ struct mpam_component *comp;
+
+ lockdep_assert_cpus_held();
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(comp, &class->components, class_list)
+ mpam_reset_component_locked(comp);
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+ cpus_read_lock();
+ mpam_reset_class_locked(class);
+ cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAMs errors indicate a software bug, restore any modified
+ * controls to their reset values.
+ */
+void mpam_disable(void)
+{
+ int idx;
+ struct mpam_class *class;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu))
+ mpam_reset_class(class);
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
/*
* Enable mpam once all devices have been probed.
* Scheduled by mpam_discovery_cpu_online() once all devices have been created.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 466d670a01eb..b30fee2b7674 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
+void mpam_disable(void);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 22/33] arm_mpam: Register and enable IRQs
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (54 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-22 15:30 ` James Morse
2025-09-01 10:05 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
` (11 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the ESR register, so no locking is needed.
The work to disable MPAM after an error needs to happen at process
context, use a threaded interrupt.
There is no support for percpu threaded interrupts, for now schedule
the work to be done from the irq handler.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Use guard macro when walking srcu list.
* Use INTEN macro for enabling interrupts.
* Move partid_max_published up earlier in mpam_enable_once().
---
drivers/resctrl/mpam_devices.c | 311 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 9 +-
2 files changed, 312 insertions(+), 8 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3516cbe8623e..210d64fad0b1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
#include <linux/list.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
@@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
*/
static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
return (idr_high << 32) | idr_low;
}
+static void mpam_msc_zero_esr(struct mpam_msc *msc)
+{
+ __mpam_write_reg(msc, MPAMF_ESR, 0);
+ if (msc->has_extd_esr)
+ __mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+ u64 esr_high = 0, esr_low;
+
+ esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+ if (msc->has_extd_esr)
+ esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+ return (esr_high << 32) | esr_low;
+}
+
static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
{
lockdep_assert_held(&msc->part_sel_lock);
@@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
lockdep_assert_held(&mpam_list_lock);
- list_del_rcu(&msc->glbl_list);
- platform_set_drvdata(pdev, NULL);
-
list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
mpam_ris_destroy(ris);
+ list_del_rcu(&msc->glbl_list);
+ platform_set_drvdata(pdev, NULL);
+
add_to_garbage(msc);
msc->garbage.pdev = pdev;
}
@@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
msc->partid_max = min(msc->partid_max, partid_max);
msc->pmg_max = min(msc->pmg_max, pmg_max);
+ msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
ris = mpam_get_or_create_ris(msc, ris_idx);
if (IS_ERR(ris))
@@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
mutex_unlock(&msc->part_sel_lock);
}
+ /* Clear any stale errors */
+ mpam_msc_zero_esr(msc);
+
spin_lock(&partid_max_lock);
mpam_partid_max = min(mpam_partid_max, msc->partid_max);
mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
mpam_mon_sel_outer_unlock(msc);
}
+static void _enable_percpu_irq(void *_irq)
+{
+ int *irq = _irq;
+
+ enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
static int mpam_cpu_online(unsigned int cpu)
{
int idx;
@@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
if (!cpumask_test_cpu(cpu, &msc->accessibility))
continue;
+ if (msc->reenable_error_ppi)
+ _enable_percpu_irq(&msc->reenable_error_ppi);
+
if (atomic_fetch_inc(&msc->online_refs) == 0)
mpam_reset_msc(msc, true);
}
@@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
if (!cpumask_test_cpu(cpu, &msc->accessibility))
continue;
+ if (msc->reenable_error_ppi)
+ disable_percpu_irq(msc->reenable_error_ppi);
+
if (atomic_dec_and_test(&msc->online_refs))
mpam_reset_msc(msc, false);
}
@@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
mutex_unlock(&mpam_cpuhp_state_lock);
}
+static int __setup_ppi(struct mpam_msc *msc)
+{
+ int cpu;
+
+ msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
+ if (!msc->error_dev_id)
+ return -ENOMEM;
+
+ for_each_cpu(cpu, &msc->accessibility) {
+ struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
+
+ if (empty) {
+ pr_err_once("%s shares PPI with %s!\n",
+ dev_name(&msc->pdev->dev),
+ dev_name(&empty->pdev->dev));
+ return -EBUSY;
+ }
+ *per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+ }
+
+ return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+ int irq;
+
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ return 0;
+
+ /* Allocate and initialise the percpu device pointer for PPI */
+ if (irq_is_percpu(irq))
+ return __setup_ppi(msc);
+
+ /* sanity check: shared interrupts can be routed anywhere? */
+ if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+ pr_err_once("msc:%u is a private resource with a shared error interrupt",
+ msc->id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int mpam_dt_count_msc(void)
{
int count = 0;
@@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
break;
}
+ err = mpam_msc_setup_error_irq(msc);
+ if (err)
+ break;
+
if (device_property_read_u32(&pdev->dev, "pcc-channel",
&msc->pcc_subspace_id))
msc->iface = MPAM_IFACE_MMIO;
@@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
}
}
+static char *mpam_errcode_names[16] = {
+ [0] = "No error",
+ [1] = "PARTID_SEL_Range",
+ [2] = "Req_PARTID_Range",
+ [3] = "MSMONCFG_ID_RANGE",
+ [4] = "Req_PMG_Range",
+ [5] = "Monitor_Range",
+ [6] = "intPARTID_Range",
+ [7] = "Unexpected_INTERNAL",
+ [8] = "Undefined_RIS_PART_SEL",
+ [9] = "RIS_No_Control",
+ [10] = "Undefined_RIS_MON_SEL",
+ [11] = "RIS_No_Monitor",
+ [12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+ struct mpam_msc *msc = _msc;
+
+ __mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+ return 0;
+}
+
+static int mpam_disable_msc_ecr(void *_msc)
+{
+ struct mpam_msc *msc = _msc;
+
+ __mpam_write_reg(msc, MPAMF_ECR, 0);
+
+ return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+ u64 reg;
+ u16 partid;
+ u8 errcode, pmg, ris;
+
+ if (WARN_ON_ONCE(!msc) ||
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+ &msc->accessibility)))
+ return IRQ_NONE;
+
+ reg = mpam_msc_read_esr(msc);
+
+ errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+ if (!errcode)
+ return IRQ_NONE;
+
+ /* Clear level triggered irq */
+ mpam_msc_zero_esr(msc);
+
+ partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+ pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+ ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+ pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+ msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
+
+ if (irq_is_percpu(irq)) {
+ mpam_disable_msc_ecr(msc);
+ schedule_work(&mpam_broken_work);
+ return IRQ_HANDLED;
+ }
+
+ return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+ struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+ return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+ struct mpam_msc *msc = dev_id;
+
+ return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
+
+static int mpam_register_irqs(void)
+{
+ int err, irq;
+ struct mpam_msc *msc;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ continue;
+
+ /* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+ /* We anticipate sharing the interrupt with other MSCs */
+ if (irq_is_percpu(irq)) {
+ err = request_percpu_irq(irq, &mpam_ppi_handler,
+ "mpam:msc:error",
+ msc->error_dev_id);
+ if (err)
+ return err;
+
+ msc->reenable_error_ppi = irq;
+ smp_call_function_many(&msc->accessibility,
+ &_enable_percpu_irq, &irq,
+ true);
+ } else {
+ err = devm_request_threaded_irq(&msc->pdev->dev, irq,
+ &mpam_spi_handler,
+ &mpam_disable_thread,
+ IRQF_SHARED,
+ "mpam:msc:error", msc);
+ if (err)
+ return err;
+ }
+
+ msc->error_irq_requested = true;
+ mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+ msc->error_irq_hw_enabled = true;
+ }
+
+ return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+ int irq, idx;
+ struct mpam_msc *msc;
+
+ cpus_read_lock();
+ /* take the lock as free_irq() can sleep */
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+ irq = platform_get_irq_byname_optional(msc->pdev, "error");
+ if (irq <= 0)
+ continue;
+
+ if (msc->error_irq_hw_enabled) {
+ mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+ msc->error_irq_hw_enabled = false;
+ }
+
+ if (msc->error_irq_requested) {
+ if (irq_is_percpu(irq)) {
+ msc->reenable_error_ppi = 0;
+ free_percpu_irq(irq, msc->error_dev_id);
+ } else {
+ devm_free_irq(&msc->pdev->dev, irq, msc);
+ }
+ msc->error_irq_requested = false;
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+ cpus_read_unlock();
+}
+
static void mpam_enable_once(void)
{
- mutex_lock(&mpam_list_lock);
- mpam_enable_merge_features(&mpam_classes);
- mutex_unlock(&mpam_list_lock);
+ int err;
/*
* Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
partid_max_published = true;
spin_unlock(&partid_max_lock);
+ /*
+ * If all the MSC have been probed, enabling the IRQs happens next.
+ * That involves cross-calling to a CPU that can reach the MSC, and
+ * the locks must be taken in this order:
+ */
+ cpus_read_lock();
+ mutex_lock(&mpam_list_lock);
+ mpam_enable_merge_features(&mpam_classes);
+
+ err = mpam_register_irqs();
+ if (err)
+ pr_warn("Failed to register irqs: %d\n", err);
+
+ mutex_unlock(&mpam_list_lock);
+ cpus_read_unlock();
+
+ if (err) {
+ schedule_work(&mpam_broken_work);
+ return;
+ }
+
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
* All of MPAMs errors indicate a software bug, restore any modified
* controls to their reset values.
*/
-void mpam_disable(void)
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
{
int idx;
struct mpam_class *class;
+ struct mpam_msc *msc, *tmp;
+
+ mutex_lock(&mpam_cpuhp_state_lock);
+ if (mpam_cpuhp_state) {
+ cpuhp_remove_state(mpam_cpuhp_state);
+ mpam_cpuhp_state = 0;
+ }
+ mutex_unlock(&mpam_cpuhp_state_lock);
+
+ mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(class, &mpam_classes, classes_list,
srcu_read_lock_held(&mpam_srcu))
mpam_reset_class(class);
srcu_read_unlock(&mpam_srcu, idx);
+
+ mutex_lock(&mpam_list_lock);
+ list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
+ mpam_msc_destroy(msc);
+ mutex_unlock(&mpam_list_lock);
+ mpam_free_garbage();
+
+ return IRQ_HANDLED;
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+ mpam_disable_thread(0, NULL);
}
/*
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b30fee2b7674..c9418c9cf9f2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -44,6 +44,11 @@ struct mpam_msc {
struct pcc_mbox_chan *pcc_chan;
u32 nrdy_usec;
cpumask_t accessibility;
+ bool has_extd_esr;
+
+ int reenable_error_ppi;
+ struct mpam_msc * __percpu *error_dev_id;
+
atomic_t online_refs;
/*
@@ -52,6 +57,8 @@ struct mpam_msc {
*/
struct mutex probe_lock;
bool probed;
+ bool error_irq_requested;
+ bool error_irq_hw_enabled;
u16 partid_max;
u8 pmg_max;
unsigned long ris_idxs[128 / BITS_PER_LONG];
@@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
/* Scheduled work callback to enable mpam once all MSC have been probed */
void mpam_enable(struct work_struct *work);
-void mpam_disable(void);
+void mpam_disable(struct work_struct *work);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (55 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
` (10 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Once all the MSC have been probed, the system-wide usable number of
PARTIDs is known and the configuration arrays can be allocated.
After this point, checking whether all the MSC have been probed is
pointless, and the cpuhp callbacks should restore the configuration
instead of just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 8 ++++++++
drivers/resctrl/mpam_internal.h | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 210d64fad0b1..b424af666b1e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -33,6 +33,8 @@
#include "mpam_internal.h"
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* TODO: move to arch code */
+
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
@@ -1039,6 +1041,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
struct mpam_msc *msc;
bool new_device_probed = false;
+ if (mpam_is_enabled())
+ return 0;
+
mutex_lock(&mpam_list_lock);
list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
if (!cpumask_test_cpu(cpu, &msc->accessibility))
@@ -1835,6 +1840,7 @@ static void mpam_enable_once(void)
return;
}
+ static_branch_enable(&mpam_enabled);
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1902,6 +1908,8 @@ static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
}
mutex_unlock(&mpam_cpuhp_state_lock);
+ static_branch_disable(&mpam_enabled);
+
mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c9418c9cf9f2..3476ee97f8ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -8,6 +8,7 @@
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/io.h>
+#include <linux/jump_label.h>
#include <linux/llist.h>
#include <linux/mailbox_client.h>
#include <linux/mutex.h>
@@ -15,6 +16,13 @@
#include <linux/sizes.h>
#include <linux/srcu.h>
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+ return static_branch_likely(&mpam_enabled);
+}
+
/*
* Structures protected by SRCU may not be freed for a surprising amount of
* time (especially if perf is running). To ensure the MPAM error interrupt can
--
2.20.1
* [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (56 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
` (9 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
When CPUs come online, the original configuration should be restored.
Once the maximum partid is known, allocate a configuration array for
each component, and reprogram each RIS configuration from it.
The MPAM spec describes how multiple controls can interact. To prevent
this happening by accident, always reset controls that don't have a
valid configuration. This allows the same helper to be used for
configuration and reset.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Added a comment about the ordering around max_partid.
* Allocate configurations after interrupts are registered to reduce churn.
* Added mpam_assert_partid_sizes_fixed();
---
drivers/resctrl/mpam_devices.c | 253 +++++++++++++++++++++++++++++---
drivers/resctrl/mpam_internal.h | 26 +++-
2 files changed, 251 insertions(+), 28 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b424af666b1e..8f6df2406c22 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
/* List of all objects that can be free()d after synchronise_srcu() */
static LLIST_HEAD(mpam_garbage);
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+ WARN_ON_ONCE(!partid_max_published);
+}
+
static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
{
WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
@@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
add_to_garbage(class);
}
+static void __destroy_component_cfg(struct mpam_component *comp);
+
static void mpam_comp_destroy(struct mpam_component *comp)
{
struct mpam_class *class = comp->class;
lockdep_assert_held(&mpam_list_lock);
+ __destroy_component_cfg(comp);
+
list_del_rcu(&comp->class_list);
add_to_garbage(comp);
@@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
__mpam_write_reg(msc, reg, bm);
}
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+ struct mpam_config *cfg)
{
u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *rprops = &ris->props;
- mpam_assert_srcu_read_lock_held();
-
mutex_lock(&msc->part_sel_lock);
__mpam_part_sel(ris->ris_idx, partid, msc);
- if (mpam_has_feature(mpam_feat_cpor_part, rprops))
- mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+ if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
+ if (mpam_has_feature(mpam_feat_cpor_part, cfg))
+ mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+ else
+ mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+ rprops->cpbm_wd);
+ }
- if (mpam_has_feature(mpam_feat_mbw_part, rprops))
- mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+ if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
+ if (mpam_has_feature(mpam_feat_mbw_part, cfg))
+ mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+ else
+ mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+ rprops->mbw_pbm_bits);
+ }
if (mpam_has_feature(mpam_feat_mbw_min, rprops))
mpam_write_partsel_reg(msc, MBW_MIN, 0);
- if (mpam_has_feature(mpam_feat_mbw_max, rprops))
- mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+ if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
+ if (mpam_has_feature(mpam_feat_mbw_max, cfg))
+ mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
+ else
+ mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+ }
if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
mutex_unlock(&msc->part_sel_lock);
}
+struct reprogram_ris {
+ struct mpam_msc_ris *ris;
+ struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+ u16 partid, partid_max;
+ struct reprogram_ris *arg = _arg;
+ struct mpam_msc_ris *ris = arg->ris;
+ struct mpam_config *cfg = arg->cfg;
+
+ if (ris->in_reset_state)
+ return 0;
+
+ spin_lock(&partid_max_lock);
+ partid_max = mpam_partid_max;
+ spin_unlock(&partid_max_lock);
+ for (partid = 0; partid <= partid_max; partid++)
+ mpam_reprogram_ris_partid(ris, partid, cfg);
+
+ return 0;
+}
+
/*
* Called via smp_call_on_cpu() to prevent migration, while still being
* pre-emptible.
*/
static int mpam_reset_ris(void *arg)
{
- u16 partid, partid_max;
struct mpam_msc_ris *ris = arg;
+ struct reprogram_ris reprogram_arg;
+ struct mpam_config empty_cfg = { 0 };
if (ris->in_reset_state)
return 0;
- spin_lock(&partid_max_lock);
- partid_max = mpam_partid_max;
- spin_unlock(&partid_max_lock);
- for (partid = 0; partid < partid_max; partid++)
- mpam_reset_ris_partid(ris, partid);
+ reprogram_arg.ris = ris;
+ reprogram_arg.cfg = &empty_cfg;
+
+ mpam_reprogram_ris(&reprogram_arg);
return 0;
}
@@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
static void mpam_reset_msc(struct mpam_msc *msc, bool online)
{
- int idx;
struct mpam_msc_ris *ris;
mpam_assert_srcu_read_lock_held();
mpam_mon_sel_outer_lock(msc);
- idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
mpam_touch_msc(msc, &mpam_reset_ris, ris);
@@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
*/
ris->in_reset_state = online;
}
- srcu_read_unlock(&mpam_srcu, idx);
mpam_mon_sel_outer_unlock(msc);
}
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+ u16 partid;
+ bool reset;
+ struct mpam_config *cfg;
+ struct mpam_msc_ris *ris;
+
+ /*
+ * No lock for mpam_partid_max as partid_max_published has been
+ * set by mpam_enable(), so the values can no longer change.
+ */
+ mpam_assert_partid_sizes_fixed();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
+ if (!mpam_is_enabled() && !ris->in_reset_state) {
+ mpam_touch_msc(msc, &mpam_reset_ris, ris);
+ ris->in_reset_state = true;
+ continue;
+ }
+
+ reset = true;
+ for (partid = 0; partid <= mpam_partid_max; partid++) {
+ cfg = &ris->vmsc->comp->cfg[partid];
+ if (cfg->features)
+ reset = false;
+
+ mpam_reprogram_ris_partid(ris, partid, cfg);
+ }
+ ris->in_reset_state = reset;
+ }
+}
+
static void _enable_percpu_irq(void *_irq)
{
int *irq = _irq;
@@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
_enable_percpu_irq(&msc->reenable_error_ppi);
if (atomic_fetch_inc(&msc->online_refs) == 0)
- mpam_reset_msc(msc, true);
+ mpam_reprogram_msc(msc);
}
srcu_read_unlock(&mpam_srcu, idx);
@@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
cpus_read_unlock();
}
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+ add_to_garbage(comp->cfg);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+ mpam_assert_partid_sizes_fixed();
+
+ if (comp->cfg)
+ return 0;
+
+ comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+ if (!comp->cfg)
+ return -ENOMEM;
+ init_garbage(comp->cfg);
+
+ return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+ int err = 0;
+ struct mpam_class *class;
+ struct mpam_component *comp;
+
+ lockdep_assert_held(&mpam_list_lock);
+
+ list_for_each_entry(class, &mpam_classes, classes_list) {
+ list_for_each_entry(comp, &class->components, class_list) {
+ err = __allocate_component_cfg(comp);
+ if (err)
+ return err;
+ }
+ }
+
+ return 0;
+}
+
static void mpam_enable_once(void)
{
int err;
@@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
*/
cpus_read_lock();
mutex_lock(&mpam_list_lock);
- mpam_enable_merge_features(&mpam_classes);
+ do {
+ mpam_enable_merge_features(&mpam_classes);
- err = mpam_register_irqs();
- if (err)
- pr_warn("Failed to register irqs: %d\n", err);
+ err = mpam_register_irqs();
+ if (err) {
+ pr_warn("Failed to register irqs: %d\n", err);
+ break;
+ }
+ err = mpam_allocate_config();
+ if (err) {
+ pr_err("Failed to allocate configuration arrays.\n");
+ break;
+ }
+ } while (0);
mutex_unlock(&mpam_list_lock);
cpus_read_unlock();
@@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
might_sleep();
lockdep_assert_cpus_held();
+ mpam_assert_partid_sizes_fixed();
+
+ memset(comp->cfg, 0, (mpam_partid_max + 1) * sizeof(*comp->cfg));
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
@@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
mpam_enable_once();
}
+struct mpam_write_config_arg {
+ struct mpam_msc_ris *ris;
+ struct mpam_component *comp;
+ u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+ struct mpam_write_config_arg *c = arg;
+
+ mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+ return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+ if (mpam_has_feature(feature, newcfg) && \
+ (newcfg)->member != (cfg)->member) { \
+ (cfg)->member = (newcfg)->member; \
+ cfg->features |= (1 << feature); \
+ \
+ (changes) |= (1 << feature); \
+ } \
+} while (0)
+
+static mpam_features_t mpam_update_config(struct mpam_config *cfg,
+ const struct mpam_config *newcfg)
+{
+ mpam_features_t changes = 0;
+
+ maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
+ maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
+ maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
+
+ return changes;
+}
+
+/* TODO: split into write_config/sync_config */
+/* TODO: add config_dirty bitmap to drive sync_config */
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+ struct mpam_config *cfg)
+{
+ struct mpam_write_config_arg arg;
+ struct mpam_msc_ris *ris;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc *msc;
+ int idx;
+
+ lockdep_assert_cpus_held();
+
+ /* Don't pass in the current config! */
+ WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+ if (!mpam_update_config(&comp->cfg[partid], cfg))
+ return 0;
+
+ arg.comp = comp;
+ arg.partid = partid;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ arg.ris = ris;
+ mpam_touch_msc(msc, __write_config, &arg);
+ }
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
+ return 0;
+}
+
/*
* MSC that are hidden under caches are not created as platform devices
* as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3476ee97f8ac..70cba9f22746 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -191,11 +191,7 @@ struct mpam_props {
u16 num_mbwu_mon;
};
-static inline bool mpam_has_feature(enum mpam_device_features feat,
- struct mpam_props *props)
-{
- return (1 << feat) & props->features;
-}
+#define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
static inline void mpam_set_feature(enum mpam_device_features feat,
struct mpam_props *props)
@@ -226,6 +222,17 @@ struct mpam_class {
struct mpam_garbage garbage;
};
+struct mpam_config {
+ /* Which configuration values are valid. 0 is used for reset */
+ mpam_features_t features;
+
+ u32 cpbm;
+ u32 mbw_pbm;
+ u16 mbw_max;
+
+ struct mpam_garbage garbage;
+};
+
struct mpam_component {
u32 comp_id;
@@ -234,6 +241,12 @@ struct mpam_component {
cpumask_t affinity;
+ /*
+ * Array of configuration values, indexed by partid.
+ * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+ */
+ struct mpam_config *cfg;
+
/* member of mpam_class:components */
struct list_head class_list;
@@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
void mpam_enable(struct work_struct *work);
void mpam_disable(struct work_struct *work);
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+ struct mpam_config *cfg);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
* [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (57 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
` (8 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew,
Zeng Heng, Dave Martin
MPAM supports more features than are going to be exposed to resctrl.
For partids other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 175 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 16 ++-
2 files changed, 189 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8f6df2406c22..aedd743d6827 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
__mpam_part_sel_raw(partsel, msc);
}
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+ u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+ FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+ MPAMCFG_PART_SEL_INTERNAL;
+
+ __mpam_part_sel_raw(partsel, msc);
+}
+
int mpam_register_requestor(u16 partid_max, u8 pmg_max)
{
int err = 0;
@@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
int err;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *props = &ris->props;
+ struct mpam_class *class = ris->vmsc->comp->class;
lockdep_assert_held(&msc->probe_lock);
lockdep_assert_held(&msc->part_sel_lock);
+ /* Cache Capacity Partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+ u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+ props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+ if (props->cmax_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+ if (props->cmax_wd &&
+ !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+ if (props->cmax_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+ props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+
+ if (props->cassoc_wd &&
+ FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+ mpam_set_feature(mpam_feat_cmax_cassoc, props);
+ }
+
/* Cache Portion partitioning */
if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
mpam_set_feature(mpam_feat_mbw_max, props);
+
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_min, props);
+
+ if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+ mpam_set_feature(mpam_feat_mbw_prop, props);
+ }
+
+ /* Priority partitioning */
+ if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+ u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+ props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+ if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+ mpam_set_feature(mpam_feat_intpri_part, props);
+ if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+ mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+ }
+
+ props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+ if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+ mpam_set_feature(mpam_feat_dspri_part, props);
+ if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+ mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+ }
}
/* Performance Monitoring */
@@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
*/
}
}
+
+ /*
+ * RIS with PARTID narrowing don't have enough storage for one
+ * configuration per PARTID. If these are in a class we could use,
+ * reduce the supported partid_max to match the number of intpartid.
+ * If the class is unknown, just ignore it.
+ */
+ if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+ class->type != MPAM_CLASS_UNKNOWN) {
+ u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+ u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+ mpam_set_feature(mpam_feat_partid_nrw, props);
+ msc->partid_max = min(msc->partid_max, partid_max);
+ }
}
static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
struct mpam_config *cfg)
{
+ u32 pri_val = 0;
+ u16 cmax = MPAMCFG_CMAX_CMAX;
u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
struct mpam_msc *msc = ris->vmsc->msc;
struct mpam_props *rprops = &ris->props;
+ u16 dspri = GENMASK(rprops->dspri_wd, 0);
+ u16 intpri = GENMASK(rprops->intpri_wd, 0);
mutex_lock(&msc->part_sel_lock);
__mpam_part_sel(ris->ris_idx, partid, msc);
+ if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+ /* Update the intpartid mapping */
+ mpam_write_partsel_reg(msc, INTPARTID,
+ MPAMCFG_INTPARTID_INTERNAL | partid);
+
+ /*
+ * Then switch to the 'internal' partid to update the
+ * configuration.
+ */
+ __mpam_intpart_sel(ris->ris_idx, partid, msc);
+ }
+
if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
if (mpam_has_feature(mpam_feat_cpor_part, cfg))
mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
@@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+
+ if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+ mpam_write_partsel_reg(msc, CMAX, cmax);
+
+ if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+ mpam_write_partsel_reg(msc, CMIN, 0);
+
+ if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+ mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+ /* aces high? */
+ if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+ intpri = 0;
+ if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+ dspri = 0;
+
+ if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+ pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+ if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+ pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+ mpam_write_partsel_reg(msc, PRI, pri_val);
+ }
+
mutex_unlock(&msc->part_sel_lock);
}
@@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
return false;
}
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+ if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+ return true;
+ if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+ return true;
+ return false;
+}
+
#define MISMATCHED_HELPER(parent, child, helper, field, alias) \
helper(parent) && \
((helper(child) && (parent)->field != (child)->field) || \
@@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
}
+ if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+ parent->cmax_wd = child->cmax_wd;
+ } else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+ cmax_wd, alias)) {
+ pr_debug("%s took the min cmax_wd\n", __func__);
+ parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+ parent->cassoc_wd = child->cassoc_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+ cassoc_wd, alias)) {
+ pr_debug("%s cleared cassoc_wd\n", __func__);
+ mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
+ parent->cassoc_wd = 0;
+ }
+
/* For num properties, take the minimum */
if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
parent->num_csu_mon = child->num_csu_mon;
@@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
}
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+ parent->intpri_wd = child->intpri_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+ intpri_wd, alias)) {
+ pr_debug("%s took the min intpri_wd\n", __func__);
+ parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+ }
+
+ if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+ parent->dspri_wd = child->dspri_wd;
+ } else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+ dspri_wd, alias)) {
+ pr_debug("%s took the min dspri_wd\n", __func__);
+ parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+ }
+
+ /* TODO: alias support for these two */
+ /* {int,ds}pri may not have differing 0-low behaviour */
+ if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+ (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+ mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+ mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+ pr_debug("%s cleared intpri_part\n", __func__);
+ mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
+ mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
+ }
+ if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+ (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+ mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+ mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+ pr_debug("%s cleared dspri_part\n", __func__);
+ mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
+ mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
+ }
+
if (alias) {
/* Merge features for aliased resources */
parent->features |= child->features;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 70cba9f22746..23445aedbabd 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
* When we compact the supported features, we don't care what they are.
* Storing them as a bitmap makes life easy.
*/
-typedef u16 mpam_features_t;
+typedef u32 mpam_features_t;
/* Bits for mpam_features_t */
enum mpam_device_features {
- mpam_feat_ccap_part = 0,
+ mpam_feat_cmax_softlim,
+ mpam_feat_cmax_cmax,
+ mpam_feat_cmax_cmin,
+ mpam_feat_cmax_cassoc,
mpam_feat_cpor_part,
mpam_feat_mbw_part,
mpam_feat_mbw_min,
mpam_feat_mbw_max,
mpam_feat_mbw_prop,
+ mpam_feat_intpri_part,
+ mpam_feat_intpri_part_0_low,
+ mpam_feat_dspri_part,
+ mpam_feat_dspri_part_0_low,
mpam_feat_msmon,
mpam_feat_msmon_csu,
mpam_feat_msmon_csu_capture,
@@ -176,6 +183,7 @@ enum mpam_device_features {
mpam_feat_msmon_mbwu_rwbw,
mpam_feat_msmon_mbwu_hw_nrdy,
mpam_feat_msmon_capt,
+ mpam_feat_partid_nrw,
MPAM_FEATURE_LAST,
};
static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
@@ -187,6 +195,10 @@ struct mpam_props {
u16 cpbm_wd;
u16 mbw_pbm_bits;
u16 bwa_wd;
+ u16 cmax_wd;
+ u16 cassoc_wd;
+ u16 intpri_wd;
+ u16 dspri_wd;
u16 num_csu_mon;
u16 num_mbwu_mon;
};
--
2.20.1
* [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (58 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
` (7 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM's MSC support a number of monitors, each of which supports
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 2 ++
drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index aedd743d6827..e7e00c632512 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -348,6 +348,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
class->level = level_idx;
class->type = type;
INIT_LIST_HEAD_RCU(&class->classes_list);
+ ida_init(&class->ida_csu_mon);
+ ida_init(&class->ida_mbwu_mon);
list_add_rcu(&class->classes_list, &mpam_classes);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 23445aedbabd..4981de120869 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -231,6 +231,9 @@ struct mpam_class {
/* member of mpam_classes */
struct list_head classes_list;
+ struct ida ida_csu_mon;
+ struct ida ida_mbwu_mon;
+
struct mpam_garbage garbage;
};
@@ -306,6 +309,38 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return -EOPNOTSUPP;
+
+ return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
+ GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+ ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+ return -EOPNOTSUPP;
+
+ return ida_alloc_range(&class->ida_mbwu_mon, 0,
+ cprops->num_mbwu_mon - 1, GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+ ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
/* List of all classes - protected by srcu */
extern struct srcu_struct mpam_srcu;
extern struct list_head mpam_classes;
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (59 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
` (6 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSCs may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.
Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.
CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 222 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 18 +++
2 files changed, 240 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7e00c632512..9ce771aaf671 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
return 0;
}
+struct mon_read {
+ struct mpam_msc_ris *ris;
+ struct mon_cfg *ctx;
+ enum mpam_device_features type;
+ u64 *val;
+ int err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+ u32 *flt_val)
+{
+ struct mon_cfg *ctx = m->ctx;
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ *ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
+ break;
+ case mpam_feat_msmon_mbwu:
+ *ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+ break;
+ default:
+ return;
+ }
+
+ /*
+ * For CSU counters it's implementation-defined what happens when not
+ * filtering by partid.
+ */
+ *ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
+
+ *flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+ if (m->ctx->match_pmg) {
+ *ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
+ }
+
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+ u32 *flt_val)
+{
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ *ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+ *flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+ break;
+ case mpam_feat_msmon_mbwu:
+ *ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+ *flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+ break;
+ default:
+ return;
+ }
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+ *cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+ u32 flt_val)
+{
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+
+ /*
+ * Write the ctl_val with the enable bit cleared, reset the counter,
+ * then enable counter.
+ */
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+ mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+ mpam_write_monsel_reg(msc, CSU, 0);
+ mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+ break;
+ case mpam_feat_msmon_mbwu:
+ mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+ break;
+ default:
+ return;
+ }
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+ u64 now;
+ bool nrdy = false;
+ struct mon_read *m = arg;
+ struct mon_cfg *ctx = m->ctx;
+ struct mpam_msc_ris *ris = m->ris;
+ struct mpam_props *rprops = &ris->props;
+ struct mpam_msc *msc = m->ris->vmsc->msc;
+ u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+ if (!mpam_mon_sel_inner_lock(msc)) {
+ m->err = -EIO;
+ return;
+ }
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+ /*
+ * Read the existing configuration to avoid re-writing the same values.
+ * This saves waiting for 'nrdy' on subsequent reads.
+ */
+ read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+ clean_msmon_ctl_val(&cur_ctl);
+ gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+ if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+ write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+ switch (m->type) {
+ case mpam_feat_msmon_csu:
+ now = mpam_read_monsel_reg(msc, CSU);
+ if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ break;
+ case mpam_feat_msmon_mbwu:
+ now = mpam_read_monsel_reg(msc, MBWU);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ break;
+ default:
+ m->err = -EINVAL;
+ break;
+ }
+ mpam_mon_sel_inner_unlock(msc);
+
+ if (nrdy) {
+ m->err = -EBUSY;
+ return;
+ }
+
+ now = FIELD_GET(MSMON___VALUE, now);
+ *m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+ int err = 0, idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ arg->ris = ris;
+
+ err = smp_call_function_any(&msc->accessibility,
+ __ris_msmon_read, arg,
+ true);
+ if (!err && arg->err)
+ err = arg->err;
+ if (err)
+ break;
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ if (err)
+ break;
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+
+ return err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+ enum mpam_device_features type, u64 *val)
+{
+ int err;
+ struct mon_read arg;
+ u64 wait_jiffies = 0;
+ struct mpam_props *cprops = &comp->class->props;
+
+ might_sleep();
+
+ if (!mpam_is_enabled())
+ return -EIO;
+
+ if (!mpam_has_feature(type, cprops))
+ return -EOPNOTSUPP;
+
+ memset(&arg, 0, sizeof(arg));
+ arg.ctx = ctx;
+ arg.type = type;
+ arg.val = val;
+ *val = 0;
+
+ err = _msmon_read(comp, &arg);
+ if (err == -EBUSY && comp->class->nrdy_usec)
+ wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+ while (wait_jiffies)
+ wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+ if (err == -EBUSY) {
+ memset(&arg, 0, sizeof(arg));
+ arg.ctx = ctx;
+ arg.type = type;
+ arg.val = val;
+ *val = 0;
+
+ err = _msmon_read(comp, &arg);
+ }
+
+ return err;
+}
+
static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
{
u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4981de120869..76e406a2b0d1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -309,6 +309,21 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+ COUNT_BOTH = 0,
+ COUNT_WRITE = 1,
+ COUNT_READ = 2,
+};
+
+struct mon_cfg {
+ u16 mon;
+ u8 pmg;
+ bool match_pmg;
+ u32 partid;
+ enum mon_filter_options opts;
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
int mpam_apply_config(struct mpam_component *comp, u16 partid,
struct mpam_config *cfg);
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+ enum mpam_device_features, u64 *val);
+
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (60 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-28 0:58 ` Fenghua Yu
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
` (5 subsequent siblings)
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.
The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.
Add struct mbwu_state to track the bandwidth counter to allow overflow
and power management to be handled.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 163 +++++++++++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 54 ++++++++---
2 files changed, 200 insertions(+), 17 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9ce771aaf671..11be34b54643 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+ *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
if (m->ctx->match_pmg) {
*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
@@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
u32 flt_val)
{
+ struct msmon_mbwu_state *mbwu_state;
struct mpam_msc *msc = m->ris->vmsc->msc;
/*
@@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
mpam_write_monsel_reg(msc, MBWU, 0);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+ mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+ if (mbwu_state)
+ mbwu_state->prev_val = 0;
+
break;
default:
return;
}
}
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+ /* TODO: scaling, and long counters */
+ return GENMASK_ULL(30, 0);
+}
+
/* Call with MSC lock held */
static void __ris_msmon_read(void *arg)
{
- u64 now;
bool nrdy = false;
struct mon_read *m = arg;
+ u64 now, overflow_val = 0;
struct mon_cfg *ctx = m->ctx;
struct mpam_msc_ris *ris = m->ris;
+ struct msmon_mbwu_state *mbwu_state;
struct mpam_props *rprops = &ris->props;
struct mpam_msc *msc = m->ris->vmsc->msc;
u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
now = mpam_read_monsel_reg(msc, CSU);
if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
break;
case mpam_feat_msmon_mbwu:
now = mpam_read_monsel_reg(msc, MBWU);
if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
+
+ if (nrdy)
+ break;
+
+ mbwu_state = &ris->mbwu_state[ctx->mon];
+ if (!mbwu_state)
+ break;
+
+ /* Add any pre-overflow value to the mbwu_state->val */
+ if (mbwu_state->prev_val > now)
+ overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+ mbwu_state->prev_val = now;
+ mbwu_state->correction += overflow_val;
+
+ /* Include bandwidth consumed before the last hardware reset */
+ now += mbwu_state->correction;
break;
default:
m->err = -EINVAL;
@@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
return;
}
- now = FIELD_GET(MSMON___VALUE, now);
*m->val += now;
}
@@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
return 0;
}
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+ int i;
+ struct mon_read mwbu_arg;
+ struct mpam_msc_ris *ris = _ris;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+
+ for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+ if (ris->mbwu_state[i].enabled) {
+ mwbu_arg.ris = ris;
+ mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+ mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+ __ris_msmon_read(&mwbu_arg);
+ }
+ }
+
+ mpam_mon_sel_outer_unlock(msc);
+
+ return 0;
+}
+
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+ int i;
+ u64 val;
+ struct mon_cfg *cfg;
+ u32 cur_flt, cur_ctl, mon_sel;
+ struct mpam_msc_ris *ris = arg;
+ struct msmon_mbwu_state *mbwu_state;
+ struct mpam_msc *msc = ris->vmsc->msc;
+
+ for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+ mbwu_state = &ris->mbwu_state[i];
+ cfg = &mbwu_state->cfg;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ return -EIO;
+
+ mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+ FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+ mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+ cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+ cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+ mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+ val = mpam_read_monsel_reg(msc, MBWU);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+
+ cfg->mon = i;
+ cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
+ cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+ cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
+ mbwu_state->correction += val;
+ mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+ mpam_mon_sel_inner_unlock(msc);
+ }
+
+ return 0;
+}
+
/*
* Called via smp_call_on_cpu() to prevent migration, while still being
* pre-emptible.
@@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
* for non-zero partid may be lost while the CPUs are offline.
*/
ris->in_reset_state = online;
+
+ if (mpam_is_enabled() && !online)
+ mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
}
mpam_mon_sel_outer_unlock(msc);
}
@@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
mpam_reprogram_ris_partid(ris, partid, cfg);
}
ris->in_reset_state = reset;
+
+ if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+ mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
}
}
@@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
static void __destroy_component_cfg(struct mpam_component *comp)
{
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ lockdep_assert_held(&mpam_list_lock);
+
add_to_garbage(comp->cfg);
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ msc = vmsc->msc;
+
+ mpam_mon_sel_outer_lock(msc);
+ if (mpam_mon_sel_inner_lock(msc)) {
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+ add_to_garbage(ris->mbwu_state);
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ }
}
static int __allocate_component_cfg(struct mpam_component *comp)
{
+ int err = 0;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+ struct msmon_mbwu_state *mbwu_state;
+
+ lockdep_assert_held(&mpam_list_lock);
mpam_assert_partid_sizes_fixed();
if (comp->cfg)
@@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
return -ENOMEM;
init_garbage(comp->cfg);
+ list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+ if (!vmsc->props.num_mbwu_mon)
+ continue;
+
+ msc = vmsc->msc;
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+ if (!ris->props.num_mbwu_mon)
+ continue;
+
+ mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+ sizeof(*ris->mbwu_state),
+ GFP_KERNEL);
+ if (!mbwu_state) {
+ __destroy_component_cfg(comp);
+ err = -ENOMEM;
+ break;
+ }
+
+ if (mpam_mon_sel_inner_lock(msc)) {
+ init_garbage(mbwu_state);
+ ris->mbwu_state = mbwu_state;
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ }
+ mpam_mon_sel_outer_unlock(msc);
+
+ if (err)
+ break;
+ }
+
- return 0;
+ return err;
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 76e406a2b0d1..9a50a5432f4a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -271,6 +271,42 @@ struct mpam_component {
struct mpam_garbage garbage;
};
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+ COUNT_BOTH = 0,
+ COUNT_WRITE = 1,
+ COUNT_READ = 2,
+};
+
+struct mon_cfg {
+ /* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
+ u32 mon;
+ u8 pmg;
+ bool match_pmg;
+ u32 partid;
+ enum mon_filter_options opts;
+};
+
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+ bool enabled;
+ struct mon_cfg cfg;
+
+ /* The value last read from the hardware. Used to detect overflow. */
+ u64 prev_val;
+
+ /*
+ * The value to add to the new reading to account for power management,
+ * and shifts to trigger the overflow interrupt.
+ */
+ u64 correction;
+
+ struct mpam_garbage garbage;
+};
+
struct mpam_vmsc {
/* member of mpam_component:vmsc_list */
struct list_head comp_list;
@@ -306,22 +342,10 @@ struct mpam_msc_ris {
/* parent: */
struct mpam_vmsc *vmsc;
- struct mpam_garbage garbage;
-};
+ /* msmon mbwu configuration is preserved over reset */
+ struct msmon_mbwu_state *mbwu_state;
-/* The values for MSMON_CFG_MBWU_FLT.RWBW */
-enum mon_filter_options {
- COUNT_BOTH = 0,
- COUNT_WRITE = 1,
- COUNT_READ = 2,
-};
-
-struct mon_cfg {
- u16 mon;
- u8 pmg;
- bool match_pmg;
- u32 partid;
- enum mon_filter_options opts;
+ struct mpam_garbage garbage;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (61 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
` (4 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
mpam v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters. As of now, a 44 bit counter
indicated by the HAS_LONG field (bit 30) and a 63 bit counter indicated
by LWD (bit 29) can optionally be implemented. Probe for these counters
and set the corresponding feature bits if either counter is present.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 23 ++++++++++++++++++++++-
drivers/resctrl/mpam_internal.h | 8 ++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 11be34b54643..2ab7f127baaa 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
}
if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
- bool hw_managed;
+ bool has_long, hw_managed;
u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
@@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+ /*
+ * Treat long counter and its extension, lwd as mutually
+ * exclusive feature bits. Though these are dependent
+ * fields at the implementation level, there would never
+ * be a need for mpam_feat_msmon_mbwu_44counter (long
+ * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
+ * bits to be set together.
+ *
+ * mpam_feat_msmon_mbwu isn't treated as an exclusive
+ * bit as this feature bit would be used as the "front
+ * facing feature bit" for any checks related to mbwu
+ * monitors.
+ */
+ has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
+ if (props->num_mbwu_mon && has_long) {
+ if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
+ mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+ else
+ mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+ }
+
/* Is NRDY hardware managed? */
mpam_mon_sel_outer_lock(msc);
hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9a50a5432f4a..9f627b5f72a1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -178,7 +178,15 @@ enum mpam_device_features {
mpam_feat_msmon_csu,
mpam_feat_msmon_csu_capture,
mpam_feat_msmon_csu_hw_nrdy,
+
+ /*
+ * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
+ * counter will be used. The exact counter used also depends on whether
+ * mpam_feat_msmon_mbwu_44counter or mpam_feat_msmon_mbwu_63counter is set.
+ */
mpam_feat_msmon_mbwu,
+ mpam_feat_msmon_mbwu_44counter,
+ mpam_feat_msmon_mbwu_63counter,
mpam_feat_msmon_mbwu_capture,
mpam_feat_msmon_mbwu_rwbw,
mpam_feat_msmon_mbwu_hw_nrdy,
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (62 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
` (3 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
the RIS, use long/LWD counter instead of the regular 31 bit mbwu
counter.
Only 32 bit accesses to the MSC are required to be supported by the
spec, but these registers are 64 bits. The lower half may overflow
into the upper half between two 32 bit reads. To avoid this, use
a helper that reads the top half multiple times to check that it did
not change while the lower half was read.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit]
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
* Commit message wrangling.
* Refer to 31 bit counters as opposed to 32 bit (registers).
---
drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
1 file changed, 80 insertions(+), 9 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2ab7f127baaa..8fbcf6eb946a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1002,6 +1002,48 @@ struct mon_read {
int err;
};
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+ return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+ mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+ int retry = 3;
+ u32 mbwu_l_low;
+ u64 mbwu_l_high1, mbwu_l_high2;
+
+ mpam_mon_sel_lock_held(msc);
+
+ WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+ do {
+ mbwu_l_high1 = mbwu_l_high2;
+ mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+ mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+ retry--;
+ } while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+ if (mbwu_l_high1 == mbwu_l_high2)
+ return (mbwu_l_high1 << 32) | mbwu_l_low;
+ return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+ mpam_mon_sel_lock_held(msc);
+
+ WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+ WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+ __mpam_write_reg(msc, MSMON_MBWU_L, 0);
+ __mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
u32 *flt_val)
{
@@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
static void clean_msmon_ctl_val(u32 *cur_ctl)
{
*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+ *cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
}
static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
case mpam_feat_msmon_mbwu:
mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
- mpam_write_monsel_reg(msc, MBWU, 0);
+ if (mpam_ris_has_mbwu_long_counter(m->ris))
+ mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+ else
+ mpam_write_monsel_reg(msc, MBWU, 0);
+
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
{
- /* TODO: scaling, and long counters */
- return GENMASK_ULL(30, 0);
+ /* TODO: implement scaling counters */
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
+ return GENMASK_ULL(62, 0);
+ else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
+ return GENMASK_ULL(43, 0);
+ else
+ return GENMASK_ULL(30, 0);
}
/* Call with MSC lock held */
@@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
now = FIELD_GET(MSMON___VALUE, now);
break;
case mpam_feat_msmon_mbwu:
- now = mpam_read_monsel_reg(msc, MBWU);
- if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
- nrdy = now & MSMON___NRDY;
- now = FIELD_GET(MSMON___VALUE, now);
+ /*
+ * If long or lwd counters are supported, use them, else revert
+ * to the 31 bit counter.
+ */
+ if (mpam_ris_has_mbwu_long_counter(ris)) {
+ now = mpam_msc_read_mbwu_l(msc);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY_L;
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
+ now = FIELD_GET(MSMON___LWD_VALUE, now);
+ else
+ now = FIELD_GET(MSMON___L_VALUE, now);
+ } else {
+ now = mpam_read_monsel_reg(msc, MBWU);
+ if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+ nrdy = now & MSMON___NRDY;
+ now = FIELD_GET(MSMON___VALUE, now);
+ }
if (nrdy)
break;
@@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
- val = mpam_read_monsel_reg(msc, MBWU);
- mpam_write_monsel_reg(msc, MBWU, 0);
+ if (mpam_ris_has_mbwu_long_counter(ris)) {
+ val = mpam_msc_read_mbwu_l(msc);
+ mpam_msc_zero_mbwu_l(msc);
+ } else {
+ val = mpam_read_monsel_reg(msc, MBWU);
+ mpam_write_monsel_reg(msc, MBWU, 0);
+ }
cfg->mon = i;
cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (63 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
` (2 subsequent siblings)
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 49 +++++++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 5 +++-
2 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8fbcf6eb946a..65c30ebfe001 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1155,9 +1155,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
static void __ris_msmon_read(void *arg)
{
bool nrdy = false;
+ bool config_mismatch;
struct mon_read *m = arg;
u64 now, overflow_val = 0;
struct mon_cfg *ctx = m->ctx;
+ bool reset_on_next_read = false;
struct mpam_msc_ris *ris = m->ris;
struct msmon_mbwu_state *mbwu_state;
struct mpam_props *rprops = &ris->props;
@@ -1172,6 +1174,14 @@ static void __ris_msmon_read(void *arg)
FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+ if (m->type == mpam_feat_msmon_mbwu) {
+ mbwu_state = &ris->mbwu_state[ctx->mon];
+ if (mbwu_state) {
+ reset_on_next_read = mbwu_state->reset_on_next_read;
+ mbwu_state->reset_on_next_read = false;
+ }
+ }
+
/*
* Read the existing configuration to avoid re-writing the same values.
* This saves waiting for 'nrdy' on subsequent reads.
@@ -1179,7 +1189,10 @@ static void __ris_msmon_read(void *arg)
read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
clean_msmon_ctl_val(&cur_ctl);
gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
- if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+ config_mismatch = cur_flt != flt_val ||
+ cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+ if (config_mismatch || reset_on_next_read)
write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
switch (m->type) {
@@ -1212,7 +1225,6 @@ static void __ris_msmon_read(void *arg)
if (nrdy)
break;
- mbwu_state = &ris->mbwu_state[ctx->mon];
if (!mbwu_state)
break;
@@ -1314,6 +1326,39 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
return err;
}
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+ int idx;
+ struct mpam_msc *msc;
+ struct mpam_vmsc *vmsc;
+ struct mpam_msc_ris *ris;
+
+ if (!mpam_is_enabled())
+ return;
+
+ idx = srcu_read_lock(&mpam_srcu);
+ list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+ continue;
+
+ msc = vmsc->msc;
+ mpam_mon_sel_outer_lock(msc);
+ list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+ continue;
+
+ if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+ continue;
+
+ ris->mbwu_state[ctx->mon].correction = 0;
+ ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+ mpam_mon_sel_inner_unlock(msc);
+ }
+ mpam_mon_sel_outer_unlock(msc);
+ }
+ srcu_read_unlock(&mpam_srcu, idx);
+}
+
static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
{
u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f627b5f72a1..bbf0306abc82 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -297,10 +297,12 @@ struct mon_cfg {
/*
* Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * Changes to reset_on_next_read, prev_val and correction are protected by the
+ * msc's mon_sel_lock.
*/
struct msmon_mbwu_state {
bool enabled;
+ bool reset_on_next_read;
struct mon_cfg cfg;
/* The value last read from the hardware. Used to detect overflow. */
@@ -410,6 +412,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (64 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-08-22 15:30 ` James Morse
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
2025-08-24 17:24 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Krzysztof Kozlowski
67 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich,
Jonathan Cameron
The bitmap reset code has been a source of bugs. Add a unit test.
This currently has to be built in, as the rest of the driver is
built-in.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/Kconfig | 13 ++++++
drivers/resctrl/mpam_devices.c | 4 ++
drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
3 files changed, 85 insertions(+)
create mode 100644 drivers/resctrl/test_mpam_devices.c
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index dff7b87280ab..f5e0609975e4 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
bool "MPAM driver for System IP, e,g. caches and memory controllers"
depends on ARM64_MPAM && EXPERT
+menu "ARM64 MPAM driver options"
+
config ARM64_MPAM_DRIVER_DEBUG
bool "Enable debug messages from the MPAM driver."
depends on ARM64_MPAM_DRIVER
help
Say yes here to enable debug messages from the MPAM driver.
+
+config MPAM_KUNIT_TEST
+ bool "KUnit tests for MPAM driver" if !KUNIT_ALL_TESTS
+ depends on KUNIT=y
+ default KUNIT_ALL_TESTS
+ help
+ Enable this option to run tests in the MPAM driver.
+
+ If unsure, say N.
+
+endmenu
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 65c30ebfe001..4cf5aae88c53 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
}
/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..8e9d6c88171c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2024 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+ char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+ struct mpam_msc fake_msc;
+ u32 *test_result;
+
+ if (!buf)
+ return;
+
+ fake_msc.mapped_hwpage = buf;
+ fake_msc.mapped_hwpage_sz = SZ_16K;
+ cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+ mutex_init(&fake_msc.part_sel_lock);
+ mutex_lock(&fake_msc.part_sel_lock);
+
+ test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+ KUNIT_EXPECT_EQ(test, test_result[0], 1);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 0);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+ KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+ KUNIT_EXPECT_EQ(test, test_result[1], 1);
+ test_result[0] = 0;
+ test_result[1] = 0;
+
+ mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+ KUNIT_CASE(test_mpam_reset_msc_bitmap),
+ {}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+ .name = "mpam_devices_test_suite",
+ .test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (65 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-22 15:30 ` James Morse
2025-09-02 16:59 ` Fenghua Yu
2025-08-24 17:24 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Krzysztof Kozlowski
67 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
When features are mismatched between MSCs, the way they are combined
into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of combinations that are expected
to work, and those that must be removed.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_internal.h | 8 +-
drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
2 files changed, 329 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index bbf0306abc82..6e973be095f8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -18,6 +18,12 @@
DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
static inline bool mpam_is_enabled(void)
{
return static_branch_likely(&mpam_enabled);
@@ -209,7 +215,7 @@ struct mpam_props {
u16 dspri_wd;
u16 num_csu_mon;
u16 num_mbwu_mon;
-};
+} PACKED_FOR_KUNIT;
#define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 8e9d6c88171c..ef39696e7ff8 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,326 @@
#include <kunit/test.h>
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+ struct mpam_props parent = { 0 };
+ struct mpam_props child;
+
+ memset(&child, 0xff, sizeof(child));
+ __props_mismatch(&parent, &child, false);
+
+ memset(&child, 0, sizeof(child));
+ KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+ memset(&child, 0xff, sizeof(child));
+ __props_mismatch(&parent, &child, true);
+
+ KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+ /* o/` How deep is your stack? o/` */
+ struct list_head fake_classes_list;
+ struct mpam_class fake_class = { 0 };
+ struct mpam_component fake_comp1 = { 0 };
+ struct mpam_component fake_comp2 = { 0 };
+ struct mpam_vmsc fake_vmsc1 = { 0 };
+ struct mpam_vmsc fake_vmsc2 = { 0 };
+ struct mpam_msc fake_msc1 = { 0 };
+ struct mpam_msc fake_msc2 = { 0 };
+ struct mpam_msc_ris fake_ris1 = { 0 };
+ struct mpam_msc_ris fake_ris2 = { 0 };
+ struct platform_device fake_pdev = { 0 };
+
+#define RESET_FAKE_HIEARCHY() do { \
+ INIT_LIST_HEAD(&fake_classes_list); \
+ \
+ memset(&fake_class, 0, sizeof(fake_class)); \
+ fake_class.level = 3; \
+ fake_class.type = MPAM_CLASS_CACHE; \
+ INIT_LIST_HEAD_RCU(&fake_class.components); \
+ INIT_LIST_HEAD(&fake_class.classes_list); \
+ \
+ memset(&fake_comp1, 0, sizeof(fake_comp1)); \
+ memset(&fake_comp2, 0, sizeof(fake_comp2)); \
+ fake_comp1.comp_id = 1; \
+ fake_comp2.comp_id = 2; \
+ INIT_LIST_HEAD(&fake_comp1.vmsc); \
+ INIT_LIST_HEAD(&fake_comp1.class_list); \
+ INIT_LIST_HEAD(&fake_comp2.vmsc); \
+ INIT_LIST_HEAD(&fake_comp2.class_list); \
+ \
+ memset(&fake_vmsc1, 0, sizeof(fake_vmsc1)); \
+ memset(&fake_vmsc2, 0, sizeof(fake_vmsc2)); \
+ INIT_LIST_HEAD(&fake_vmsc1.ris); \
+ INIT_LIST_HEAD(&fake_vmsc1.comp_list); \
+ fake_vmsc1.msc = &fake_msc1; \
+ INIT_LIST_HEAD(&fake_vmsc2.ris); \
+ INIT_LIST_HEAD(&fake_vmsc2.comp_list); \
+ fake_vmsc2.msc = &fake_msc2; \
+ \
+ memset(&fake_ris1, 0, sizeof(fake_ris1)); \
+ memset(&fake_ris2, 0, sizeof(fake_ris2)); \
+ fake_ris1.ris_idx = 1; \
+ INIT_LIST_HEAD(&fake_ris1.msc_list); \
+ fake_ris2.ris_idx = 2; \
+ INIT_LIST_HEAD(&fake_ris2.msc_list); \
+ \
+ fake_msc1.pdev = &fake_pdev; \
+ fake_msc2.pdev = &fake_pdev; \
+ \
+ list_add(&fake_class.classes_list, &fake_classes_list); \
+} while (0)
+
+ RESET_FAKE_HIEARCHY();
+
+ mutex_lock(&mpam_list_lock);
+
+ /* One Class+Comp, two RIS in one vMSC with common features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = NULL;
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc1;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = NULL;
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc1;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /* Multiple RIS within one MSC controlling the same resource can be mismatched */
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+ KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple RIS in different MSCs can't control the same resource;
+ * mismatched features can not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with incompatible overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 5;
+ fake_ris2.props.cpbm_wd = 3;
+ fake_ris1.props.mbw_pbm_bits = 5;
+ fake_ris2.props.mbw_pbm_bits = 3;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple RIS in different MSCs can't control the same resource;
+ * mismatched features can not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class+Comp, two MSC with overlapping features that need tweaking */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = NULL;
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp1;
+ list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+ mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+ fake_ris1.props.bwa_wd = 5;
+ fake_ris2.props.bwa_wd = 3;
+ fake_ris1.props.cmax_wd = 5;
+ fake_ris2.props.cmax_wd = 3;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple RIS in different MSCs controlling the same resource are
+ * merged down to the smallest common field width.
+ */
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class Two Comp with overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = &fake_class;
+ list_add(&fake_comp2.class_list, &fake_class.components);
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp2;
+ list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cpbm_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+ RESET_FAKE_HIEARCHY();
+
+ /* One Class Two Comp with non-overlapping features */
+ fake_comp1.class = &fake_class;
+ list_add(&fake_comp1.class_list, &fake_class.components);
+ fake_comp2.class = &fake_class;
+ list_add(&fake_comp2.class_list, &fake_class.components);
+ fake_vmsc1.comp = &fake_comp1;
+ list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+ fake_vmsc2.comp = &fake_comp2;
+ list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+ fake_ris1.vmsc = &fake_vmsc1;
+ list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+ fake_ris2.vmsc = &fake_vmsc2;
+ list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+ mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+ mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+ fake_ris1.props.cpbm_wd = 4;
+ fake_ris2.props.cmax_wd = 4;
+
+ mpam_enable_merge_features(&fake_classes_list);
+
+ /*
+ * Multiple components can't control the same resource, mismatched features can
+ * not be supported.
+ */
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+ KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+ KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+ KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+ mutex_unlock(&mpam_list_lock);
+
+#undef RESET_FAKE_HIEARCHY
+}
+
static void test_mpam_reset_msc_bitmap(struct kunit *test)
{
char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
static struct kunit_case mpam_devices_test_cases[] = {
KUNIT_CASE(test_mpam_reset_msc_bitmap),
+ KUNIT_CASE(test_mpam_enable_merge_features),
+ KUNIT_CASE(test__props_mismatch),
{}
};
--
2.20.1
^ permalink raw reply related [flat|nested] 130+ messages in thread
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 19:15 ` Markus Elfring
2025-08-22 19:55 ` Markus Elfring
` (4 subsequent siblings)
5 siblings, 0 replies; 130+ messages in thread
From: Markus Elfring @ 2025-08-22 19:15 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, linux-acpi, devicetree
Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
Will Deacon, Xin Hao
…
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
…
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
…
> + mutex_lock(&mpam_list_lock);
> + mpam_num_msc--;
…
> + devm_kfree(&pdev->dev, msc);
> + mutex_unlock(&mpam_list_lock);
> +}
…
Under which circumstances would you consider applying a statement
like “guard(mutex)(&mpam_list_lock);”?
https://elixir.bootlin.com/linux/v6.17-rc2/source/include/linux/mutex.h#L228
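For readers unfamiliar with it, the kernel's guard() helper is built on
__attribute__((cleanup)). A minimal userspace approximation of the pattern
(illustrative names and a pthread mutex, not the kernel API) looks like:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int num_msc;

/* Cleanup callback run automatically when the guard goes out of scope. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Sketch of guard(mutex)(lock): lock now, unlock at end of scope. */
#define guard_mutex(lock) \
	pthread_mutex_t *_guard __attribute__((cleanup(unlock_cleanup))) = \
		(pthread_mutex_lock(lock), (lock))

/* Every return path drops the lock - no explicit unlock before return. */
static int remove_msc(bool registered)
{
	guard_mutex(&list_lock);

	if (!registered)
		return -1;	/* lock is released here too */

	num_msc--;
	return 0;
}
```

The appeal is that early-return paths such as the error case above cannot
leak the lock, which is why reviewers often suggest it for functions with a
single lock/unlock pair around the whole body.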
Regards,
Markus
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-08-22 19:15 ` Markus Elfring
@ 2025-08-22 19:55 ` Markus Elfring
2025-08-23 6:41 ` Greg Kroah-Hartman
2025-08-27 13:03 ` Ben Horgan
` (3 subsequent siblings)
5 siblings, 1 reply; 130+ messages in thread
From: Markus Elfring @ 2025-08-22 19:55 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, linux-acpi, devicetree
Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
Will Deacon, Xin Hao
…
…
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
…
> + } while (0);
> + mutex_unlock(&mpam_list_lock);
> +
> + if (!err) {
> + /* Create RIS entries described by firmware */
> + if (!acpi_disabled)
> + err = acpi_mpam_parse_resources(msc, plat_data);
> + else
> + err = mpam_dt_parse_resources(msc, plat_data);
> + }
> +
> + if (!err && fw_num_msc == mpam_num_msc)
> + mpam_discovery_complete();
> +
> + if (err && msc)
> + mpam_msc_drv_remove(pdev);
> +
> + return err;
> +}
…
* Would you like to integrate anything from the following source code variant?
if (!err)
/* Create RIS entries described by firmware */
err = acpi_disabled
? mpam_dt_parse_resources(msc, plat_data)
: acpi_mpam_parse_resources(msc, plat_data);
if (err) {
if (msc)
mpam_msc_drv_remove(pdev);
} else {
if (fw_num_msc == mpam_num_msc)
mpam_discovery_complete();
}
* What do you think about applying scope-based resource management
in further places?
Regards,
Markus
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 19:55 ` Markus Elfring
@ 2025-08-23 6:41 ` Greg Kroah-Hartman
0 siblings, 0 replies; 130+ messages in thread
From: Greg Kroah-Hartman @ 2025-08-23 6:41 UTC (permalink / raw)
To: Markus Elfring
Cc: James Morse, linux-arm-kernel, linux-acpi, devicetree, LKML,
Amit Singh Tomar, Baisheng Gao, Baolin Wang, bobo.shaobowang,
Carl Worth, Catalin Marinas, Conor Dooley, Danilo Krummrich,
Dave Martin, David Hildenbrand, Drew Fustini, D Scott Phillips,
Fenghua Yu, Hanjun Guo, Jamie Iles, Jonathan Cameron, Koba Ko,
Krzysztof Kozlowski, Len Brown, Linu Cherian, Lorenzo Pieralisi,
Peter Newman, Rafael J. Wysocki, Rex Nie, Rob Herring,
Rohit Mathew, Shameer Kolothum, Shanker Donthineni, Shaopeng Tan,
Sudeep Holla, Will Deacon, Xin Hao
On Fri, Aug 22, 2025 at 09:55:33PM +0200, Markus Elfring wrote:
> …
> …
> > +static int mpam_msc_drv_probe(struct platform_device *pdev)
> > +{
> …
> > + } while (0);
> > + mutex_unlock(&mpam_list_lock);
> > +
> > + if (!err) {
> > + /* Create RIS entries described by firmware */
> > + if (!acpi_disabled)
> > + err = acpi_mpam_parse_resources(msc, plat_data);
> > + else
> > + err = mpam_dt_parse_resources(msc, plat_data);
> > + }
> > +
> > + if (!err && fw_num_msc == mpam_num_msc)
> > + mpam_discovery_complete();
> > +
> > + if (err && msc)
> > + mpam_msc_drv_remove(pdev);
> > +
> > + return err;
> > +}
> …
>
> * Would you like to integrate anything from the following source code variant?
>
> if (!err)
> /* Create RIS entries described by firmware */
> err = acpi_disabled
> ? mpam_dt_parse_resources(msc, plat_data)
> : acpi_mpam_parse_resources(msc, plat_data);
>
> if (err) {
> if (msc)
> mpam_msc_drv_remove(pdev);
> } else {
> if (fw_num_msc == mpam_num_msc)
> mpam_discovery_complete();
> }
>
> * What do you think about applying scope-based resource management
> in further places?
>
>
> Regards,
> Markus
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless
review comment to a patch submission on a Linux kernel developer mailing
list. I strongly suggest that you not do this anymore. Please do not
bother developers who are actively working to produce patches and
features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to
follow it at all. The person/bot/AI that sent it is being ignored by
almost all Linux kernel maintainers for having a persistent pattern of
behavior of producing distracting and pointless commentary, and
inability to adapt to feedback. Please feel free to also ignore emails
from them.
thanks,
greg k-h's patch email bot
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-23 10:55 ` Markus Elfring
2025-08-27 16:05 ` Dave Martin
1 sibling, 0 replies; 130+ messages in thread
From: Markus Elfring @ 2025-08-23 10:55 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, linux-acpi, devicetree
Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
Will Deacon, Xin Hao
…
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,331 @@
…
> +static int __init acpi_mpam_parse(void)
> +{
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> + char *table_end, *table_offset = (char *)(table + 1);
…
Please replace the eight space characters with a tab character here.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v6.17-rc2#n18
Are further source code places similarly improvable?
Regards,
Markus
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-23 12:14 ` Markus Elfring
2025-08-28 15:57 ` James Morse
2025-08-27 9:25 ` Ben Horgan
2025-08-27 10:50 ` Dave Martin
2 siblings, 1 reply; 130+ messages in thread
From: Markus Elfring @ 2025-08-23 12:14 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, linux-acpi, devicetree
Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
Will Deacon, Xin Hao
…
> +++ b/include/linux/acpi.h
…
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
> void acpi_table_init_complete (void);
> int acpi_table_init (void);
…
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> +
> int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
…
Would you consider adding such a special macro in a separate update step?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81
Regards,
Markus
* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
` (66 preceding siblings ...)
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-24 17:24 ` Krzysztof Kozlowski
67 siblings, 0 replies; 130+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-24 17:24 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
On 22/08/2025 17:29, James Morse wrote:
> Hello,
>
> This is just enough MPAM driver for the ACPI and DT pre-requisites.
> It doesn't contain any of the resctrl code, meaning you can't actually drive it
> from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
> This will change once the user interface is connected up.
>
> This is the initial group of patches that allows the resctrl code to be built
> on top. Including that will increase the number of trees that may need to
> coordinate, so breaking it up makes sense.
>
There was v1 of this, so that's a v2. Start using b4 to get it right,
because you just make it difficult for us to review.
Try yourself:
b4 diff <this-patchset>
Works? No.
Also, for some reason you sent it twice, so again: use b4.
Best regards,
Krzysztof
* Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-08-24 17:25 ` Krzysztof Kozlowski
2025-08-27 17:11 ` James Morse
2025-08-27 10:46 ` Dave Martin
1 sibling, 1 reply; 130+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-24 17:25 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
On 22/08/2025 17:29, James Morse wrote:
> MPAM needs to know the size of a cache associated with a particular CPU.
> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>
> Add a helper to do this.
>
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> Changes since v1:
You marked this as v1.
> * Converted to kdoc.
> * Simplified helper to use get_cpu_cacheinfo_level().
Please use consistent subject prefixes. Look at previous patch subject
prefix.
Best regards,
Krzysztof
* Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-08-26 14:45 ` Ben Horgan
2025-08-28 15:56 ` James Morse
2025-08-27 10:48 ` Dave Martin
1 sibling, 1 reply; 130+ messages in thread
From: Ben Horgan @ 2025-08-26 14:45 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
The patch logic update makes sense to me. Just a nit.
On 8/22/25 16:29, James Morse wrote:
> The PPTT describes CPUs and caches, as well as processor containers.
> The ACPI table for MPAM describes the set of CPUs that can access an MSC
> with the UID of a processor container.
>
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
>
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
> * Added missing : in kernel-doc
> * Made helper return void as this never actually returns an error.
> ---
> drivers/acpi/pptt.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 3 ++
> 2 files changed, 89 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..4791ca2bdfac 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
> return NULL;
> }
>
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
> + * @table_hdr: A reference to the PPTT table.
> + * @parent_node: A pointer to the processor node in the @table_hdr.
> + * @cpus: A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> + struct acpi_pptt_processor *parent_node,
> + cpumask_t *cpus)
> +{
> + struct acpi_pptt_processor *cpu_node;
> + u32 acpi_id;
> + int cpu;
> +
> + cpumask_clear(cpus);
> +
> + for_each_possible_cpu(cpu) {
> + acpi_id = get_acpi_id_for_cpu(cpu);
> + cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> + while (cpu_node) {
> + if (cpu_node == parent_node) {
> + cpumask_set_cpu(cpu, cpus);
> + break;
> + }
> + cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> + }
> + }
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + * processor containers
> + * @acpi_cpu_id: The UID of the processor container.
> + * @cpus: The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> + struct acpi_pptt_processor *cpu_node;
> + struct acpi_table_header *table_hdr;
> + struct acpi_subtable_header *entry;
> + unsigned long table_end;
> + acpi_status status;
> + bool leaf_flag;
> + u32 proc_sz;
> +
> + cpumask_clear(cpus);
> +
> + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
> + if (ACPI_FAILURE(status))
> + return;
> +
> + table_end = (unsigned long)table_hdr + table_hdr->length;
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> + sizeof(struct acpi_table_pptt));
> + proc_sz = sizeof(struct acpi_pptt_processor);
> + while ((unsigned long)entry + proc_sz <= table_end) {
> + cpu_node = (struct acpi_pptt_processor *)entry;
> + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> + cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> + leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
nit: Consider dropping the boolean leaf_flag and just using
acpi_pptt_leaf_node() in the condition. The name leaf_flag is slightly
overloaded to include the case when the ACPI leaf flag is not supported,
and dropping it would make the code more succinct.
> + if (!leaf_flag) {
> + if (cpu_node->acpi_processor_id == acpi_cpu_id)
> + acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> + }
> + }
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> + entry->length);
> + }
> +
> + acpi_put_table(table_hdr);
> +}
> +
> static u8 acpi_cache_type(enum cache_type type)
> {
> switch (type) {
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 1c5bb1e887cd..f97a9ff678cc 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
> int find_acpi_cpu_topology_cluster(unsigned int cpu);
> int find_acpi_cpu_topology_package(unsigned int cpu);
> int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> #else
> static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
> {
> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> {
> return -EINVAL;
> }
> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> + cpumask_t *cpus) { }
> #endif
>
> void acpi_arch_init(void);
Thanks,
Ben
* Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-27 8:53 ` Ben Horgan
2025-08-28 15:58 ` James Morse
2025-08-27 11:01 ` Dave Martin
1 sibling, 1 reply; 130+ messages in thread
From: Ben Horgan @ 2025-08-27 8:53 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
> platforms, that is where the Kconfig option makes the most sense.
>
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and registering the CPUs
> properties with the MPAM driver.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
> arch/arm64/Kconfig | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..658e47fc0c5a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> range of input addresses.
>
> +config ARM64_MPAM
> + bool "Enable support for MPAM"
> + help
> + Memory Partitioning and Monitoring is an optional extension
> + that allows the CPUs to mark load and store transactions with
> + labels for partition-id and performance-monitoring-group.
> + System components, such as the caches, can use the partition-id
> + to apply a performance policy. MPAM monitors can use the
> + partition-id and performance-monitoring-group to measure the
> + cache occupancy or data throughput.
> +
> + Use of this extension requires CPU support, support in the
> + memory system components (MSC), and a description from firmware
> + of where the MSC are in the address space.
> +
> + MPAM is exposed to user-space via the resctrl pseudo filesystem.
> +
> endmenu # "ARMv8.4 architectural features"
Should this be moved to "ARMv8.2 architectural features" rather than the
8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
listed in the section A2.2.3.1 Features added to the Armv8.2 extension
in later releases.
>
> menu "ARMv8.5 architectural features"
Thanks,
Ben
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
2025-08-23 12:14 ` Markus Elfring
@ 2025-08-27 9:25 ` Ben Horgan
2025-08-28 15:57 ` James Morse
2025-08-27 10:50 ` Dave Martin
2 siblings, 1 reply; 130+ messages in thread
From: Ben Horgan @ 2025-08-27 9:25 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
>
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
>
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().
>
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * acpi_count_levels() now returns a value.
> * Converted the table-get stuff to use Jonathan's cleanup helper.
> * Dropped Sudeep's Review tag due to the cleanup change.
> ---
> drivers/acpi/pptt.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 17 ++++++++++++
> 2 files changed, 81 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
> ACPI_PPTT_ACPI_IDENTICAL);
> }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> + u32 acpi_cpu_id;
> + int level, cpu, num_levels;
> + struct acpi_pptt_cache *cache;
> + struct acpi_pptt_cache_v1 *cache_v1;
> + struct acpi_pptt_processor *cpu_node;
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> + if (IS_ERR(table))
> + return PTR_ERR(table);
> +
> + if (table->revision < 3)
> + return -ENOENT;
> +
> + /*
> + * If we found the cache first, we'd still need to walk from each CPU
> + * to find the level...
> + */
> + for_each_possible_cpu(cpu) {
> + acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> + cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> + if (!cpu_node)
> + return -ENOENT;
> + num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> + /* Start at 1 for L1 */
> + for (level = 1; level <= num_levels; level++) {
> + cache = acpi_find_cache_node(table, acpi_cpu_id,
> + ACPI_PPTT_CACHE_TYPE_UNIFIED,
> + level, &cpu_node);
> + if (!cache)
> + continue;
> +
> + cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> + cache,
> + sizeof(struct acpi_pptt_cache));
> +
> + if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> + cache_v1->cache_id == cache_id)
> + return level;
> + }
> + }
> +
> + return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
> #ifndef _LINUX_ACPI_H
> #define _LINUX_ACPI_H
>
> +#include <linux/cleanup.h>
> #include <linux/errno.h>
> #include <linux/ioport.h> /* for struct resource */
> #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
> void acpi_table_init_complete (void);
> int acpi_table_init (void);
>
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> + struct acpi_table_header *table;
> + int status = acpi_get_table(signature, instance, &table);
> +
> + if (ACPI_FAILURE(status))
> + return ERR_PTR(-ENOENT);
> + return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
nit: Is it useful to change the condition from !IS_ERR(_T) to
!IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
acpi_put_table() can take NULL, so there is no real danger.
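As a rough, self-contained illustration of what DEFINE_FREE() boils down to
(the kernel macro wraps the compiler's `cleanup` attribute), including the
skip-on-error guard under discussion. The error-pointer encoding below is a
hypothetical stand-in for the kernel's IS_ERR_OR_NULL()/ERR_PTR():

```c
#include <stdlib.h>

static int frees;	/* counts how many times cleanup really freed */

/* Stand-in for IS_ERR_OR_NULL(): small negative errnos encoded as
 * pointer values near the top of the address space (illustration only). */
#define MAX_ERRNO 4095UL
static void *err_ptr(long err) { return (void *)err; }
static int is_err_or_null(const void *p)
{
	return !p || (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

/* Rough analogue of
 * DEFINE_FREE(acpi_table, ..., if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T)):
 * a function run automatically when the variable leaves scope. */
static void free_table(void **pp)
{
	if (!is_err_or_null(*pp)) {
		free(*pp);
		frees++;
	}
}
#define __free_table __attribute__((cleanup(free_table)))

static void use_valid_table(void)
{
	void *table __free_table = malloc(16);	/* freed at scope exit */
	(void)table;
}

static void use_error_table(void)
{
	void *table __free_table = err_ptr(-2);	/* error: cleanup skipped */
	(void)table;
}
```

With the IS_ERR_OR_NULL()-style guard, both NULL and error-pointer values
skip the put, matching the common pattern Ben mentions.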
> +
> int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
> int __init_or_acpilib acpi_table_parse_entries(char *id,
> unsigned long table_size, int entry_id,
> @@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
> int find_acpi_cpu_topology_package(unsigned int cpu);
> int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> +int find_acpi_cache_level_from_id(u32 cache_id);
> #else
> static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
> {
> @@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> }
> static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> cpumask_t *cpus) { }
> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> + return -EINVAL;
> +}
> #endif
>
> void acpi_arch_init(void);
Thanks,
Ben
* Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-08-27 10:46 ` Dave Martin
2025-08-27 17:11 ` James Morse
0 siblings, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:46 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
> The MPAM driver identifies caches by id for use with resctrl. It
> needs to know the cache-id when probe-ing, but the value isn't set
> in cacheinfo until device_initcall().
>
> Expose the code that generates the cache-id. The parts of the MPAM
> driver that run early can use this to set up the resctrl structures
> before cacheinfo is ready in device_initcall().
Why can't the MPAM driver just consume the precomputed cache-id
information?
Possible reasons are that the MPAM driver probes too early, or that it
must parse the PPTT directly (which is true) and needs to label caches
consistently with the way the kernel does it.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
> * Renamed cache_of_get_id() cache_of_calculate_id().
> ---
> drivers/base/cacheinfo.c | 19 +++++++++++++------
> include/linux/cacheinfo.h | 1 +
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 613410705a47..f6289d142ba9 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
> #define arch_compact_of_hwid(_x) (_x)
> #endif
>
> -static void cache_of_set_id(struct cacheinfo *this_leaf,
> - struct device_node *cache_node)
> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
> {
> struct device_node *cpu;
> - u32 min_id = ~0;
> + unsigned long min_id = ~0UL;
Why the change of type here?
This does mean that 0xffffffff can now be generated as a valid cache-id.
If that is necessary, then this patch is also fixing a bug in the code --
but the commit message doesn't say anything about that.
For a patch that is just exposing an internal result, it may be
better to keep the original type. ~(u32)0 is already used as an
exceptional value.
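A small standalone sketch of the sentinel collision in question (function
names are hypothetical): with a u32 accumulator, a legal id of 0xffffffff
is indistinguishable from "not found", while the widened type keeps the two
apart on 64-bit targets:

```c
#include <stdint.h>

/* "No id found yet" sentinel, as used by the original u32 code. */
#define NO_ID32 (~(uint32_t)0)

static uint32_t min_id_u32(const uint32_t *ids, int n)
{
	uint32_t min_id = NO_ID32;

	for (int i = 0; i < n; i++)
		if (ids[i] < min_id)
			min_id = ids[i];
	/* NO_ID32 means "nothing found"... or the valid id 0xffffffff */
	return min_id;
}

static unsigned long min_id_ulong(const uint32_t *ids, int n)
{
	unsigned long min_id = ~0UL;	/* distinct from any 32-bit id */

	for (int i = 0; i < n; i++)
		if (ids[i] < min_id)
			min_id = ids[i];
	return min_id;
}
```

Whether that ambiguity matters in practice is the question raised above; if
it does, the widening is a bug fix and arguably deserves its own mention in
the commit message.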
[...]
Otherwise, this looks reasonable to me.
Cheers
---Dave
* Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
2025-08-24 17:25 ` Krzysztof Kozlowski
@ 2025-08-27 10:46 ` Dave Martin
2025-08-27 17:11 ` James Morse
1 sibling, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:46 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
> MPAM needs to know the size of a cache associated with a particular CPU.
> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>
> Add a helper to do this.
>
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> Changes since v1:
> * Converted to kdoc.
> * Simplified helper to use get_cpu_cacheinfo_level().
> ---
> include/linux/cacheinfo.h | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index 2dcbb69139e9..e12d6f2c6a57 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
> return ci ? ci->id : -1;
> }
>
> +/**
> + * get_cpu_cacheinfo_size() - Get the size of the cache.
> + * @cpu: The cpu that is associated with the cache.
> + * @level: The level of the cache as seen by @cpu.
> + *
> + * Callers must hold the cpuhp lock.
> + * Returns the cache-size on success, or 0 for an error.
> + */
Nit: Maybe use the wording
cpuhp lock must be held.
in the kerneldoc here, to match the other helpers it sits alongside.
Otherwise, looks reasonable.
> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> +{
> + struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
> +
> + return ci ? ci->size : 0;
> +}
> +
Orphaned function?
Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
If so, this wouldn't just be dead code in this series.
Cheers
---Dave
* Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
2025-08-26 14:45 ` Ben Horgan
@ 2025-08-27 10:48 ` Dave Martin
2025-08-28 15:57 ` James Morse
1 sibling, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:48 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
> The PPTT describes CPUs and caches, as well as processor containers.
> The ACPI table for MPAM describes the set of CPUs that can access an MSC
> with the UID of a processor container.
>
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
Nit: The motivation for the change is not clear here.
I guess this boils down to the need to map the MSC topology information
in the ACPI MPAM table to a cpumask for each MSC.
If so, a possible rearrangement and rewording might be, say:
--8<--
The ACPI MPAM table uses the UID of a processor container specified in
the PPTT, to indicate the subset of CPUs and upstream cache topology
that can access each MPAM Memory System Component (MSC).
This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.
Add a helper to find the processor container by its id, then [...]
-->8--
>
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
> * Added missing : in kernel-doc
> * Made helper return void as this never actually returns an error.
> ---
> drivers/acpi/pptt.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 3 ++
> 2 files changed, 89 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..4791ca2bdfac 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
> return NULL;
> }
>
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
> + * @table_hdr: A reference to the PPTT table.
> + * @parent_node: A pointer to the processor node in the @table_hdr.
> + * @cpus: A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> + struct acpi_pptt_processor *parent_node,
> + cpumask_t *cpus)
> +{
> + struct acpi_pptt_processor *cpu_node;
> + u32 acpi_id;
> + int cpu;
> +
> + cpumask_clear(cpus);
> +
> + for_each_possible_cpu(cpu) {
> + acpi_id = get_acpi_id_for_cpu(cpu);
^ Presumably this can't fail?
> + cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> + while (cpu_node) {
> + if (cpu_node == parent_node) {
> + cpumask_set_cpu(cpu, cpus);
> + break;
> + }
> + cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> + }
> + }
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + * processor containers
Nit: "containers" -> "container" ?
> + * @acpi_cpu_id: The UID of the processor container.
> + * @cpus: The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
(Revise this if dropping the leaf/non-leaf distinction -- see below.)
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> + struct acpi_pptt_processor *cpu_node;
> + struct acpi_table_header *table_hdr;
> + struct acpi_subtable_header *entry;
> + unsigned long table_end;
> + acpi_status status;
> + bool leaf_flag;
> + u32 proc_sz;
> +
> + cpumask_clear(cpus);
> +
> + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
> + if (ACPI_FAILURE(status))
> + return;
Is acpi_get_pptt() applicable here?
(That function is not thread-safe, but then, perhaps most/all of these
functions are not thread-safe. If we are still on the boot CPU at this
point (?) then this wouldn't be a concern.)
> +
> + table_end = (unsigned long)table_hdr + table_hdr->length;
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> + sizeof(struct acpi_table_pptt));
> + proc_sz = sizeof(struct acpi_pptt_processor);
> + while ((unsigned long)entry + proc_sz <= table_end) {
Ack that this matches the bounds check in functions that are already
present.
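A standalone sketch of that bounds-check pattern (struct layout and names
are illustrative, not the real ACPI definitions): iteration continues only
while a full minimum-sized record still fits before the end of the table:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative variable-length subtable entry: type byte, then the total
 * entry length in bytes, as in ACPI subtable headers. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

static int count_entries_of_type(const uint8_t *table, size_t table_len,
				 size_t hdr_len, uint8_t want_type,
				 size_t min_entry_sz)
{
	uintptr_t table_end = (uintptr_t)table + table_len;
	const struct subtable_header *entry =
		(const struct subtable_header *)(table + hdr_len);
	int count = 0;

	/* mirrors: while ((unsigned long)entry + proc_sz <= table_end) */
	while ((uintptr_t)entry + min_entry_sz <= table_end) {
		if (entry->type == want_type)
			count++;
		entry = (const struct subtable_header *)
			((const uint8_t *)entry + entry->length);
	}
	return count;
}
```

The check guarantees the minimum-sized record can be dereferenced; each
step then advances by the entry's self-declared length, so a truncated
trailing entry is never touched.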
> + cpu_node = (struct acpi_pptt_processor *)entry;
> + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> + cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> + leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
> + if (!leaf_flag) {
> + if (cpu_node->acpi_processor_id == acpi_cpu_id)
Is there any need to distinguish processor containers from (leaf) CPU
nodes, here? If not, dropping the distinction might simplify the code
here (even if callers do not care).
Otherwise, maybe eliminate leaf_flag and collapse these into a single
if(), as suggested by Ben [1].
> + acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
Can there ever be multiple matches?
The possibility of duplicate processor IDs in the PPTT sounds weird to
me, but then I'm not an ACPI expert.
If there can only be a single match, though, then we may as well break
out of the loop here, unless we want to be paranoid and report
duplicates as an error -- but that would require extra implementation,
so I'm not sure that would be worth it.
> + }
> + }
> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> + entry->length);
> + }
> +
> + acpi_put_table(table_hdr);
> +}
[...]
[1] Ben Horgan, Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
https://lore.kernel.org/lkml/b032775e-1729-441a-8ec4-dd85f70055e8@arm.com/
Cheers
---Dave
* Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-08-27 10:49 ` Dave Martin
2025-08-28 15:57 ` James Morse
0 siblings, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:49 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:45PM +0000, James Morse wrote:
> acpi_count_levels() passes the number of levels back via a pointer argument.
> It also passes this to acpi_find_cache_level() as the starting_level, and
> preserves this value as it walks up the cpu_node tree counting the levels.
>
> This means the caller must initialise 'levels' due to acpi_count_levels()
> internals. The only caller acpi_get_cache_info() happens to have already
> initialised levels to zero, which acpi_count_levels() depends on to get the
> correct result.
>
> Two results are passed back from acpi_count_levels(), unlike split_levels,
> levels is not optional.
>
> Split these two results up. The mandatory 'levels' is always returned,
> which hides the internal details from the caller, and avoids having
> duplicated initialisation in all callers. split_levels remains an
> optional argument passed back.
Nit: I found all this a bit hard to follow.
This seems to boil down to:
--8<--
In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter. The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.
Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.
Get rid of the levels parameter, which has no remaining purpose.
Fix acpi_get_cache_info() to match.
-->8--
split_levels is orthogonal to this refactoring (as evinced by the diff).
I think mentioning it in the commit message at all may just add to the
confusion...
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> Changes since RFC:
> * Made acpi_count_levels() return the levels value.
> ---
> drivers/acpi/pptt.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 4791ca2bdfac..8f9b9508acba 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
> * levels and split cache levels (data/instruction).
> * @table_hdr: Pointer to the head of the PPTT table
> * @cpu_node: processor node we wish to count caches for
> - * @levels: Number of levels if success.
> * @split_levels: Number of split cache levels (data/instruction) if
> - * success. Can by NULL.
> + * success. Can be NULL.
> *
> + * Returns number of levels.
Nit: the prevailing convention in this file would be
Return: number of levels
(I don't know whether kerneldoc cares.)
Maybe also say "total number of levels" in place of "level", to make it
clearer that the split levels (if any) are included in this count.
> * Given a processor node containing a processing unit, walk into it and count
> * how many levels exist solely for it, and then walk up each level until we hit
> * the root node (ignore the package level because it may be possible to have
> @@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
> * split cache levels (data/instruction) that exist at each level on the way
> * up.
> */
> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
> - struct acpi_pptt_processor *cpu_node,
> - unsigned int *levels, unsigned int *split_levels)
> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
> + struct acpi_pptt_processor *cpu_node,
> + unsigned int *split_levels)
> {
> + int starting_level = 0;
> +
> do {
> - acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
> + acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
> cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> } while (cpu_node);
> +
> + return starting_level;
> }
>
> /**
> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
> if (!cpu_node)
> return -ENOENT;
>
> - acpi_count_levels(table, cpu_node, levels, split_levels);
> + *levels = acpi_count_levels(table, cpu_node, split_levels);
>
> pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
> *levels, split_levels ? *split_levels : -1);
Otherwise, looks reasonable to me.
(But see my comments on the next patches re whether we really need this.)
Cheers
---Dave
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
2025-08-23 12:14 ` Markus Elfring
2025-08-27 9:25 ` Ben Horgan
@ 2025-08-27 10:50 ` Dave Martin
2025-08-28 15:58 ` James Morse
2 siblings, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:50 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
>
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
>
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().
Does this mean that the early secondaries must be spread out across the
whole topology so that everything can be probed?
(i.e., a random subset is no good?)
If so, is this documented somewhere, such as in booting.rst?
Maybe this is not a new requirement -- it's not an area that I'm very
familiar with.
>
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * acpi_count_levels() now returns a value.
> * Converted the table-get stuff to use Jonathan's cleanup helper.
> * Dropped Sudeep's Review tag due to the cleanup change.
> ---
> drivers/acpi/pptt.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 17 ++++++++++++
> 2 files changed, 81 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
> ACPI_PPTT_ACPI_IDENTICAL);
> }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
Nit: doesn't exist or its revision is too old.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> + u32 acpi_cpu_id;
> + int level, cpu, num_levels;
> + struct acpi_pptt_cache *cache;
> + struct acpi_pptt_cache_v1 *cache_v1;
> + struct acpi_pptt_processor *cpu_node;
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
acpi_get_pptt() ? (See comment on patch 3.)
Comments there also suggest that the acpi_put_table() may be
unnecessary, at least on some paths.
I haven't tried to understand the ins and outs of this.
> +
> + if (IS_ERR(table))
> + return PTR_ERR(table);
> +
> + if (table->revision < 3)
> + return -ENOENT;
> +
> + /*
> + * If we found the cache first, we'd still need to walk from each CPU
> + * to find the level...
> + */
^ Possibly confusing comment? The cache id is the starting point for
calling this function. Is there a world in which we are at this point
without first having found the cache node?
(If the comment is just a restatement of part of the kerneldoc
description, maybe just drop it.)
> + for_each_possible_cpu(cpu) {
> + acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> + cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> + if (!cpu_node)
> + return -ENOENT;
> + num_levels = acpi_count_levels(table, cpu_node, NULL);
Is the initial call to acpi_count_levels() really needed here?
It feels a bit like we end up enumerating the whole topology two or
three times here; once to count how many levels there are, and then
again to examine the nodes, and once more inside acpi_find_cache_node().
Why can't we just walk until we run out of levels?
I may be missing some details of how these functions interact -- if
this is only run at probe time, compact, well-factored code is
more important than making things as fast as possible.
> +
> + /* Start at 1 for L1 */
> + for (level = 1; level <= num_levels; level++) {
> + cache = acpi_find_cache_node(table, acpi_cpu_id,
> + ACPI_PPTT_CACHE_TYPE_UNIFIED,
> + level, &cpu_node);
> + if (!cache)
> + continue;
> +
> + cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> + cache,
> + sizeof(struct acpi_pptt_cache));
> +
> + if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> + cache_v1->cache_id == cache_id)
> + return level;
> + }
> + }
> +
> + return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
[...]
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
> void acpi_table_init_complete (void);
> int acpi_table_init (void);
>
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> + struct acpi_table_header *table;
> + int status = acpi_get_table(signature, instance, &table);
> +
> + if (ACPI_FAILURE(status))
> + return ERR_PTR(-ENOENT);
> + return table;
> +}
This feels like something that ought to exist already. If not, why
not? If so, are there open-coded versions of this spread around the
ACPI tree that should be ported to use it?
[...]
Cheers
---Dave
* Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-27 10:53 ` Dave Martin
2025-08-28 15:58 ` James Morse
0 siblings, 1 reply; 130+ messages in thread
From: Dave Martin @ 2025-08-27 10:53 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>
> The driver needs to know which CPUs are associated with the cache,
> the CPUs may not all be online, so cacheinfo does not have the
> information.
Nit: cacheinfo lacking the information is not a consequence of the
driver needing it.
Maybe split the sentence:
-> "[...] associated with the cache. The CPUs may not [...]"
>
> Add a helper to pull this information out of the PPTT.
>
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
> Changes since RFC:
> * acpi_count_levels() now returns a value.
> * Converted the table-get stuff to use Jonathan's cleanup helper.
> * Dropped Sudeep's Review tag due to the cleanup change.
> ---
> drivers/acpi/pptt.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/acpi.h | 6 +++++
> 2 files changed, 68 insertions(+)
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 660457644a5b..cb93a9a7f9b6 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>
> return -ENOENT;
> }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + * specified cache
> + * @cache_id: The id field of the unified cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> + u32 acpi_cpu_id;
> + int level, cpu, num_levels;
> + struct acpi_pptt_cache *cache;
> + struct acpi_pptt_cache_v1 *cache_v1;
> + struct acpi_pptt_processor *cpu_node;
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> + cpumask_clear(cpus);
> +
> + if (IS_ERR(table))
> + return -ENOENT;
> +
> + if (table->revision < 3)
> + return -ENOENT;
> +
> + /*
> + * If we found the cache first, we'd still need to walk from each cpu.
> + */
> + for_each_possible_cpu(cpu) {
> + acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> + cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> + if (!cpu_node)
> + return 0;
> + num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> + /* Start at 1 for L1 */
> + for (level = 1; level <= num_levels; level++) {
> + cache = acpi_find_cache_node(table, acpi_cpu_id,
> + ACPI_PPTT_CACHE_TYPE_UNIFIED,
> + level, &cpu_node);
> + if (!cache)
> + continue;
> +
> + cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> + cache,
> + sizeof(struct acpi_pptt_cache));
> +
> + if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> + cache_v1->cache_id == cache_id)
> + cpumask_set_cpu(cpu, cpus);
Again, it feels like we are repeating the same walk multiple times to
determine how deep the table is (on which point the table is self-
describing anyway), and then again to derive some static property, and
then we are then doing all of that work multiple times to derive
different static properties, etc.
Can we not just walk over the tables once and stash the derived
properties somewhere?
I'm still getting my head around this parsing code, so I'm not saying
that the approach is incorrect here -- just wondering whether there is
a way to make it simpler.
[...]
Cheers
---Dave
* Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
2025-08-27 8:53 ` Ben Horgan
@ 2025-08-27 11:01 ` Dave Martin
1 sibling, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-08-27 11:01 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi,
<super-pedantic mode enabled>
(Since this will likely be people's go-to patch for understanding what MPAM
is, it is probably worth going the extra mile.)
On Fri, Aug 22, 2025 at 03:29:48PM +0000, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
Prefer -> "[...] to enable it. As MPAM is only [...]"
> platforms, that is where the Kconfig option makes the most sense.
It could be clearer what "where" refers to, here.
Maybe reword from ", that is [...]" -> ", the arm64 tree is the most
natural home for the Kconfig option."
(Or something like that.)
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and registering the CPUs
Nit: "registering" -> "to register"
> properties with the MPAM driver.
Nit: "CPUs properties" -> "properties of CPUs" ?
(Maybe there was just a missed apostrophe, but it may be more readable
here if written out longhand.)
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
> arch/arm64/Kconfig | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..658e47fc0c5a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
> range of input addresses.
>
> +config ARM64_MPAM
> + bool "Enable support for MPAM"
> + help
<pedantic mode on>
> + Memory Partitioning and Monitoring is an optional extension
> + that allows the CPUs to mark load and store transactions with
Nit: "memory transactions" ?
(I'm wondering whether there are some transactions such as atomic
exchanges that are not neatly characterised as "load" or "store".
Possibly MPAM labels some transactions that really are neither.)
> + labels for partition-id and performance-monitoring-group.
Nit: the hyphenation suggests that these are known terms (in this
specific, hyphenated, form) with specific definitions somewhere.
I don't think that this is the case? At least, I have not seen the
terms presented in this way anywhere else.
Also, the partition ID is itself a label, so "label for partition-id"
is a tautology.
How about:
--8<--
Memory System Resource Partitioning and Monitoring (MPAM) is an
optional extension to the Arm architecture that allows each
transaction issued to the memory system to be labelled with a
Partition identifier (PARTID) and Performance Monitoring Group
identifier (PMG).
-->8--
(Yes, that really seems to be what MPAM stands for in the published
specs. That's quite a mouthful, and news to me... I can't say I paid
much attention to the document titles beyond "MPAM"!)
> + System components, such as the caches, can use the partition-id
> + to apply a performance policy. MPAM monitors can use the
What is a "performance policy"?
The MPAM specs talk about resource controls; it's probably best to
stick to the same terminology.
> + partition-id and performance-monitoring-group to measure the
> + cache occupancy or data throughput.
So, how about something like:
--8<--
Memory system components, such as the caches, can be configured with
policies to control how much of various physical resources (such as
memory bandwidth or cache memory) the transactions labelled with each
PARTID can consume. Depending on the capabilities of the hardware,
the PARTID and PMG can also be used as filtering criteria to measure
the memory system resource consumption of different parts of a
workload.
-->8--
(Where "Memory system components" is used in a generic sense and so not
capitalised.)
> +
> + Use of this extension requires CPU support, support in the
> + memory system components (MSC), and a description from firmware
But here, we are explicitly using an architectural term now, so
"Memory System Components" (MSC)
makes sense.
> + of where the MSC are in the address space.
Prefer "MSCs" ? (Not everyone agrees about whether TLAs are
pluralisable but it is easier on the reader if "are" has an obviously
plural noun to bind to.)
> +
> + MPAM is exposed to user-space via the resctrl pseudo filesystem.
> +
> endmenu # "ARMv8.4 architectural features"
>
> menu "ARMv8.5 architectural features"
Cheers
---Dave
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-08-22 19:15 ` Markus Elfring
2025-08-22 19:55 ` Markus Elfring
@ 2025-08-27 13:03 ` Ben Horgan
2025-08-27 15:39 ` Rob Herring
` (2 subsequent siblings)
5 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-27 13:03 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
>
> Start with driver probe/remove and mapping the MSC.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Check for status=broken DT devices.
> * Moved all the files around.
> * Made Kconfig symbols depend on EXPERT
> ---
> arch/arm64/Kconfig | 1 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/resctrl/Kconfig | 11 ++
> drivers/resctrl/Makefile | 4 +
> drivers/resctrl/mpam_devices.c | 336 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 62 ++++++
> 7 files changed, 417 insertions(+)
> create mode 100644 drivers/resctrl/Kconfig
> create mode 100644 drivers/resctrl/Makefile
> create mode 100644 drivers/resctrl/mpam_devices.c
> create mode 100644 drivers/resctrl/mpam_internal.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>
> config ARM64_MPAM
> bool "Enable support for MPAM"
> + select ARM64_MPAM_DRIVER
> select ACPI_MPAM if ACPI
> help
> Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>
> source "drivers/cdx/Kconfig"
>
> +source "drivers/resctrl/Kconfig"
> +
> endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE) += hte/
> obj-$(CONFIG_DRM_ACCEL) += accel/
> obj-$(CONFIG_CDX_BUS) += cdx/
> obj-$(CONFIG_DPLL) += dpll/
> +obj-y += resctrl/
>
> obj-$(CONFIG_S390) += s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> + bool "MPAM driver for System IP, e,g. caches and memory controllers"
> + depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> + bool "Enable debug messages from the MPAM driver."
> + depends on ARM64_MPAM_DRIVER
> + help
> + Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
> +mpam-y += mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> + pr_err("Discovered all MSC\n");
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> + int count = 0;
> + struct device_node *np;
> +
> + for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> + if (of_device_is_available(np))
> + count++;
> + }
> +
> + return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> + u32 ris_idx)
> +{
> + int err = 0;
> + u32 level = 0;
> + unsigned long cache_id;
> + struct device_node *cache;
> +
> + do {
> + if (of_device_is_compatible(np, "arm,mpam-cache")) {
> + cache = of_parse_phandle(np, "arm,mpam-device", 0);
> + if (!cache) {
> + pr_err("Failed to read phandle\n");
> + break;
> + }
It looks like this allows "arm,mpam-cache" and "arm,mpam-device" to be
used on an msc node when there are no ris children. This usage could be
reasonable but doesn't match the schema in the previous patch. Should
this usage be rejected or the schema extended?
> + } else if (of_device_is_compatible(np->parent, "cache")) {
> + cache = of_node_get(np->parent);
> + } else {
> + /* For now, only caches are supported */
> + cache = NULL;
> + break;
> + }
> +
> + err = of_property_read_u32(cache, "cache-level", &level);
> + if (err) {
> + pr_err("Failed to read cache-level\n");
> + break;
> + }
> +
> + cache_id = cache_of_calculate_id(cache);
> + if (cache_id == ~0UL) {
> + err = -ENOENT;
> + break;
> + }
> +
> + err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> + cache_id);
> + } while (0);
> + of_node_put(cache);
> +
> + return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> + int err, num_ris = 0;
> + const u32 *ris_idx_p;
> + struct device_node *iter, *np;
> +
> + np = msc->pdev->dev.of_node;
> + for_each_child_of_node(np, iter) {
> + ris_idx_p = of_get_property(iter, "reg", NULL);
> + if (ris_idx_p) {
> + num_ris++;
> + err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> + if (err) {
> + of_node_put(iter);
> + return err;
> + }
> + }
> + }
> +
> + if (!num_ris)
> + mpam_dt_parse_resource(msc, np, 0);
> +
> + return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> + struct device_node *parent;
> + u32 affinity_id;
> + int err;
> +
> + if (!acpi_disabled) {
> + err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> + &affinity_id);
> + if (err)
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + else
> + acpi_pptt_get_cpus_from_container(affinity_id,
> + &msc->accessibility);
> +
> + return 0;
> + }
> +
> + /* This depends on the path to of_node */
> + parent = of_get_parent(msc->pdev->dev.of_node);
> + if (parent == of_root) {
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + err = 0;
> + } else {
> + err = -EINVAL;
> + pr_err("Cannot determine accessibility of MSC: %s\n",
> + dev_name(&msc->pdev->dev));
> + }
> + of_node_put(parent);
> +
> + return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> + /* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> + struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> + if (!msc)
> + return;
> +
> + mutex_lock(&mpam_list_lock);
> + mpam_num_msc--;
> + platform_set_drvdata(pdev, NULL);
> + list_del_rcu(&msc->glbl_list);
> + synchronize_srcu(&mpam_srcu);
> + devm_kfree(&pdev->dev, msc);
> + mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> + int err;
> + struct mpam_msc *msc;
> + struct resource *msc_res;
> + void *plat_data = pdev->dev.platform_data;
> +
> + mutex_lock(&mpam_list_lock);
> + do {
> + msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> + if (!msc) {
> + err = -ENOMEM;
> + break;
> + }
> +
> + mutex_init(&msc->probe_lock);
> + mutex_init(&msc->part_sel_lock);
> + mutex_init(&msc->outer_mon_sel_lock);
> + raw_spin_lock_init(&msc->inner_mon_sel_lock);
> + msc->id = mpam_num_msc++;
> + msc->pdev = pdev;
> + INIT_LIST_HEAD_RCU(&msc->glbl_list);
> + INIT_LIST_HEAD_RCU(&msc->ris);
> +
> + err = update_msc_accessibility(msc);
> + if (err)
> + break;
> + if (cpumask_empty(&msc->accessibility)) {
> + pr_err_once("msc:%u is not accessible from any CPU!",
> + msc->id);
> + err = -EINVAL;
> + break;
> + }
> +
> + if (device_property_read_u32(&pdev->dev, "pcc-channel",
> + &msc->pcc_subspace_id))
> + msc->iface = MPAM_IFACE_MMIO;
> + else
> + msc->iface = MPAM_IFACE_PCC;
> +
> + if (msc->iface == MPAM_IFACE_MMIO) {
> + void __iomem *io;
> +
> + io = devm_platform_get_and_ioremap_resource(pdev, 0,
> + &msc_res);
> + if (IS_ERR(io)) {
> + pr_err("Failed to map MSC base address\n");
> + err = PTR_ERR(io);
> + break;
> + }
> + msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> + msc->mapped_hwpage = io;
> + } else if (msc->iface == MPAM_IFACE_PCC) {
> + msc->pcc_cl.dev = &pdev->dev;
> + msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> + msc->pcc_cl.tx_block = false;
> + msc->pcc_cl.tx_tout = 1000; /* 1s */
> + msc->pcc_cl.knows_txdone = false;
> +
> + msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> + msc->pcc_subspace_id);
> + if (IS_ERR(msc->pcc_chan)) {
> + pr_err("Failed to request MSC PCC channel\n");
> + err = PTR_ERR(msc->pcc_chan);
> + break;
> + }
I don't see pcc support added in this series. Should we fail the probe
if this interface is specified?
(If keeping, there is a missing pcc_mbox_free_channel() on the error path.)
> + }
> +
> + list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> + platform_set_drvdata(pdev, msc);
> + } while (0);
> + mutex_unlock(&mpam_list_lock);
> +
> + if (!err) {
> + /* Create RIS entries described by firmware */
> + if (!acpi_disabled)
> + err = acpi_mpam_parse_resources(msc, plat_data);
> + else
> + err = mpam_dt_parse_resources(msc, plat_data);
> + }
> +
> + if (!err && fw_num_msc == mpam_num_msc)
> + mpam_discovery_complete();
> +
> + if (err && msc)
> + mpam_msc_drv_remove(pdev);
> +
> + return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> + { .compatible = "arm,mpam-msc", },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> + .driver = {
> + .name = "mpam_msc",
> + .of_match_table = of_match_ptr(mpam_of_match),
> + },
> + .probe = mpam_msc_drv_probe,
> + .remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> + int err;
> + struct device_node *cache;
> +
> + for_each_compatible_node(cache, NULL, "cache") {
> + err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> + if (err)
> + pr_err("Failed to create MSC devices under caches\n");
> + }
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> + if (!system_supports_mpam())
> + return -EOPNOTSUPP;
> +
> + init_srcu_struct(&mpam_srcu);
> +
> + if (!acpi_disabled)
> + fw_num_msc = acpi_mpam_count_msc();
> + else
> + fw_num_msc = mpam_dt_count_msc();
> +
> + if (fw_num_msc <= 0) {
> + pr_err("No MSC devices found in firmware\n");
> + return -EINVAL;
> + }
> +
> + if (acpi_disabled)
> + mpam_dt_create_foundling_msc();
> +
> + return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> + /* member of mpam_all_msc */
> + struct list_head glbl_list;
> +
> + int id;
> + struct platform_device *pdev;
> +
> + /* Not modified after mpam_is_enabled() becomes true */
> + enum mpam_msc_iface iface;
> + u32 pcc_subspace_id;
> + struct mbox_client pcc_cl;
> + struct pcc_mbox_chan *pcc_chan;
> + u32 nrdy_usec;
> + cpumask_t accessibility;
> +
> + /*
> + * probe_lock is only take during discovery. After discovery these
nit: s/take/taken/
> + * properties become read-only and the lists are protected by SRCU.
> + */
> + struct mutex probe_lock;
> + unsigned long ris_idxs[128 / BITS_PER_LONG];
> + u32 ris_max;
> +
> + /* mpam_msc_ris of this component */
> + struct list_head ris;
> +
> + /*
> + * part_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> + * by RIS).
> + * If needed, take msc->lock first.
> + */
> + struct mutex part_sel_lock;
> +
> + /*
> + * mon_sel_lock protects access to the MSC hardware registers that are
> + * affeted by MPAMCFG_MON_SEL.
nit: s/affeted/affected/
> + * If needed, take msc->lock first.
> + */
> + struct mutex outer_mon_sel_lock;
> + raw_spinlock_t inner_mon_sel_lock;
> + unsigned long inner_mon_sel_flags;
> +
> + void __iomem *mapped_hwpage;
> + size_t mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
` (2 preceding siblings ...)
2025-08-27 13:03 ` Ben Horgan
@ 2025-08-27 15:39 ` Rob Herring
2025-08-27 16:16 ` Rob Herring
2025-09-01 9:11 ` Ben Horgan
2025-09-01 11:21 ` Dave Martin
5 siblings, 1 reply; 130+ messages in thread
From: Rob Herring @ 2025-08-27 15:39 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
>
> Start with driver probe/remove and mapping the MSC.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Check for status=broken DT devices.
No such status... 'disabled' can be for a variety of reasons.
> * Moved all the files around.
> * Made Kconfig symbols depend on EXPERT
> ---
> arch/arm64/Kconfig | 1 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/resctrl/Kconfig | 11 ++
> drivers/resctrl/Makefile | 4 +
> drivers/resctrl/mpam_devices.c | 336 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 62 ++++++
> 7 files changed, 417 insertions(+)
> create mode 100644 drivers/resctrl/Kconfig
> create mode 100644 drivers/resctrl/Makefile
> create mode 100644 drivers/resctrl/mpam_devices.c
> create mode 100644 drivers/resctrl/mpam_internal.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>
> config ARM64_MPAM
> bool "Enable support for MPAM"
> + select ARM64_MPAM_DRIVER
> select ACPI_MPAM if ACPI
> help
> Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>
> source "drivers/cdx/Kconfig"
>
> +source "drivers/resctrl/Kconfig"
> +
> endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE) += hte/
> obj-$(CONFIG_DRM_ACCEL) += accel/
> obj-$(CONFIG_CDX_BUS) += cdx/
> obj-$(CONFIG_DPLL) += dpll/
> +obj-y += resctrl/
>
> obj-$(CONFIG_S390) += s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> + bool "MPAM driver for System IP, e,g. caches and memory controllers"
> + depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> + bool "Enable debug messages from the MPAM driver."
> + depends on ARM64_MPAM_DRIVER
> + help
> + Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
> +mpam-y += mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
Given the 2024 below, should this be 2024-2025?
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> + pr_err("Discovered all MSC\n");
Perhaps print out how many MSCs.
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> + int count = 0;
> + struct device_node *np;
> +
> + for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> + if (of_device_is_available(np))
> + count++;
> + }
> +
> + return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> + u32 ris_idx)
> +{
> + int err = 0;
> + u32 level = 0;
> + unsigned long cache_id;
> + struct device_node *cache;
> +
> + do {
> + if (of_device_is_compatible(np, "arm,mpam-cache")) {
> + cache = of_parse_phandle(np, "arm,mpam-device", 0);
> + if (!cache) {
> + pr_err("Failed to read phandle\n");
> + break;
> + }
> + } else if (of_device_is_compatible(np->parent, "cache")) {
Don't access device_node members. I'm trying to make it opaque. And
technically it can be racy to access parent ptr when/if nodes are
dynamic. I think this should suffice:
	} else {
		cache = of_get_parent(np);
		if (!of_device_is_compatible(cache, "cache")) {
			cache = NULL;
			break;
		}
	}
> + cache = of_node_get(np->parent);
> + } else {
> + /* For now, only caches are supported */
> + cache = NULL;
> + break;
> + }
> +
> + err = of_property_read_u32(cache, "cache-level", &level);
> + if (err) {
> + pr_err("Failed to read cache-level\n");
> + break;
> + }
> +
> + cache_id = cache_of_calculate_id(cache);
> + if (cache_id == ~0UL) {
> + err = -ENOENT;
> + break;
> + }
> +
> + err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> + cache_id);
> + } while (0);
> + of_node_put(cache);
> +
> + return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> + int err, num_ris = 0;
> + const u32 *ris_idx_p;
> + struct device_node *iter, *np;
> +
> + np = msc->pdev->dev.of_node;
> + for_each_child_of_node(np, iter) {
Use for_each_available_child_of_node_scoped()
> + ris_idx_p = of_get_property(iter, "reg", NULL);
This is broken on big endian and new users of of_get_property() are
discouraged. Use of_property_read_reg().
> + if (ris_idx_p) {
> + num_ris++;
> + err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> + if (err) {
> + of_node_put(iter);
And then drop the put.
> + return err;
> + }
> + }
> + }
> +
> + if (!num_ris)
> + mpam_dt_parse_resource(msc, np, 0);
> +
> + return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> + struct device_node *parent;
> + u32 affinity_id;
> + int err;
> +
> + if (!acpi_disabled) {
> + err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> + &affinity_id);
> + if (err)
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + else
> + acpi_pptt_get_cpus_from_container(affinity_id,
> + &msc->accessibility);
> +
> + return 0;
> + }
> +
> + /* This depends on the path to of_node */
I'm failing to understand why this node has to sit directly under the
root node?
> + parent = of_get_parent(msc->pdev->dev.of_node);
> + if (parent == of_root) {
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + err = 0;
> + } else {
> + err = -EINVAL;
> + pr_err("Cannot determine accessibility of MSC: %s\n",
> + dev_name(&msc->pdev->dev));
> + }
> + of_node_put(parent);
> +
> + return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> + /* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> + struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> + if (!msc)
> + return;
> +
> + mutex_lock(&mpam_list_lock);
> + mpam_num_msc--;
> + platform_set_drvdata(pdev, NULL);
> + list_del_rcu(&msc->glbl_list);
> + synchronize_srcu(&mpam_srcu);
> + devm_kfree(&pdev->dev, msc);
This should happen automagically.
> + mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> + int err;
> + struct mpam_msc *msc;
> + struct resource *msc_res;
> + void *plat_data = pdev->dev.platform_data;
> +
> + mutex_lock(&mpam_list_lock);
> + do {
> + msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> + if (!msc) {
> + err = -ENOMEM;
> + break;
> + }
> +
> + mutex_init(&msc->probe_lock);
> + mutex_init(&msc->part_sel_lock);
> + mutex_init(&msc->outer_mon_sel_lock);
> + raw_spin_lock_init(&msc->inner_mon_sel_lock);
> + msc->id = mpam_num_msc++;
Multiple probe functions can run in parallel, so this needs to be
atomic. Maybe it is with mpam_list_lock, but then the name of the
mutex is misleading given this is not the list. It's not really clear
to me what all needs the mutex here. Certainly a lot of it doesn't.
Like everything else above here except the increment.
> + msc->pdev = pdev;
> + INIT_LIST_HEAD_RCU(&msc->glbl_list);
> + INIT_LIST_HEAD_RCU(&msc->ris);
> +
> + err = update_msc_accessibility(msc);
> + if (err)
> + break;
> + if (cpumask_empty(&msc->accessibility)) {
> + pr_err_once("msc:%u is not accessible from any CPU!",
> + msc->id);
> + err = -EINVAL;
> + break;
> + }
> +
> + if (device_property_read_u32(&pdev->dev, "pcc-channel",
Does this property apply to DT? It would as the code is written. It is
not documented though.
> + &msc->pcc_subspace_id))
> + msc->iface = MPAM_IFACE_MMIO;
> + else
> + msc->iface = MPAM_IFACE_PCC;
> +
> + if (msc->iface == MPAM_IFACE_MMIO) {
> + void __iomem *io;
> +
> + io = devm_platform_get_and_ioremap_resource(pdev, 0,
> + &msc_res);
> + if (IS_ERR(io)) {
> + pr_err("Failed to map MSC base address\n");
> + err = PTR_ERR(io);
> + break;
> + }
> + msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> + msc->mapped_hwpage = io;
> + } else if (msc->iface == MPAM_IFACE_PCC) {
> + msc->pcc_cl.dev = &pdev->dev;
> + msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> + msc->pcc_cl.tx_block = false;
> + msc->pcc_cl.tx_tout = 1000; /* 1s */
> + msc->pcc_cl.knows_txdone = false;
> +
> + msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> + msc->pcc_subspace_id);
> + if (IS_ERR(msc->pcc_chan)) {
> + pr_err("Failed to request MSC PCC channel\n");
> + err = PTR_ERR(msc->pcc_chan);
> + break;
> + }
> + }
> +
> + list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> + platform_set_drvdata(pdev, msc);
> + } while (0);
> + mutex_unlock(&mpam_list_lock);
> +
> + if (!err) {
> + /* Create RIS entries described by firmware */
> + if (!acpi_disabled)
> + err = acpi_mpam_parse_resources(msc, plat_data);
> + else
> + err = mpam_dt_parse_resources(msc, plat_data);
Isn't there a race here if an error occurs since you already added the
MSC to the list? Something like this sequence with 2 MSCs:
device 1 probe
device 1 added
device 2 probe
device 2 added
device 1 calls mpam_discovery_complete()
device 2 error on parse_resources
device 2 removed
> + }
> +
> + if (!err && fw_num_msc == mpam_num_msc)
> + mpam_discovery_complete();
> +
> + if (err && msc)
> + mpam_msc_drv_remove(pdev);
> +
> + return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> + { .compatible = "arm,mpam-msc", },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> + .driver = {
> + .name = "mpam_msc",
> + .of_match_table = of_match_ptr(mpam_of_match),
> + },
> + .probe = mpam_msc_drv_probe,
> + .remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> + int err;
> + struct device_node *cache;
> +
> + for_each_compatible_node(cache, NULL, "cache") {
> + err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
This is going to create platform devices for all caches (except L1)
regardless of whether they support MPAM or not. Isn't it likely or
possible that only L3 or SLC caches support MPAM?
> + if (err)
> + pr_err("Failed to create MSC devices under caches\n");
> + }
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> + if (!system_supports_mpam())
> + return -EOPNOTSUPP;
> +
> + init_srcu_struct(&mpam_srcu);
> +
> + if (!acpi_disabled)
> + fw_num_msc = acpi_mpam_count_msc();
> + else
> + fw_num_msc = mpam_dt_count_msc();
> +
> + if (fw_num_msc <= 0) {
> + pr_err("No MSC devices found in firmware\n");
> + return -EINVAL;
> + }
> +
> + if (acpi_disabled)
> + mpam_dt_create_foundling_msc();
> +
> + return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
It's 2025.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> + /* member of mpam_all_msc */
> + struct list_head glbl_list;
> +
> + int id;
> + struct platform_device *pdev;
> +
> + /* Not modified after mpam_is_enabled() becomes true */
> + enum mpam_msc_iface iface;
> + u32 pcc_subspace_id;
> + struct mbox_client pcc_cl;
> + struct pcc_mbox_chan *pcc_chan;
> + u32 nrdy_usec;
> + cpumask_t accessibility;
> +
> + /*
> + * probe_lock is only take during discovery. After discovery these
s/take/taken/
> + * properties become read-only and the lists are protected by SRCU.
> + */
> + struct mutex probe_lock;
> + unsigned long ris_idxs[128 / BITS_PER_LONG];
> + u32 ris_max;
> +
> + /* mpam_msc_ris of this component */
> + struct list_head ris;
> +
> + /*
> + * part_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> + * by RIS).
> + * If needed, take msc->lock first.
Stale comment? I don't see any 'lock' member.
> + */
> + struct mutex part_sel_lock;
> +
> + /*
> + * mon_sel_lock protects access to the MSC hardware registers that are
> + * affeted by MPAMCFG_MON_SEL.
> + * If needed, take msc->lock first.
> + */
> + struct mutex outer_mon_sel_lock;
> + raw_spinlock_t inner_mon_sel_lock;
> + unsigned long inner_mon_sel_flags;
> +
> + void __iomem *mapped_hwpage;
> + size_t mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
2025-08-23 10:55 ` Markus Elfring
@ 2025-08-27 16:05 ` Dave Martin
1 sibling, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-08-27 16:05 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi,
[Note, looks like I crossed over with Rob here -- apologies for any
duplicate or conflicting comments.]
On Fri, Aug 22, 2025 at 03:29:49PM +0000, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
Might be worth mentioning that the hook for feeding the parsed factoids
into the driver (mpam_ris_create()) is not implemented for now.
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
>
> ---
> Changes since RFC:
> * Used DEFINE_RES_IRQ_NAMED() and friends macros.
> * Additional error handling.
> * Check for zero sized MSC.
> * Allow table revisions greater than 1. (no spec for revision 0!)
> * Use cleanup helpers to retrive ACPI tables, which allows some functions
> to be folded together.
> ---
> arch/arm64/Kconfig | 1 +
> drivers/acpi/arm64/Kconfig | 3 +
> drivers/acpi/arm64/Makefile | 1 +
> drivers/acpi/arm64/mpam.c | 331 ++++++++++++++++++++++++++++++++++++
> drivers/acpi/tables.c | 2 +-
> include/linux/arm_mpam.h | 46 +++++
> 6 files changed, 383 insertions(+), 1 deletion(-)
> create mode 100644 drivers/acpi/arm64/mpam.c
> create mode 100644 include/linux/arm_mpam.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 658e47fc0c5a..e51ccf1da102 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>
> config ARM64_MPAM
> bool "Enable support for MPAM"
> + select ACPI_MPAM if ACPI
> help
> Memory Partitioning and Monitoring is an optional extension
> that allows the CPUs to mark load and store transactions with
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>
> config ACPI_APMT
> bool
> +
> +config ACPI_MPAM
> + bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) += apmt.o
> obj-$(CONFIG_ACPI_FFH) += ffh.o
> obj-$(CONFIG_ACPI_GTDT) += gtdt.o
> obj-$(CONFIG_ACPI_IORT) += iort.o
> +obj-$(CONFIG_ACPI_MPAM) += mpam.o
> obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
> obj-$(CONFIG_ARM_AMBA) += amba.o
> obj-y += dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..e55fc2729ac5
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,331 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED 0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID BIT(4)
> +
> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
> + int *irq, u32 processor_container_uid)
Can this have a name, please?
> +{
> + int sense;
> +
> + if (!intid)
> + return false;
> +
> + if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> + ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> + return false;
> +
> + sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
Do we handle cross-endian ACPI tables?
ACPI defers to the relevant specification regarding the endianness of
externally defined tables, but as of v3.0 (beta) of for MPAM ACPI
spec [1], no statement is made about this.
Following the spirit of the ACPI core specs, I suspect that the
"correct" answer is that MPAM tables are always little-endian, even if
it not written down anywhere.
If the kernel is big-endian, we lose.
Maybe it is sufficient to make CONFIG_ACPI_MPAM depend on
!CONFIG_CPU_BIG_ENDIAN for now.
I haven't tried to understand how this is handled for other tables.
> +
> + /*
> + * If the GSI is in the GIC's PPI range, try and create a partitioned
> + * percpu interrupt.
But actually we don't even try? Or did I miss something?
> + */
> + if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
checkpatch.pl says:
| WARNING: Comparisons should place the constant on the right side of the test
| #108: FILE: drivers/acpi/arm64/mpam.c:45:
| + if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
(Dubious whether this is "wrong" IMHO, but still probably best avoided
since it is not what people are used to seeing.)
> + pr_err_once("Partitioned interrupts not supported\n");
> + return false;
> + }
> +
> + *irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> + if (*irq <= 0) {
> + pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> + intid);
Are we going to get a lot of duplicate error messages with the same
interrupt? If not, perhaps make this a pr_err() so that all the
affected interrupts are notified?
(Either way, hopefully the user will take the hint that they messed
something up though.)
> + return false;
> + }
> +
> + return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> + struct acpi_mpam_msc_node *tbl_msc,
> + struct resource *res, int *res_idx)
> +{
> + u32 flags, aff;
> + int irq;
We may still get in here if MPAMF_IDR.HAS_ERR_MSI and/or
MPAMF_MSMON_IDR.HAS_OFLW_MSI is set. If so, there is no wired
interrupt. Does it matter if we still parse and allocate the wired
interrupts here?
> +
> + flags = tbl_msc->overflow_interrupt_flags;
> + if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> + flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> + aff = tbl_msc->overflow_interrupt_affinity;
> + else
> + aff = ~0;
(u32)~0 is used as an exceptional UID all over the place. If this is
not a pre-existing convention, it could be worth having a #define for
this. (grep '~0' drivers/acpi/ suggests that this is new.)
> + if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> + res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
I couldn't find a statement in the spec of how the table can specify
that there is no interrupt.
Are the interrupts always required for ACPI-based MPAM systems?
overflow_interrupt and error_interrupt are GSIVs, which seems to be an
ACPI thing. The examples in the ACPI spec suggest that 0 can be a
valid value. No exceptional value seems to be defined.
The flags fields have some invalid encodings, but no explicit "no
interrupt" encoding that I can see.
> +
> + flags = tbl_msc->error_interrupt_flags;
> + if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> + flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> + aff = tbl_msc->error_interrupt_affinity;
> + else
> + aff = ~0;
> + if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> + res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> + struct acpi_mpam_resource_node *res)
> +{
> + int level, nid;
> + u32 cache_id;
> +
> + switch (res->locator_type) {
> + case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> + cache_id = res->locator.cache_locator.cache_reference;
> + level = find_acpi_cache_level_from_id(cache_id);
> + if (level <= 0) {
> + pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> + return -EINVAL;
> + }
> + return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> + level, cache_id);
> + case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> + nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> + if (nid == NUMA_NO_NODE)
> + nid = 0;
> + return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> + 255, nid);
> + default:
> + /* These get discovered later and treated as unknown */
> + return 0;
> + }
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> + struct acpi_mpam_msc_node *tbl_msc)
> +{
> + int i, err;
> + struct acpi_mpam_resource_node *resources;
> +
> + resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
Should we check that we don't go out of the bounds of the MSC node
(or, at the very least, of the MPAM table)?
If tbl_msc->length was already validated, that can be used for the
bounds check.
> + for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> + err = acpi_mpam_parse_resource(msc, &resources[i]);
Isn't the length of each resource node variable? According to [2],
the length depends on the num_functional_deps field. It looks like the
functional dependency descriptors (if any) are appended contiguously to
the resource node, unless I've misunderstood something.
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> + struct platform_device *pdev,
> + u32 *acpi_id)
> +{
> + bool acpi_id_valid = false;
> + struct acpi_device *buddy;
> + char hid[16], uid[16];
> + int err;
> +
> + memset(&hid, 0, sizeof(hid));
> + memcpy(hid, &tbl_msc->hardware_id_linked_device,
> + sizeof(tbl_msc->hardware_id_linked_device));
This is safe by semi-accident, since 16 > 8.
It might be cleaner to declare

	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];

(the +1 leaving room for the NUL terminator), which can never be wrong.
The memset()+memcpy() pair might be better replaced with strscpy(), or
just use snprintf() again, since that avoids having to think about
several different ways of avoiding buffer overflows at the same time.
This is not a fast path.
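Roughly what the sized-buffer version looks like (stand-alone sketch;
strscpy() is kernel-only, so snprintf() with a precision stands in for
it here, and the struct/field names are invented stand-ins for the ACPI
layout):

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the 8-byte, not-necessarily-NUL-terminated ACPI field. */
struct fake_msc {
	char hardware_id_linked_device[8];
};

/*
 * Copy the fixed-size field into a NUL-terminated buffer. The precision
 * is derived from the field itself, so the bound can never drift out of
 * sync with the struct definition.
 */
static void copy_hid(const struct fake_msc *msc, char *out, size_t out_sz)
{
	snprintf(out, out_sz, "%.*s",
		 (int)sizeof(msc->hardware_id_linked_device),
		 msc->hardware_id_linked_device);
}
```

The caller then sizes the destination as
`char hid[sizeof(msc->hardware_id_linked_device) + 1];`.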
> +
> + if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> + *acpi_id = tbl_msc->instance_id_linked_device;
> + acpi_id_valid = true;
> + }
> +
> + err = snprintf(uid, sizeof(uid), "%u",
char uid[11]; would be sufficient here. The instance ID is strictly
32-bit. Adding a safety margin is worthless here, since snprintf()
checks the bounds -- either the size is sufficient for all possible u32
values, or it isn't.
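That is, 10 digits for UINT32_MAX plus the NUL. A two-line check of the
arithmetic (pure C, u32_to_str() is just a wrapper for illustration):

```c
#include <stdint.h>
#include <stdio.h>

/* "%u" of the largest 32-bit value is 10 digits plus the NUL: 11 bytes. */
static int u32_to_str(uint32_t v, char *buf, size_t sz)
{
	return snprintf(buf, sz, "%u", v);
}
```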
> + tbl_msc->instance_id_linked_device);
Can snprintf() return < 0 on error?
I don't know, but elsewhere you do check for this. I tend to the view
that it is cleaner to assume that the kernel's snprintf() is just as
hostile as C's version (if not more so).
(-1 >= sizeof(foo) is always true thanks to the C arithmetic conversion
rules, but it's probably best not to rely on it.)
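The conversion-rule point is easy to demonstrate in isolation (pure C
illustration; looks_truncated() is an invented name):

```c
#include <stddef.h>

/*
 * In 'err >= sizeof(buf)', 'err' is converted to the unsigned type of
 * sizeof, so a negative error value compares as a huge positive number.
 * The comparison therefore "catches" the error, but only by accident.
 */
static int looks_truncated(int err, size_t bufsz)
{
	return (size_t)err >= bufsz;	/* what err >= sizeof(buf) really does */
}
```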
> + if (err >= sizeof(uid))
> + return acpi_id_valid;
Why return acpi_id_valid (which may be true) on this error path?
> +
> + buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> + if (buddy)
> + device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> + return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> + enum mpam_msc_iface *iface)
> +{
> + switch (tbl_msc->interface_type) {
> + case 0:
> + *iface = MPAM_IFACE_MMIO;
> + return 0;
> + case 0xa:
> + *iface = MPAM_IFACE_PCC;
> + return 0;
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int __init acpi_mpam_parse(void)
> +{
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
checkpatch.pl says:
| ERROR: code indent should use tabs where possible
| #240: FILE: drivers/acpi/arm64/mpam.c:177:
| + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
|
| WARNING: please, no spaces at the start of a line
| #240: FILE: drivers/acpi/arm64/mpam.c:177:
| + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> + char *table_end, *table_offset = (char *)(table + 1);
> + struct property_entry props[4]; /* needs a sentinel */
> + struct acpi_mpam_msc_node *tbl_msc;
> + int next_res, next_prop, err = 0;
> + struct acpi_device *companion;
> + struct platform_device *pdev;
> + enum mpam_msc_iface iface;
> + struct resource res[3];
> + char uid[16];
> + u32 acpi_id;
> +
> + if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> + return 0;
> +
> + if (IS_ERR(table))
> + return 0;
> +
> + if (table->revision < 1)
> + return 0;
> +
> + table_end = (char *)table + table->length;
> +
> + while (table_offset < table_end) {
> + tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> + table_offset += tbl_msc->length;
> +
> + /*
> + * If any of the reserved fields are set, make no attempt to
> + * parse the msc structure. This will prevent the driver from
> + * probing all the MSC, meaning it can't discover the system
> + * wide supported partid and pmg ranges. This avoids whatever
> + * this MSC is truncating the partids and creating a screaming
Mangled sentence?
I have not so far found any reference in [2] to the reset value of the
MPAMF_ECR.INTEN bit. Do we rely on the error interrupt(s) for all MSCs
to be disabled at the interrupt controller? If the same interrupt may
be shared by multiple MSCs, that's bad.
> + * error interrupt.
> + */
> + if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
> + continue;
The specs are not clear about how backwards compatibility is supposed
to work.
I would feel a bit uneasy about silently throwing away MSCs based on
criteria that may not indicate incompatibility, and without even a
diagnostic.
> +
> + if (!tbl_msc->mmio_size)
> + continue;
> +
> + if (decode_interface_type(tbl_msc, &iface))
> + continue;
Ditto regarding diagnostics.
> +
> + next_res = 0;
> + next_prop = 0;
> + memset(res, 0, sizeof(res));
> + memset(props, 0, sizeof(props));
> +
> + pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
If the tbl_msc->identifier values contain duplicates, we will get a
platform device with a duplicate name here. I don't know whether it
matters.
> + if (!pdev) {
> + err = -ENOMEM;
> + break;
> + }
> +
> + if (tbl_msc->length < sizeof(*tbl_msc)) {
> + err = -EINVAL;
> + break;
> + }
No check for oversized tbl_msc->length? (See also
acpi_mpam_count_msc().)
> +
> + /* Some power management is described in the namespace: */
> + err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> + if (err > 0 && err < sizeof(uid)) {
> + companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
Diagnostic?
> + if (companion)
> + ACPI_COMPANION_SET(&pdev->dev, companion);
> + }
> +
> + if (iface == MPAM_IFACE_MMIO) {
> + res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> + tbl_msc->mmio_size,
> + "MPAM:MSC");
> + } else if (iface == MPAM_IFACE_PCC) {
> + props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> + tbl_msc->base_address);
> + next_prop++;
> + }
> +
> + acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> + err = platform_device_add_resources(pdev, res, next_res);
> + if (err)
> + break;
> +
> + props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> + tbl_msc->max_nrdy_usec);
> +
> + /*
> + * The MSC's CPU affinity is described via its linked power
> + * management device, but only if it points at a Processor or
> + * Processor Container.
> + */
> + if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
> + props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
> + acpi_id);
> + }
> +
> + err = device_create_managed_software_node(&pdev->dev, props,
> + NULL);
> + if (err)
> + break;
> +
> + /* Come back later if you want the RIS too */
> + err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> + if (err)
> + break;
> +
> + err = platform_device_add(pdev);
> + if (err)
> + break;
> + }
> +
> + if (err)
> + platform_device_put(pdev);
> +
> + return err;
> +}
> +
> +int acpi_mpam_count_msc(void)
> +{
> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
checkpatch.pl says:
| ERROR: code indent should use tabs where possible
| #359: FILE: drivers/acpi/arm64/mpam.c:296:
| + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
|
| WARNING: please, no spaces at the start of a line
| #359: FILE: drivers/acpi/arm64/mpam.c:296:
| + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> + char *table_end, *table_offset = (char *)(table + 1);
> + struct acpi_mpam_msc_node *tbl_msc;
> + int count = 0;
> +
> + if (IS_ERR(table))
> + return 0;
> +
> + if (table->revision < 1)
> + return 0;
> +
> + tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
Can this be moved into the loop? It looks like it just duplicates the
update to tbl_msc at the end of the loop; the loop termination
condition does not depend on this variable.
> + table_end = (char *)table + table->length;
> +
> + while (table_offset < table_end) {
> + if (!tbl_msc->mmio_size)
> + continue;
This is 0 for non-usable PCC-based MSCs, right?
(Why explicitly unusable MSCs are listed in the table at all is a
mystery to me, but that's what the spec says. I guess there must be a
reason.)
> +
> + if (tbl_msc->length < sizeof(*tbl_msc))
> + return -EINVAL;
Should we also have something like (not tested):
if (tbl_msc->length > table_end - table_offset)
return -EINVAL;
Also, is it an error if a length is not a multiple of four bytes?
(I'm guessing that the core ACPI code doesn't try to understand the
contents of the MPAM table and so doesn't check this.)
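Pulling the suggestions together (the oversize check, the earlier point
about deriving tbl_msc at the top of the loop, and advancing the offset
before any skip, so the `continue` above can't leave the loop stuck),
something like this shape (stand-alone sketch, invented stand-in struct
layout):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative MSC node header; the real ACPI layout differs. */
struct fake_msc_node {
	uint16_t length;	/* must cover at least this header */
	uint32_t mmio_size;
};

/*
 * Count usable MSC nodes in a table fragment, validating each length
 * against both the header size and the bytes remaining in the table.
 * The node is (re)read at the top of the loop and the offset advanced
 * before any skip, so a node with mmio_size == 0 can't wedge the loop.
 */
static int count_msc_nodes(const uint8_t *tbl, size_t tbl_len)
{
	size_t off = 0;
	int count = 0;

	while (off < tbl_len) {
		struct fake_msc_node node;

		if (tbl_len - off < sizeof(node))
			return -1;
		memcpy(&node, tbl + off, sizeof(node));
		if (node.length < sizeof(node) ||
		    node.length > tbl_len - off)
			return -1;
		off += node.length;	/* advance first ... */
		if (!node.mmio_size)	/* ... then it is safe to skip */
			continue;
		count++;
	}
	return count;
}
```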
> +
> + count++;
> +
> + table_offset += tbl_msc->length;
> + tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> + }
> +
> + return count;
> +}
> +
> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index fa9bb8c8ce95..835e3795ede3 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
> ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
> ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
> ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> - ACPI_SIG_NBFT };
> + ACPI_SIG_NBFT, ACPI_SIG_MPAM };
>
> #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..0edefa6ba019
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
checkpatch.pl says:
| WARNING: Improper SPDX comment style for 'include/linux/arm_mpam.h', please use '/*' instead
| #414: FILE: include/linux/arm_mpam.h:1:
| +// SPDX-License-Identifier: GPL-2.0
|
| WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
| #414: FILE: include/linux/arm_mpam.h:1:
| +// SPDX-License-Identifier: GPL-2.0
(That's probably the same error twice.)
(I never understood why the SPDX folks couldn't have allowed either
type of comment -- or at least, the same style in .c and .h files.
But I'm sure they had a reason that they believed was good.)
[...]
Cheers
---Dave
[1] ACPI for Memory System Resource Partitioning and Monitoring, 3.0 beta
https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
[2] Arm Memory System Resource Partitioning and Monitoring (MPAM)
System Component Specification, ARM IHI 0099A.a
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-27 16:08 ` Rob Herring
0 siblings, 0 replies; 130+ messages in thread
From: Rob Herring @ 2025-08-27 16:08 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich, Lecopzer Chen
On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
>
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed.
>
> Later once MPAM is enabled, this cpuhp callback will be replaced by
> one that avoids the global list.
>
> Enabling a static key will also take the cpuhp lock, so can't be done
> from the cpuhp callback. Whenever a new MSC has been probed schedule
> work to test if all the MSCs have now been probed.
>
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 144 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 8 +-
> 2 files changed, 147 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 5baf2a8786fb..9d6516f98acf 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -4,6 +4,7 @@
> #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>
> #include <linux/acpi.h>
> +#include <linux/atomic.h>
> #include <linux/arm_mpam.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpu.h>
> @@ -21,6 +22,7 @@
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> #include <linux/types.h>
> +#include <linux/workqueue.h>
>
> #include <acpi/pcc.h>
>
> @@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
> /* MPAM isn't available until all the MSC have been probed. */
> static u32 mpam_num_msc;
>
> +static int mpam_cpuhp_state;
> +static DEFINE_MUTEX(mpam_cpuhp_state_lock);
> +
> +/*
> + * mpam is enabled once all devices have been probed from CPU online callbacks,
> + * scheduled via this work_struct. If access to an MSC depends on a CPU that
> + * was not brought online at boot, this can happen surprisingly late.
> + */
> +static DECLARE_WORK(mpam_enable_work, &mpam_enable);
> +
> /*
> * An MSC is a physical container for controls and monitors, each identified by
> * their RIS index. These share a base-address, interrupts and some MMIO
> @@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
> /* List of all objects that can be free()d after synchronise_srcu() */
> static LLIST_HEAD(mpam_garbage);
>
> +static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
> +{
> + WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
> + WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
These either make __mpam_read_reg uninlined or add 2 checks to every
register read. Neither seems very good.
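One pattern that keeps the checks in debug builds only (user-space
model; the MPAM_DEBUG switch, mpam_debug_check() macro and counters are
all invented here -- in the kernel this would presumably be a
CONFIG-gated helper in the style of VM_WARN_ON_ONCE()):

```c
#include <stddef.h>
#include <stdint.h>

#ifndef MPAM_DEBUG
#define MPAM_DEBUG 1		/* 0 compiles the checks out entirely */
#endif

static int checks_fired;	/* stands in for WARN_ON_ONCE() firing */

#define mpam_debug_check(cond)			\
	do {					\
		if (MPAM_DEBUG && (cond))	\
			checks_fired++;		\
	} while (0)

/* Model of a register read with a debug-only bounds sanity check. */
static uint32_t fake_read_reg(const uint32_t *regs, size_t nr_regs,
			      size_t reg)
{
	mpam_debug_check(reg >= nr_regs);
	if (reg >= nr_regs)
		return 0;
	return regs[reg];
}
```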
> +
> + return readl_relaxed(msc->mapped_hwpage + reg);
> +}
> +
> +static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
> +{
> + lockdep_assert_held_once(&msc->part_sel_lock);
Similar thing here.
> + return __mpam_read_reg(msc, reg);
> +}
> +
> +#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
> +
> #define init_garbage(x) init_llist_node(&(x)->garbage.llist)
>
> static struct mpam_vmsc *
> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> return err;
> }
>
> -static void mpam_discovery_complete(void)
It is annoying to review things which disappear in later patches...
> +static int mpam_msc_hw_probe(struct mpam_msc *msc)
> +{
> + u64 idr;
> + int err;
> +
> + lockdep_assert_held(&msc->probe_lock);
> +
> + mutex_lock(&msc->part_sel_lock);
> + idr = mpam_read_partsel_reg(msc, AIDR);
I don't think AIDR access depends on PART_SEL.
> + if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
> + pr_err_once("%s does not match MPAM architecture v1.x\n",
> + dev_name(&msc->pdev->dev));
> + err = -EIO;
> + } else {
> + msc->probed = true;
> + err = 0;
> + }
> + mutex_unlock(&msc->part_sel_lock);
> +
> + return err;
> +}
> +
> +static int mpam_cpu_online(unsigned int cpu)
> {
> - pr_err("Discovered all MSC\n");
> + return 0;
> +}
> +
> +/* Before mpam is enabled, try to probe new MSC */
> +static int mpam_discovery_cpu_online(unsigned int cpu)
> +{
> + int err = 0;
> + struct mpam_msc *msc;
> + bool new_device_probed = false;
> +
> + mutex_lock(&mpam_list_lock);
> + list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> + if (!cpumask_test_cpu(cpu, &msc->accessibility))
> + continue;
> +
> + mutex_lock(&msc->probe_lock);
> + if (!msc->probed)
> + err = mpam_msc_hw_probe(msc);
> + mutex_unlock(&msc->probe_lock);
> +
> + if (!err)
> + new_device_probed = true;
> + else
> + break; // mpam_broken
> + }
> + mutex_unlock(&mpam_list_lock);
> +
> + if (new_device_probed && !err)
> + schedule_work(&mpam_enable_work);
> +
> + return err;
> +}
> +
> +static int mpam_cpu_offline(unsigned int cpu)
> +{
> + return 0;
> +}
> +
> +static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
> + int (*offline)(unsigned int offline))
> +{
> + mutex_lock(&mpam_cpuhp_state_lock);
> + if (mpam_cpuhp_state) {
> + cpuhp_remove_state(mpam_cpuhp_state);
> + mpam_cpuhp_state = 0;
> + }
> +
> + mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
> + online, offline);
> + if (mpam_cpuhp_state <= 0) {
> + pr_err("Failed to register cpuhp callbacks");
> + mpam_cpuhp_state = 0;
> + }
> + mutex_unlock(&mpam_cpuhp_state_lock);
> }
>
> static int mpam_dt_count_msc(void)
> @@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
> }
>
> if (!err && fw_num_msc == mpam_num_msc)
> - mpam_discovery_complete();
> + mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
>
> if (err && msc)
> mpam_msc_drv_remove(pdev);
> @@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
> .remove = mpam_msc_drv_remove,
> };
>
> +static void mpam_enable_once(void)
> +{
> + mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
> +
> + pr_info("MPAM enabled\n");
> +}
> +
> +/*
> + * Enable mpam once all devices have been probed.
> + * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> + * Also scheduled when new devices are probed when new CPUs come online.
> + */
> +void mpam_enable(struct work_struct *work)
> +{
> + static atomic_t once;
> + struct mpam_msc *msc;
> + bool all_devices_probed = true;
> +
> + /* Have we probed all the hw devices? */
> + mutex_lock(&mpam_list_lock);
> + list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> + mutex_lock(&msc->probe_lock);
> + if (!msc->probed)
> + all_devices_probed = false;
> + mutex_unlock(&msc->probe_lock);
> +
> + if (!all_devices_probed)
> + break;
> + }
> + mutex_unlock(&mpam_list_lock);
> +
> + if (all_devices_probed && !atomic_fetch_inc(&once))
> + mpam_enable_once();
> +}
> +
> /*
> * MSC that are hidden under caches are not created as platform devices
> * as there is no cache driver. Caches are also special-cased in
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 6e0982a1a9ac..a98cca08a2ef 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -49,6 +49,7 @@ struct mpam_msc {
> * properties become read-only and the lists are protected by SRCU.
> */
> struct mutex probe_lock;
> + bool probed;
> unsigned long ris_idxs[128 / BITS_PER_LONG];
> u32 ris_max;
>
> @@ -59,14 +60,14 @@ struct mpam_msc {
> * part_sel_lock protects access to the MSC hardware registers that are
> * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> * by RIS).
> - * If needed, take msc->lock first.
> + * If needed, take msc->probe_lock first.
Humm. I think this belongs in patch 10.
> */
> struct mutex part_sel_lock;
>
> /*
> * mon_sel_lock protects access to the MSC hardware registers that are
> * affeted by MPAMCFG_MON_SEL.
> - * If needed, take msc->lock first.
> + * If needed, take msc->probe_lock first.
> */
> struct mutex outer_mon_sel_lock;
> raw_spinlock_t inner_mon_sel_lock;
> @@ -147,6 +148,9 @@ struct mpam_msc_ris {
> extern struct srcu_struct mpam_srcu;
> extern struct list_head mpam_classes;
>
> +/* Scheduled work callback to enable mpam once all MSC have been probed */
> +void mpam_enable(struct work_struct *work);
> +
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
>
> --
> 2.20.1
>
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-27 15:39 ` Rob Herring
@ 2025-08-27 16:16 ` Rob Herring
0 siblings, 0 replies; 130+ messages in thread
From: Rob Herring @ 2025-08-27 16:16 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
On Wed, Aug 27, 2025 at 10:39 AM Rob Herring <robh@kernel.org> wrote:
>
> On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
> >
> > Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> > only be accessible from those CPUs, and they may not be online.
> > Touching the hardware early is pointless as MPAM can't be used until
> > the system-wide common values for num_partid and num_pmg have been
> > discovered.
[...]
> > +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> > +{
> > + int err, num_ris = 0;
> > + const u32 *ris_idx_p;
> > + struct device_node *iter, *np;
> > +
> > + np = msc->pdev->dev.of_node;
> > + for_each_child_of_node(np, iter) {
>
> Use for_each_available_child_of_node_scoped()
>
> > + ris_idx_p = of_get_property(iter, "reg", NULL);
>
> This is broken on big endian and new users of of_get_property() are
> discouraged. Use of_property_read_reg().
Err, this is broken on little endian as the DT is big endian.
So this was obviously not tested as I'm confident you didn't test on BE.
Rob
* Re: [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-27 16:19 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-27 16:19 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtyied. e.g. Kexec.
>
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
>
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
>
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
>
> To reset, write the maximum values for all discovered controls.
>
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Last bitmap write will always be non-zero.
> * Dropped READ_ONCE() - teh value can no longer change.
> ---
> drivers/resctrl/mpam_devices.c | 121 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 8 +++
> 2 files changed, 129 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index bb62de6d3847..c1f01dd748ad 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -7,6 +7,7 @@
> #include <linux/atomic.h>
> #include <linux/arm_mpam.h>
> #include <linux/bitfield.h>
> +#include <linux/bitmap.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpu.h>
> #include <linux/cpumask.h>
> @@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> return 0;
> }
>
> +static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> +{
> + u32 num_words, msb;
> + u32 bm = ~0;
> + int i;
> +
> + lockdep_assert_held(&msc->part_sel_lock);
> +
> + if (wd == 0)
> + return;
> +
> + /*
> + * Write all ~0 to all but the last 32bit-word, which may
> + * have fewer bits...
> + */
> + num_words = DIV_ROUND_UP(wd, 32);
> + for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
> + __mpam_write_reg(msc, reg, bm);
> +
> + /*
> + * ....and then the last (maybe) partial 32bit word. When wd is a
> + * multiple of 32, msb should be 31 to write a full 32bit word.
> + */
> + msb = (wd - 1) % 32;
> + bm = GENMASK(msb, 0);
> + __mpam_write_reg(msc, reg, bm);
> +}
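For what it's worth, the last-word arithmetic checks out; modelled in
isolation (last_word_mask() is an invented name, with GENMASK(msb, 0)
expanded by hand):

```c
#include <stdint.h>

/*
 * For a wd-bit bitmap written 32 bits at a time, the final word keeps
 * only bits [msb:0], where msb = (wd - 1) % 32. When wd is a multiple
 * of 32, msb is 31 and the mask is a full word.
 */
static uint32_t last_word_mask(unsigned int wd)
{
	unsigned int msb = (wd - 1) % 32;

	return (msb == 31) ? 0xffffffffu : ((1u << (msb + 1)) - 1);
}
```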
> +
> +static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +{
> + u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
> + struct mpam_msc *msc = ris->vmsc->msc;
> + struct mpam_props *rprops = &ris->props;
> +
> + mpam_assert_srcu_read_lock_held();
> +
> + mutex_lock(&msc->part_sel_lock);
> + __mpam_part_sel(ris->ris_idx, partid, msc);
> +
> + if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> + mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +
> + if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> + mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +
> + if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> + mpam_write_partsel_reg(msc, MBW_MIN, 0);
> +
> + if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> + mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
MPAMCFG_MBW_MAX_MAX can be used directly instead of bwa_fract.
> +
> + if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> + mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
Shouldn't this reset to 0? STRIDEM1 is a cost.
> + mutex_unlock(&msc->part_sel_lock);
> +}
> +
> +static void mpam_reset_ris(struct mpam_msc_ris *ris)
> +{
> + u16 partid, partid_max;
> +
> + mpam_assert_srcu_read_lock_held();
> +
> + if (ris->in_reset_state)
> + return;
> +
> + spin_lock(&partid_max_lock);
> + partid_max = mpam_partid_max;
> + spin_unlock(&partid_max_lock);
> + for (partid = 0; partid < partid_max; partid++)
> + mpam_reset_ris_partid(ris, partid);
> +}
> +
> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> +{
> + int idx;
> + struct mpam_msc_ris *ris;
> +
> + mpam_assert_srcu_read_lock_held();
> +
> + mpam_mon_sel_outer_lock(msc);
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> + mpam_reset_ris(ris);
> +
> + /*
> + * Set in_reset_state when coming online. The reset state
> + * for non-zero partid may be lost while the CPUs are offline.
> + */
> + ris->in_reset_state = online;
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> + mpam_mon_sel_outer_unlock(msc);
> +}
> +
> static int mpam_cpu_online(unsigned int cpu)
> {
> + int idx;
> + struct mpam_msc *msc;
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> + if (!cpumask_test_cpu(cpu, &msc->accessibility))
> + continue;
> +
> + if (atomic_fetch_inc(&msc->online_refs) == 0)
> + mpam_reset_msc(msc, true);
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> +
> return 0;
> }
>
> @@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
>
> static int mpam_cpu_offline(unsigned int cpu)
> {
> + int idx;
> + struct mpam_msc *msc;
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> + if (!cpumask_test_cpu(cpu, &msc->accessibility))
> + continue;
> +
> + if (atomic_dec_and_test(&msc->online_refs))
> + mpam_reset_msc(msc, false);
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> +
> return 0;
> }
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a2b0ff411138..466d670a01eb 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -5,6 +5,7 @@
> #define MPAM_INTERNAL_H
>
> #include <linux/arm_mpam.h>
> +#include <linux/atomic.h>
> #include <linux/cpumask.h>
> #include <linux/io.h>
> #include <linux/llist.h>
> @@ -43,6 +44,7 @@ struct mpam_msc {
> struct pcc_mbox_chan *pcc_chan;
> u32 nrdy_usec;
> cpumask_t accessibility;
> + atomic_t online_refs;
>
> /*
> * probe_lock is only take during discovery. After discovery these
> @@ -248,6 +250,7 @@ struct mpam_msc_ris {
> u8 ris_idx;
> u64 idr;
> struct mpam_props props;
> + bool in_reset_state;
>
> cpumask_t affinity;
>
> @@ -267,6 +270,11 @@ struct mpam_msc_ris {
> extern struct srcu_struct mpam_srcu;
> extern struct list_head mpam_classes;
>
> +static inline void mpam_assert_srcu_read_lock_held(void)
> +{
> + WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> +}
> +
> /* System wide partid/pmg values */
> extern u16 mpam_partid_max;
> extern u8 mpam_pmg_max;
Thanks,
Ben
* Re: [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-27 16:22 ` Dave Martin
0 siblings, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-08-27 16:22 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:50PM +0000, James Morse wrote:
> From: Rob Herring <robh@kernel.org>
>
> The binding is designed around the assumption that an MSC will be a
> sub-block of something else such as a memory controller, cache controller,
> or IOMMU. However, it's certainly possible a design does not have that
> association or has a mixture of both, so the binding illustrates how we can
> support that with RIS child nodes.
>
> A key part of MPAM is we need to know about all of the MSCs in the system
> before it can be enabled. This drives the need for the genericish
> 'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
> until a h/w specific driver potentially enables the h/w.
I'll leave detailed review to other people for now, since I'm not so up
to speed on all things DT.
A few random comments, below.
[...]
> diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
[...]
> @@ -0,0 +1,200 @@
[...]
> +title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
> +
> +description: |
> + The Arm MPAM specification can be found here:
> +
> + https://developer.arm.com/documentation/ddi0598/latest
> +
> +maintainers:
> + - Rob Herring <robh@kernel.org>
> +
> +properties:
> + compatible:
> + items:
> + - const: arm,mpam-msc # Further details are discoverable
> + - const: arm,mpam-memory-controller-msc
There seems to be no clear statement about how these differ.
> + reg:
> + maxItems: 1
> + description: A memory region containing registers as defined in the MPAM
> + specification.
There seems to be no handling of PCC-based MSCs here. Should there be?
If this can be added later in a backwards-compatible way, I guess
that's not a problem (and this is what compatible strings are for, if
all else fails.)
An explicit statement that PCC is not supported here might be helpful,
though.
> + interrupts:
> + minItems: 1
> + items:
> + - description: error (optional)
> + - description: overflow (optional, only for monitoring)
> +
> + interrupt-names:
> + oneOf:
> + - items:
> + - enum: [ error, overflow ]
> + - items:
> + - const: error
> + - const: overflow
Yeugh. Is this really the only way to say "one or both of foo"?
(I don't know the answer to this -- though I can believe that it's
true. Perhaps just not describing this property is another option.
Many bindings seem not to bother.)
> +
> + arm,not-ready-us:
> + description: The maximum time in microseconds for monitoring data to be
> + accurate after a settings change. For more information, see the
> + Not-Ready (NRDY) bit description in the MPAM specification.
> +
> + numa-node-id: true # see NUMA binding
> +
> + '#address-cells':
> + const: 1
> +
> + '#size-cells':
> + const: 0
> +
> +patternProperties:
> + '^ris@[0-9a-f]$':
Is this supposed to be '^ris@[0-9a-f]+$'?
Currently MPAMF_IDR.RIS_MAX is only 4 bits in size and so cannot be
greater than 0xf. But it is not inconceivable that a future revision
of the architecture might enable more -- and there are 4 RES0 bits
looming over the RIS_MAX field, just waiting to be used...
(In any case, it feels wrong to try to enforce numeric bounds with a
regex, even in the cases where it happens to work straightforwardly.)
> + type: object
> + additionalProperties: false
> + description:
> + RIS nodes for each RIS in an MSC. These nodes are required for each RIS
The architectural term is "resource instance", not "RIS".
But "RIS nodes" is fine for describing the DT nodes, since we can call
them what we like, and "ris" is widely used inside the MPAM driver.
People writing DTs should not need to be familiar with the driver's
internal naming conventions, though.
(There are other instances, but I won't comment on them all
individually.)
> + implementing known MPAM controls
> +
> + properties:
> + compatible:
> + enum:
> + # Bulk storage for cache
Nit: What is "bulk storage"?
The MPAM spec just refers to "cache" or "cache memory".
> + - arm,mpam-cache
> + # Memory bandwidth
> + - arm,mpam-memory
> +
> + reg:
> + minimum: 0
> + maximum: 0xf
> +
> + cpus:
> + description:
> + Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
> + device's affinity is used.
> +
> + arm,mpam-device:
> + $ref: /schemas/types.yaml#/definitions/phandle
> + description:
> + By default, the MPAM enabled device associated with a RIS is the MSC's
Associated how? Is this the device where the physical resources
managed by the MSC are located?
> + parent node. It is possible for each RIS to be associated with different
> + devices in which case 'arm,mpam-device' should be used.
[...]
> +examples:
> + - |
> + L3: cache-controller@30000000 {
> + compatible = "arm,dsu-l3-cache", "cache";
> + cache-level = <3>;
> + cache-unified;
> +
> + ranges = <0x0 0x30000000 0x800000>;
> + #address-cells = <1>;
> + #size-cells = <1>;
> +
> + msc@10000 {
> + compatible = "arm,mpam-msc";
> +
> + /* CPU affinity implied by parent cache node's */
"node's" -> "nodes".
(or is this supposed to be singular -- i.e., only the immediate
parent cache node?)
Anyway, it looks like this is commenting on the "reg" property, which
doesn't seem right.
Is this comment supposed instead to explain the omission of the "cpus"
property? If so, that should be made clearer.
> + reg = <0x10000 0x2000>;
> + interrupts = <1>, <2>;
> + interrupt-names = "error", "overflow";
> + arm,not-ready-us = <1>;
> + };
> + };
[...]
(Examples otherwise not reviewed in detail.)
Cheers
---Dave
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
2025-08-27 10:46 ` Dave Martin
@ 2025-08-27 17:11 ` James Morse
2025-08-28 14:08 ` Dave Martin
0 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:46, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
>> The MPAM driver identifies caches by id for use with resctrl. It
>> needs to know the cache-id when probe-ing, but the value isn't set
>> in cacheinfo until device_initcall().
>>
>> Expose the code that generates the cache-id. The parts of the MPAM
>> driver that run early can use this to set up the resctrl structures
>> before cacheinfo is ready in device_initcall().
> Why can't the MPAM driver just consume the precomputed cache-id
> information?
Because it would need to wait until cacheinfo was ready, and it would still
need a way of getting the cache-id for caches where all the CPUs are offline.
The resctrl glue code has a waitqueue to wait for device_initcall_sync(), but that is
asynchronous to driver probing; it's triggered by the schedule_work() from the cpuhp
callbacks. This bit is about the driver's use, which just gets probed whenever the core
code feels like it.
I toyed with always using cacheinfo for everything, and just waiting - but the MPAM driver
already has to parse the PPTT to find the information it needs on ACPI platforms, so the
wait would only happen on DT.
It seemed simpler to grab what the value would be, instead of waiting (or deferring probe) -
especially as this is also needed for caches where all the CPUs are offline.
(I'll add the offline-cpus angle to the commit message)
> Possible reasons are that the MPAM driver probes too early,
yup,
> or that it
> must parse the PPTT directly (which is true) and needs to label caches
> consistently with the way the kernel does it.
It needs to match what will be exposed to user-space from cacheinfo.
This isn't about the PPTT; it's the value that is generated for DT systems.
The driver has to know whether it's ACPI or DT to call the appropriate thing to get cache-ids
before cacheinfo is ready.
>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>> index 613410705a47..f6289d142ba9 100644
>> --- a/drivers/base/cacheinfo.c
>> +++ b/drivers/base/cacheinfo.c
>> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
>> #define arch_compact_of_hwid(_x) (_x)
>> #endif
>>
>> -static void cache_of_set_id(struct cacheinfo *this_leaf,
>> - struct device_node *cache_node)
>> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
>> {
>> struct device_node *cpu;
>> - u32 min_id = ~0;
>> + unsigned long min_id = ~0UL;
> Why the change of type here?
This is a hangover from Rob's approach of making the cache-id 64 bit.
> This does mean that 0xffffffff can now be generated as a valid cache-id,
> but if that is necessary then this patch is also fixing a bug in the
> code -- but the commit message doesn't say anything about that.
>
> For a patch that is just exposing an internal result, it may be
> better to keep the original type. ~(u32)0 is already used as an
> exceptional value.
Yup, I'll fix that.
Thanks!
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-24 17:25 ` Krzysztof Kozlowski
@ 2025-08-27 17:11 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
To: Krzysztof Kozlowski, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Krzysztof,
On 24/08/2025 18:25, Krzysztof Kozlowski wrote:
> On 22/08/2025 17:29, James Morse wrote:
>> MPAM needs to know the size of a cache associated with a particular CPU.
>> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>>
>> Add a helper to do this.
>> ---
>> Changes since v1:
>
> You marked this as v1.
Oops - that should say RFC. I'll fix all those.
>> * Converted to kdoc.
>> * Simplified helper to use get_cpu_cacheinfo_level().
> Please use consistent subject prefixes. Look at previous patch subject
> prefix.
Presumably the previous patch in my series - this is a side effect of multiple branches
that were written at different times getting combined! I'll change it to 'cacheinfo:' as
that seems to be the most popular recently.
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-27 10:46 ` Dave Martin
@ 2025-08-27 17:11 ` James Morse
2025-08-28 14:10 ` Dave Martin
0 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:46, Dave Martin wrote:
> Hi,
>
> On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
>> MPAM needs to know the size of a cache associated with a particular CPU.
>> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>>
>> Add a helper to do this.
>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 2dcbb69139e9..e12d6f2c6a57 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
>> return ci ? ci->id : -1;
>> }
>>
>> +/**
>> + * get_cpu_cacheinfo_size() - Get the size of the cache.
>> + * @cpu: The cpu that is associated with the cache.
>> + * @level: The level of the cache as seen by @cpu.
>> + *
>> + * Callers must hold the cpuhp lock.
>> + * Returns the cache-size on success, or 0 for an error.
>> + */
>
> Nit: Maybe use the wording
>
> cpuhp lock must be held.
>
> in the kerneldoc here, to match the other helpers it sits alongside.
>
> Otherwise, looks reasonable.
Sure,
>> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
>> +{
>> + struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
>> +
>> + return ci ? ci->size : 0;
>> +}
>> +
>
> Orphaned function?
>
> Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
> If so, this wouldn't just be dead code in this series.
Ah - I thought the MPAM driver was pulling this value in, but it's the resctrl glue code.
I was trying to reduce the number of trees this touches - it's probably best to kick this
into the next series that adds the resctrl code as it's pretty trivial.
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-28 0:58 ` Fenghua Yu
0 siblings, 0 replies; 130+ messages in thread
From: Fenghua Yu @ 2025-08-28 0:58 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:30, James Morse wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
>
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
>
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
>
> Signed-off-by: James Morse <james.morse@arm.com>
[SNIP]
> @@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
>
> static void __destroy_component_cfg(struct mpam_component *comp)
> {
> + struct mpam_msc *msc;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> add_to_garbage(comp->cfg);
> + list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> + msc = vmsc->msc;
> +
> + mpam_mon_sel_outer_lock(msc);
> + if (mpam_mon_sel_inner_lock(msc)) {
> + list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> + add_to_garbage(ris->mbwu_state);
> + mpam_mon_sel_inner_unlock(msc);
> + }
> + mpam_mon_sel_outer_lock(msc);
s/mpam_mon_sel_outer_lock(msc);/mpam_mon_sel_outer_unlock(msc);/
Or this will hit a deadlock.
[SNIP]
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-28 1:29 ` Fenghua Yu
2025-09-01 11:09 ` Dave Martin
1 sibling, 0 replies; 130+ messages in thread
From: Fenghua Yu @ 2025-08-28 1:29 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
Hi, James,
On 8/22/25 08:29, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
>
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
>
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
>
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need
> to be configured the same as the hardware is not able to distribute the
> configuration.
>
> Add support for creating and destroying these structures.
>
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered when probing the MSC.
>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * removed a pr_err() debug message that crept in.
> ---
> drivers/resctrl/mpam_devices.c | 488 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 91 ++++++
> include/linux/arm_mpam.h | 8 +-
> 3 files changed, 574 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
[SNIP]
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
> +{
> + struct mpam_vmsc *vmsc;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + vmsc = kzalloc(sizeof(*vmsc), gfp);
> + if (!comp)
s/if (!comp)/if (!vmsc)/
> + return ERR_PTR(-ENOMEM);
> + init_garbage(vmsc);
> +
> + INIT_LIST_HEAD_RCU(&vmsc->ris);
> + INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> + vmsc->comp = comp;
> + vmsc->msc = msc;
> +
> + list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> + return vmsc;
> +}
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-28 10:11 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 10:11 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich, Zeng Heng
Hi James,
On 8/22/25 16:30, James Morse wrote:
> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls isn't
> known.
>
> Discover the rest of the features so they can be reset to avoid any
> side effects when resctrl is in use.
>
> PARTID narrowing allows MSC/RIS to support less configuration space than
> is usable. If this feature is found on a class of device we are likely
> to use, then reduce the partid_max to make it usable. This allows us
> to map a PARTID to itself.
>
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 175 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 16 ++-
> 2 files changed, 189 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 8f6df2406c22..aedd743d6827 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
> __mpam_part_sel_raw(partsel, msc);
> }
>
> +static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
> +{
> + u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> + FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
> + MPAMCFG_PART_SEL_INTERNAL;
> +
> + __mpam_part_sel_raw(partsel, msc);
> +}
> +
> int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> {
> int err = 0;
> @@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> int err;
> struct mpam_msc *msc = ris->vmsc->msc;
> struct mpam_props *props = &ris->props;
> + struct mpam_class *class = ris->vmsc->comp->class;
>
> lockdep_assert_held(&msc->probe_lock);
> lockdep_assert_held(&msc->part_sel_lock);
>
> + /* Cache Capacity Partitioning */
> + if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
> + u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
> +
> + props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
> + if (props->cmax_wd &&
> + FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
> + mpam_set_feature(mpam_feat_cmax_softlim, props);
> +
> + if (props->cmax_wd &&
> + !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
> + mpam_set_feature(mpam_feat_cmax_cmax, props);
> +
> + if (props->cmax_wd &&
> + FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
> + mpam_set_feature(mpam_feat_cmax_cmin, props);
> +
> + props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
> +
> + if (props->cassoc_wd &&
> + FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
> + mpam_set_feature(mpam_feat_cmax_cassoc, props);
> + }
> +
> /* Cache Portion partitioning */
> if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
> u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
> @@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
> if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
> mpam_set_feature(mpam_feat_mbw_max, props);
> +
> + if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
> + mpam_set_feature(mpam_feat_mbw_min, props);
> +
> + if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
> + mpam_set_feature(mpam_feat_mbw_prop, props);
> + }
> +
> + /* Priority partitioning */
> + if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
> + u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
> +
> + props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
> + if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
> + mpam_set_feature(mpam_feat_intpri_part, props);
> + if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
> + mpam_set_feature(mpam_feat_intpri_part_0_low, props);
> + }
> +
> + props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
> + if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
> + mpam_set_feature(mpam_feat_dspri_part, props);
> + if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
> + mpam_set_feature(mpam_feat_dspri_part_0_low, props);
> + }
> }
>
> /* Performance Monitoring */
> @@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> */
> }
> }
> +
> + /*
> + * RIS with PARTID narrowing don't have enough storage for one
> + * configuration per PARTID. If these are in a class we could use,
> + * reduce the supported partid_max to match the number of intpartid.
> + * If the class is unknown, just ignore it.
> + */
> + if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
> + class->type != MPAM_CLASS_UNKNOWN) {
> + u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
> + u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
> +
> + mpam_set_feature(mpam_feat_partid_nrw, props);
> + msc->partid_max = min(msc->partid_max, partid_max);
> + }
> }
>
> static int mpam_msc_hw_probe(struct mpam_msc *msc)
> @@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> struct mpam_config *cfg)
> {
> + u32 pri_val = 0;
> + u16 cmax = MPAMCFG_CMAX_CMAX;
> u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
> struct mpam_msc *msc = ris->vmsc->msc;
> struct mpam_props *rprops = &ris->props;
> + u16 dspri = GENMASK(rprops->dspri_wd, 0);
> + u16 intpri = GENMASK(rprops->intpri_wd, 0);
>
> mutex_lock(&msc->part_sel_lock);
> __mpam_part_sel(ris->ris_idx, partid, msc);
>
> + if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
> + /* Update the intpartid mapping */
> + mpam_write_partsel_reg(msc, INTPARTID,
> + MPAMCFG_INTPARTID_INTERNAL | partid);
> +
> + /*
> + * Then switch to the 'internal' partid to update the
> + * configuration.
> + */
> + __mpam_intpart_sel(ris->ris_idx, partid, msc);
> + }
> +
> if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
> if (mpam_has_feature(mpam_feat_cpor_part, cfg))
> mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> @@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>
> if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
> +
> + if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
> + mpam_write_partsel_reg(msc, CMAX, cmax);
> +
> + if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
> + mpam_write_partsel_reg(msc, CMIN, 0);
Missing reset for cmax_cassoc. I wonder if it makes sense to have
separate enums for partitioning features, which require reset, and the rest.
> +
> + if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
> + mpam_has_feature(mpam_feat_dspri_part, rprops)) {
> + /* aces high? */
> + if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
> + intpri = 0;
> + if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
> + dspri = 0;
> +
> + if (mpam_has_feature(mpam_feat_intpri_part, rprops))
> + pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
> + if (mpam_has_feature(mpam_feat_dspri_part, rprops))
> + pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
> +
> + mpam_write_partsel_reg(msc, PRI, pri_val);
> + }
> +
> mutex_unlock(&msc->part_sel_lock);
> }
>
> @@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
> return false;
> }
>
> +/* Any of these features mean the CMAX_WD field is valid. */
> +static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
> +{
> + if (mpam_has_feature(mpam_feat_cmax_cmax, props))
> + return true;
> + if (mpam_has_feature(mpam_feat_cmax_cmin, props))
> + return true;
> + return false;
> +}
> +
> #define MISMATCHED_HELPER(parent, child, helper, field, alias) \
> helper(parent) && \
> ((helper(child) && (parent)->field != (child)->field) || \
> @@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
> parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
> }
>
> + if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
> + parent->cmax_wd = child->cmax_wd;
> + } else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
> + cmax_wd, alias)) {
> + pr_debug("%s took the min cmax_wd\n", __func__);
> + parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
> + }
> +
> + if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
> + parent->cassoc_wd = child->cassoc_wd;
> + } else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
> + cassoc_wd, alias)) {
> + pr_debug("%s cleared cassoc_wd\n", __func__);
> + mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
> + parent->cassoc_wd = 0;
> + }
> +
> /* For num properties, take the minimum */
> if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
> parent->num_csu_mon = child->num_csu_mon;
> @@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
> parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
> }
>
> + if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
> + parent->intpri_wd = child->intpri_wd;
> + } else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
> + intpri_wd, alias)) {
> + pr_debug("%s took the min intpri_wd\n", __func__);
> + parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
> + }
> +
> + if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
> + parent->dspri_wd = child->dspri_wd;
> + } else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
> + dspri_wd, alias)) {
> + pr_debug("%s took the min dspri_wd\n", __func__);
> + parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
> + }
> +
> + /* TODO: alias support for these two */
> + /* {int,ds}pri may not have differing 0-low behaviour */
> + if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
> + (!mpam_has_feature(mpam_feat_intpri_part, child) ||
> + mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
> + mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
> + pr_debug("%s cleared intpri_part\n", __func__);
> + mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
> + mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
> + }
> + if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
> + (!mpam_has_feature(mpam_feat_dspri_part, child) ||
> + mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
> + mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
> + pr_debug("%s cleared dspri_part\n", __func__);
> + mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
> + mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
> + }
> +
> if (alias) {
> /* Merge features for aliased resources */
> parent->features |= child->features;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 70cba9f22746..23445aedbabd 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> * When we compact the supported features, we don't care what they are.
> * Storing them as a bitmap makes life easy.
> */
> -typedef u16 mpam_features_t;
> +typedef u32 mpam_features_t;
>
> /* Bits for mpam_features_t */
> enum mpam_device_features {
> - mpam_feat_ccap_part = 0,
> + mpam_feat_cmax_softlim,
> + mpam_feat_cmax_cmax,
> + mpam_feat_cmax_cmin,
> + mpam_feat_cmax_cassoc,
> mpam_feat_cpor_part,
> mpam_feat_mbw_part,
> mpam_feat_mbw_min,
> mpam_feat_mbw_max,
> mpam_feat_mbw_prop,
> + mpam_feat_intpri_part,
> + mpam_feat_intpri_part_0_low,
> + mpam_feat_dspri_part,
> + mpam_feat_dspri_part_0_low,
> mpam_feat_msmon,
> mpam_feat_msmon_csu,
> mpam_feat_msmon_csu_capture,
> @@ -176,6 +183,7 @@ enum mpam_device_features {
> mpam_feat_msmon_mbwu_rwbw,
> mpam_feat_msmon_mbwu_hw_nrdy,
> mpam_feat_msmon_capt,
> + mpam_feat_partid_nrw,
> MPAM_FEATURE_LAST,
> };
> static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
> @@ -187,6 +195,10 @@ struct mpam_props {
> u16 cpbm_wd;
> u16 mbw_pbm_bits;
> u16 bwa_wd;
> + u16 cmax_wd;
> + u16 cassoc_wd;
> + u16 intpri_wd;
> + u16 dspri_wd;
> u16 num_csu_mon;
> u16 num_mbwu_mon;
> };
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-28 13:12 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 13:12 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
>
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
>
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 158 ++++++++++++++++++++++++++++++--
> drivers/resctrl/mpam_internal.h | 6 ++
> include/linux/arm_mpam.h | 14 +++
> 3 files changed, 171 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 9d6516f98acf..012e09e80300 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -6,6 +6,7 @@
> #include <linux/acpi.h>
> #include <linux/atomic.h>
> #include <linux/arm_mpam.h>
> +#include <linux/bitfield.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpu.h>
> #include <linux/cpumask.h>
> @@ -44,6 +45,15 @@ static u32 mpam_num_msc;
> static int mpam_cpuhp_state;
> static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>
> +/*
> + * The smallest common values for any CPU or MSC in the system.
> + * Generating traffic outside this range will result in screaming interrupts.
> + */
> +u16 mpam_partid_max;
> +u8 mpam_pmg_max;
> +static bool partid_max_init, partid_max_published;
> +static DEFINE_SPINLOCK(partid_max_lock);
> +
> /*
> * mpam is enabled once all devices have been probed from CPU online callbacks,
> * scheduled via this work_struct. If access to an MSC depends on a CPU that
> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>
> #define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
>
> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> + WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> + WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> + writel_relaxed(val, msc->mapped_hwpage + reg);
> +}
> +
> +static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> + lockdep_assert_held_once(&msc->part_sel_lock);
> + __mpam_write_reg(msc, reg, val);
> +}
> +#define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
> +
> +static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> +{
> + u64 idr_high = 0, idr_low;
> +
> + lockdep_assert_held(&msc->part_sel_lock);
> +
> + idr_low = mpam_read_partsel_reg(msc, IDR);
> + if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
> + idr_high = mpam_read_partsel_reg(msc, IDR + 4);
> +
> + return (idr_high << 32) | idr_low;
> +}
> +
> +static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
> +{
> + lockdep_assert_held(&msc->part_sel_lock);
> +
> + mpam_write_partsel_reg(msc, PART_SEL, partsel);
> +}
> +
> +static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
> +{
> + u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> + FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
> +
> + __mpam_part_sel_raw(partsel, msc);
> +}
> +
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> +{
> + int err = 0;
> +
> + lockdep_assert_irqs_enabled();
> +
> + spin_lock(&partid_max_lock);
> + if (!partid_max_init) {
> + mpam_partid_max = partid_max;
> + mpam_pmg_max = pmg_max;
> + partid_max_init = true;
> + } else if (!partid_max_published) {
> + mpam_partid_max = min(mpam_partid_max, partid_max);
> + mpam_pmg_max = min(mpam_pmg_max, pmg_max);
Do we really need to reduce these maximums here? If, say, we add an SMMU
requestor which supports fewer partids than the CPUs, don't we want to be
able to carry on using those partids from the CPUs? In this case the
SMMU requestor can, without risk of error interrupts, just use all the
partids it supports.
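For reference, the negotiation being discussed can be modelled in plain C. This is a hypothetical userspace sketch of the three phases of mpam_register_requestor() (init, pre-publication shrink, post-publication reject); the names mirror the driver, but the locking is omitted and this is not the driver code itself:

```c
#include <stdint.h>
#include <stdbool.h>

/* System-wide common limits, as in the driver (hypothetical model). */
static uint16_t partid_max;
static uint8_t pmg_max;
static bool init_done, published;

static int register_requestor(uint16_t req_partid_max, uint8_t req_pmg_max)
{
	if (!init_done) {
		/* First requestor sets the initial limits. */
		partid_max = req_partid_max;
		pmg_max = req_pmg_max;
		init_done = true;
	} else if (!published) {
		/* Before user-space sees the values, any requestor shrinks them. */
		if (req_partid_max < partid_max)
			partid_max = req_partid_max;
		if (req_pmg_max < pmg_max)
			pmg_max = req_pmg_max;
	} else if (req_partid_max < partid_max || req_pmg_max < pmg_max) {
		/* Too late: a larger range was already advertised. -EBUSY in the driver. */
		return -1;
	}
	return 0;
}
```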
> + } else {
> + /* New requestors can't lower the values */
> + if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
> + err = -EBUSY;
> + }
> + spin_unlock(&partid_max_lock);
> +
> + return err;
> +}
> +EXPORT_SYMBOL(mpam_register_requestor);
> +
> #define init_garbage(x) init_llist_node(&(x)->garbage.llist)
>
> static struct mpam_vmsc *
> @@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> + list_add_rcu(&ris->msc_list, &msc->ris);
>
> return 0;
> }
> @@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> return err;
> }
>
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> + u8 ris_idx)
> +{
> + int err;
> + struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + if (!test_bit(ris_idx, msc->ris_idxs)) {
> + err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> + 0, 0, GFP_ATOMIC);
> + if (err)
> + return ERR_PTR(err);
> + }
> +
> + list_for_each_entry(ris, &msc->ris, msc_list) {
> + if (ris->ris_idx == ris_idx) {
> + found = ris;
> + break;
> + }
> + }
> +
> + return found;
> +}
> +
> static int mpam_msc_hw_probe(struct mpam_msc *msc)
> {
> u64 idr;
> - int err;
> + u16 partid_max;
> + u8 ris_idx, pmg_max;
> + struct mpam_msc_ris *ris;
>
> lockdep_assert_held(&msc->probe_lock);
>
> @@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
> pr_err_once("%s does not match MPAM architecture v1.x\n",
> dev_name(&msc->pdev->dev));
> - err = -EIO;
> - } else {
> - msc->probed = true;
> - err = 0;
> + mutex_unlock(&msc->part_sel_lock);
> + return -EIO;
> }
> +
> + idr = mpam_msc_read_idr(msc);
> mutex_unlock(&msc->part_sel_lock);
> + msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
> +
> + /* Use these values so partid/pmg always starts with a valid value */
> + msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> + msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +
> + for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
> + mutex_lock(&msc->part_sel_lock);
> + __mpam_part_sel(ris_idx, 0, msc);
> + idr = mpam_msc_read_idr(msc);
> + mutex_unlock(&msc->part_sel_lock);
> +
> + partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> + pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> + msc->partid_max = min(msc->partid_max, partid_max);
> + msc->pmg_max = min(msc->pmg_max, pmg_max);
> +
> + ris = mpam_get_or_create_ris(msc, ris_idx);
> + if (IS_ERR(ris))
> + return PTR_ERR(ris);
> + }
>
> - return err;
> + spin_lock(&partid_max_lock);
> + mpam_partid_max = min(mpam_partid_max, msc->partid_max);
> + mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> + spin_unlock(&partid_max_lock);
> +
> + msc->probed = true;
> +
> + return 0;
> }
>
> static int mpam_cpu_online(unsigned int cpu)
> @@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
>
> static void mpam_enable_once(void)
> {
> + /*
> + * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> + * longer change.
> + */
> + spin_lock(&partid_max_lock);
> + partid_max_published = true;
> + spin_unlock(&partid_max_lock);
> +
> mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>
> - pr_info("MPAM enabled\n");
> + printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
> + mpam_partid_max + 1, mpam_pmg_max + 1);
> }
>
> /*
> @@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
>
> return platform_driver_register(&mpam_msc_driver);
> }
> +/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
> subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a98cca08a2ef..a623f405ddd8 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -50,6 +50,8 @@ struct mpam_msc {
> */
> struct mutex probe_lock;
> bool probed;
> + u16 partid_max;
> + u8 pmg_max;
> unsigned long ris_idxs[128 / BITS_PER_LONG];
> u32 ris_max;
>
> @@ -148,6 +150,10 @@ struct mpam_msc_ris {
> extern struct srcu_struct mpam_srcu;
> extern struct list_head mpam_classes;
>
> +/* System wide partid/pmg values */
> +extern u16 mpam_partid_max;
> +extern u8 mpam_pmg_max;
> +
> /* Scheduled work callback to enable mpam once all MSC have been probed */
> void mpam_enable(struct work_struct *work);
>
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 406a77be68cb..8af93794c7a2 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> enum mpam_class_types type, u8 class_id, int component_id);
>
> +/**
> + * mpam_register_requestor() - Register a requestor with the MPAM driver
> + * @partid_max: The maximum PARTID value the requestor can generate.
> + * @pmg_max: The maximum PMG value the requestor can generate.
> + *
> + * Registers a requestor with the MPAM driver to ensure the chosen system-wide
> + * minimum PARTID and PMG values will allow the requestor's features to be used.
> + *
> + * Returns an error if the registration is too late, and a larger PARTID/PMG
> + * value has been advertised to user-space. In this case the requestor should
> + * not use its MPAM features. Returns 0 on success.
> + */
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max);
> +
> #endif /* __LINUX_ARM_MPAM_H */
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-28 13:44 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 13:44 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Expand the probing support with the control and monitor types
> we can use with resctrl.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Made mpam_ris_hw_probe_hw_nrdy() more in C.
> * Added static assert on features bitmap size.
> ---
> drivers/resctrl/mpam_devices.c | 156 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 54 +++++++++++
> 2 files changed, 209 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 012e09e80300..290a04f8654f 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
>
> static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
> {
> - WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
> + WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
Update in the patch that introduced this line.
> WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
>
> return readl_relaxed(msc->mapped_hwpage + reg);
> @@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
> }
> #define mpam_write_partsel_reg(msc, reg, val) _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
>
> +static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
> +{
> + mpam_mon_sel_lock_held(msc);
> + return __mpam_read_reg(msc, reg);
> +}
> +#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
> +
> +static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> + mpam_mon_sel_lock_held(msc);
> + __mpam_write_reg(msc, reg, val);
> +}
> +#define mpam_write_monsel_reg(msc, reg, val) _mpam_write_monsel_reg(msc, MSMON_##reg, val)
> +
> static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> {
> u64 idr_high = 0, idr_low;
> @@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> return found;
> }
>
> +/*
> + * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
> + * of NRDY, software can use this bit for any purpose" - so hardware might not
> + * implement this - but it isn't RES0.
> + *
> + * Try and see what values stick in this bit. If we can write either value,
> + * its probably not implemented by hardware.
> + */
> +static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris * ris, u32 mon_reg)
> +{
> + u32 now;
> + u64 mon_sel;
> + bool can_set, can_clear;
> + struct mpam_msc *msc = ris->vmsc->msc;
> +
> + if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
> + return false;
> +
> + mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
> + FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> + _mpam_write_monsel_reg(msc, mon_reg, mon_sel);
> +
> + _mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
> + now = _mpam_read_monsel_reg(msc, mon_reg);
> + can_set = now & MSMON___NRDY;
> +
> + _mpam_write_monsel_reg(msc, mon_reg, 0);
> + now = _mpam_read_monsel_reg(msc, mon_reg);
> + can_clear = !(now & MSMON___NRDY);
> + mpam_mon_sel_inner_unlock(msc);
> +
> + return (!can_set || !can_clear);
> +}
> +
> +#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg) \
> + _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
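The write-both-values probe quoted above can be modelled against a simulated register. This is a hypothetical userspace sketch (assuming NRDY is bit 31, and modelling a hardware-managed bit as one that ignores software writes), not the MSC access code:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSMON___NRDY (1u << 31)

/* Hypothetical model of a monitor register: bits in 'ro_mask' are
 * hardware-controlled and keep 'ro_val' regardless of software writes. */
struct sim_reg {
	uint32_t val;
	uint32_t ro_mask;
	uint32_t ro_val;
};

static void sim_write(struct sim_reg *r, uint32_t v)
{
	r->val = (v & ~r->ro_mask) | (r->ro_val & r->ro_mask);
}

/* Same idea as _mpam_ris_hw_probe_hw_nrdy(): write both values to the
 * NRDY bit and read back which ones stick. */
static bool probe_hw_nrdy(struct sim_reg *r)
{
	bool can_set, can_clear;

	sim_write(r, MSMON___NRDY);
	can_set = r->val & MSMON___NRDY;

	sim_write(r, 0);
	can_clear = !(r->val & MSMON___NRDY);

	/* If either write didn't stick, hardware owns the bit. */
	return !can_set || !can_clear;
}
```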
> +
> +static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> +{
> + int err;
> + struct mpam_msc *msc = ris->vmsc->msc;
> + struct mpam_props *props = &ris->props;
> +
> + lockdep_assert_held(&msc->probe_lock);
> + lockdep_assert_held(&msc->part_sel_lock);
> +
> + /* Cache Portion partitioning */
> + if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
> + u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
> +
> + props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
> + if (props->cpbm_wd)
> + mpam_set_feature(mpam_feat_cpor_part, props);
> + }
> +
> + /* Memory bandwidth partitioning */
> + if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
> + u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
> +
> + /* portion bitmap resolution */
> + props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
> + if (props->mbw_pbm_bits &&
> + FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
> + mpam_set_feature(mpam_feat_mbw_part, props);
> +
> + props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
> + if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
> + mpam_set_feature(mpam_feat_mbw_max, props);
> + }
> +
> + /* Performance Monitoring */
> + if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
> + u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
> +
> + /*
> + * If the firmware max-nrdy-us property is missing, the
> + * CSU counters can't be used. Should we wait forever?
> + */
> + err = device_property_read_u32(&msc->pdev->dev,
> + "arm,not-ready-us",
> + &msc->nrdy_usec);
> +
> + if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
> + u32 csumonidr;
> +
> + csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
> + props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
> + if (props->num_csu_mon) {
> + bool hw_managed;
> +
> + mpam_set_feature(mpam_feat_msmon_csu, props);
> +
> + /* Is NRDY hardware managed? */
> + mpam_mon_sel_outer_lock(msc);
> + hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
> + mpam_mon_sel_outer_unlock(msc);
> + if (hw_managed)
> + mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
> + }
> +
> + /*
> + * Accept the missing firmware property if NRDY appears
> + * un-implemented.
> + */
> + if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
> + pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
> + }
> + if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> + bool hw_managed;
> + u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
> +
> + props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
> + if (props->num_mbwu_mon)
> + mpam_set_feature(mpam_feat_msmon_mbwu, props);
> +
> + if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
> + mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
> +
> + /* Is NRDY hardware managed? */
> + mpam_mon_sel_outer_lock(msc);
> + hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
> + mpam_mon_sel_outer_unlock(msc);
> + if (hw_managed)
> + mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
> +
> + /*
> + * Don't warn about any missing firmware property for
> + * MBWU NRDY - it doesn't make any sense!
> + */
> + }
> + }
> +}
> +
> static int mpam_msc_hw_probe(struct mpam_msc *msc)
> {
> u64 idr;
> @@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>
> idr = mpam_msc_read_idr(msc);
> mutex_unlock(&msc->part_sel_lock);
> +
> msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>
> /* Use these values so partid/pmg always starts with a valid value */
> @@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> ris = mpam_get_or_create_ris(msc, ris_idx);
> if (IS_ERR(ris))
> return PTR_ERR(ris);
> + ris->idr = idr;
> +
> + mutex_lock(&msc->part_sel_lock);
> + __mpam_part_sel(ris_idx, 0, msc);
> + mpam_ris_hw_probe(ris);
> + mutex_unlock(&msc->part_sel_lock);
> }
>
> spin_lock(&partid_max_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c6f087f9fa7d..9f6cd4a68cce 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> lockdep_assert_preemption_enabled();
> }
>
> +/*
> + * When we compact the supported features, we don't care what they are.
> + * Storing them as a bitmap makes life easy.
> + */
> +typedef u16 mpam_features_t;
> +
> +/* Bits for mpam_features_t */
> +enum mpam_device_features {
> + mpam_feat_ccap_part = 0,
> + mpam_feat_cpor_part,
> + mpam_feat_mbw_part,
> + mpam_feat_mbw_min,
> + mpam_feat_mbw_max,
> + mpam_feat_mbw_prop,
> + mpam_feat_msmon,
> + mpam_feat_msmon_csu,
> + mpam_feat_msmon_csu_capture,
> + mpam_feat_msmon_csu_hw_nrdy,
> + mpam_feat_msmon_mbwu,
> + mpam_feat_msmon_mbwu_capture,
> + mpam_feat_msmon_mbwu_rwbw,
> + mpam_feat_msmon_mbwu_hw_nrdy,
> + mpam_feat_msmon_capt,
> + MPAM_FEATURE_LAST,
This is neither all the features nor just the features supported by
resctrl. Just add them all in this patch?
> +};
> +static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
> +#define MPAM_ALL_FEATURES ((1 << MPAM_FEATURE_LAST) - 1)
Unused?
> +
> +struct mpam_props {
> + mpam_features_t features;
> +
> + u16 cpbm_wd;
> + u16 mbw_pbm_bits;
> + u16 bwa_wd;
> + u16 num_csu_mon;
> + u16 num_mbwu_mon;
> +};
> +
> +static inline bool mpam_has_feature(enum mpam_device_features feat,
> + struct mpam_props *props)
> +{
> + return (1 << feat) & props->features;
> +}
> +
> +static inline void mpam_set_feature(enum mpam_device_features feat,
> + struct mpam_props *props)
> +{
> + props->features |= (1 << feat);
> +}
> +
> struct mpam_class {
> /* mpam_components in this class */
> struct list_head components;
> @@ -175,6 +225,8 @@ struct mpam_vmsc {
> /* mpam_msc_ris in this vmsc */
> struct list_head ris;
>
> + struct mpam_props props;
> +
> /* All RIS in this vMSC are members of this MSC */
> struct mpam_msc *msc;
>
> @@ -186,6 +238,8 @@ struct mpam_vmsc {
>
> struct mpam_msc_ris {
> u8 ris_idx;
> + u64 idr;
> + struct mpam_props props;
>
> cpumask_t affinity;
>
Thanks,
Ben
* Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
2025-08-27 17:11 ` James Morse
@ 2025-08-28 14:08 ` Dave Martin
0 siblings, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-08-28 14:08 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi James,
On Wed, Aug 27, 2025 at 06:11:25PM +0100, James Morse wrote:
> Hi Dave,
>
> On 27/08/2025 11:46, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
> >> The MPAM driver identifies caches by id for use with resctrl. It
> >> needs to know the cache-id when probe-ing, but the value isn't set
> >> in cacheinfo until device_initcall().
> >>
> >> Expose the code that generates the cache-id. The parts of the MPAM
> >> driver that run early can use this to set up the resctrl structures
> >> before cacheinfo is ready in device_initcall().
>
> > Why can't the MPAM driver just consume the precomputed cache-id
> > information?
>
> Because it would need to wait until cacheinfo was ready, and it would still
> need a way of getting the cache-id for caches where all the CPUs are offline.
>
> The resctrl glue code has a waitqueue to wait for device_initcall_sync(), but that is
> asynchronous to driver probing, it's triggered by the schedule_work() from the cpuhp
> callbacks. This bit is about the driver's use, which just gets probed whenever the core
> code feels like it.
>
> I toyed with always using cacheinfo for everything, and just waiting - but the MPAM driver
> already has to parse the PPTT to find the information it needs on ACPI platforms, so the
> wait would only happen on DT.
>
> It seemed simpler to grab what the value would be, instead of waiting (or probe defer) -
> especially as this is also needed for caches where all the CPUs are offline.
>
> (I'll add the offline-cpus angle to the commit message)
Ack
> > Possible reasons are that the MPAM driver probes too early,
>
> yup,
>
> > or that it
> > must parse the PPTT directly (which is true) and needs to label caches
> > consistently with the way the kernel does it.
>
> It needs to match what will be exposed to user-space from cacheinfo.
> This isn't about the PPTT, it's the value that is generated for DT systems.
Right -- confused myself there. From the point of view of this series,
the usage scenario isn't clear at this point.
> The driver has to know if it's ACPI or DT to call the appropriate thing to get cache-ids
> before cacheinfo is ready.
I see. This might be worth stating in the commit message.
> >> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> >> index 613410705a47..f6289d142ba9 100644
> >> --- a/drivers/base/cacheinfo.c
> >> +++ b/drivers/base/cacheinfo.c
> >> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
> >> #define arch_compact_of_hwid(_x) (_x)
> >> #endif
> >>
> >> -static void cache_of_set_id(struct cacheinfo *this_leaf,
> >> - struct device_node *cache_node)
> >> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
> >> {
> >> struct device_node *cpu;
> >> - u32 min_id = ~0;
> >> + unsigned long min_id = ~0UL;
>
> > Why the change of type here?
>
> This is a hang over from Rob's approach of making the cache-id 64 bit.
Ah, right.
(I have assumed that 0xffffffff is never going to clash with a valid
value.)
> > This does mean that 0xffffffff can now be generated as a valid cache-id,
> > but if that is necessary then this patch is also fixing a bug in the
> > code -- but the commit message doesn't say anything about that.
> >
> > For a patch that is just exposing an internal result, it may be
> > better to keep the original type. ~(u32)0 is already used as an
> > exceptional value.
>
> Yup, I'll fix that.
OK -- it works either way, of course, but this should make the patch a
little less noisy.
Cheers
---Dave
* Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
2025-08-27 17:11 ` James Morse
@ 2025-08-28 14:10 ` Dave Martin
0 siblings, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-08-28 14:10 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Wed, Aug 27, 2025 at 06:11:43PM +0100, James Morse wrote:
> Hi Dave,
>
> On 27/08/2025 11:46, Dave Martin wrote:
> > Hi,
> >
> > On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
> >> MPAM needs to know the size of a cache associated with a particular CPU.
> >> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> >>
> >> Add a helper to do this.
>
> >> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> >> index 2dcbb69139e9..e12d6f2c6a57 100644
> >> --- a/include/linux/cacheinfo.h
> >> +++ b/include/linux/cacheinfo.h
> >> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
> >> return ci ? ci->id : -1;
> >> }
> >>
> >> +/**
> >> + * get_cpu_cacheinfo_size() - Get the size of the cache.
> >> + * @cpu: The cpu that is associated with the cache.
> >> + * @level: The level of the cache as seen by @cpu.
> >> + *
> >> + * Callers must hold the cpuhp lock.
> >> + * Returns the cache-size on success, or 0 for an error.
> >> + */
> >
> > Nit: Maybe use the wording
> >
> > cpuhp lock must be held.
> >
> > in the kerneldoc here, to match the other helpers it sits alongside.
> >
> > Otherwise, looks reasonable.
>
> Sure,
>
>
> >> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> >> +{
> >> + struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
> >> +
> >> + return ci ? ci->size : 0;
> >> +}
> >> +
> >
> > Orphaned function?
> >
> > Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
> > If so, this wouldn't just be dead code in this series.
>
> Ah - I thought the MPAM driver was pulling this value in, but it's the resctrl glue code.
> I was trying to reduce the number of trees this touches - it's probably best to kick this
> into the next series that adds the resctrl code as it's pretty trivial.
>
>
> Thanks,
>
> James
Sure, that also works.
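For context, rdtgroup_cbm_to_size() mentioned above boils down to scaling the cache size by the fraction of capacity-bitmask bits that are set; a minimal userspace model of that arithmetic (hypothetical names, not the resctrl code):

```c
#include <stdint.h>

/* Each set bit in a capacity bitmask of cbm_len bits covers an equal
 * share of the cache (hypothetical model of rdtgroup_cbm_to_size()). */
static unsigned int cbm_to_size(unsigned int cache_size,
				unsigned int cbm_len, uint32_t cbm)
{
	return cache_size / cbm_len * __builtin_popcount(cbm);
}
```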
Cheers
---Dave
* Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-26 14:45 ` Ben Horgan
@ 2025-08-28 15:56 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:56 UTC (permalink / raw)
To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 26/08/2025 15:45, Ben Horgan wrote:
> The patch logic update makes sense to me. Just a nit.
>
> On 8/22/25 16:29, James Morse wrote:
>> The PPTT describes CPUs and caches, as well as processor containers.
>> The ACPI table for MPAM describes the set of CPUs that can access an MSC
>> with the UID of a processor container.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..4791ca2bdfac 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + * processor containers
>> + * @acpi_cpu_id: The UID of the processor container.
>> + * @cpus: The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container; they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> + struct acpi_pptt_processor *cpu_node;
>> + struct acpi_table_header *table_hdr;
>> + struct acpi_subtable_header *entry;
>> + unsigned long table_end;
>> + acpi_status status;
>> + bool leaf_flag;
>> + u32 proc_sz;
>> +
>> + cpumask_clear(cpus);
>> +
>> + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
>> + if (ACPI_FAILURE(status))
>> + return;
>> +
>> + table_end = (unsigned long)table_hdr + table_hdr->length;
>> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> + sizeof(struct acpi_table_pptt));
>> + proc_sz = sizeof(struct acpi_pptt_processor);
>> + while ((unsigned long)entry + proc_sz <= table_end) {
>> + cpu_node = (struct acpi_pptt_processor *)entry;
>> + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>> + cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>> + leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
> nit: Consider dropping the boolean leaf_flag and just using
> acpi_pptt_leaf_node() in the condition. The name leaf_flag is slightly
> overloaded to include the case when the acpi leaf flag is not supported
> and dropping it would make the code more succinct.
Sure, this is a hangover from the earlier cleanup you suggested. It's readable enough
without giving the result a name.
Thanks,
James
* Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
2025-08-27 10:48 ` Dave Martin
@ 2025-08-28 15:57 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi Dave,
On 27/08/2025 11:48, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
>> The PPTT describes CPUs and caches, as well as processor containers.
>> The ACPI table for MPAM describes the set of CPUs that can access an MSC
>> with the UID of a processor container.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
> Nit: The motivation for the change is not clear here.
>
> I guess this boils down to the need to map the MSC topology information
> in the the ACPI MPAM table to a cpumask for each MSC.
>
> If so, a possible rearrangement and rewording might be, say:
>
> --8<--
>
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT, to indicate the subset of CPUs and upstream cache topology
> that can access each MPAM Memory System Component (MSC).
>
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
>
> Add a helper to find the processor container by its id, then [...]
>
> -->8--
Thanks, that is clearer!
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..4791ca2bdfac 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>> return NULL;
>> }
>>
>> +/**
>> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
>> + * @table_hdr: A reference to the PPTT table.
>> + * @parent_node: A pointer to the processor node in the @table_hdr.
>> + * @cpus: A cpumask to fill with the CPUs below @parent_node.
>> + *
>> + * Walks up the PPTT from every possible CPU to find if the provided
>> + * @parent_node is a parent of this CPU.
>> + */
>> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
>> + struct acpi_pptt_processor *parent_node,
>> + cpumask_t *cpus)
>> +{
>> + struct acpi_pptt_processor *cpu_node;
>> + u32 acpi_id;
>> + int cpu;
>> +
>> + cpumask_clear(cpus);
>> +
>> + for_each_possible_cpu(cpu) {
>> + acpi_id = get_acpi_id_for_cpu(cpu);
> ^ Presumably this can't fail?
It'll return something! This could only be a problem if this raced with a CPU becoming
impossible, and there is no mechanism to do that.
>> + cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
>> +
>> + while (cpu_node) {
>> + if (cpu_node == parent_node) {
>> + cpumask_set_cpu(cpu, cpus);
>> + break;
>> + }
>> + cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> + }
>> + }
>> +}
>> +
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + * processor containers
> Nit: "containers" -> "container" ?
Fixed,
>> + * @acpi_cpu_id: The UID of the processor container.
>> + * @cpus: The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container, they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
>
> (Revise this if dropping the leaf/non-leaf distinction -- see below.)
>
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> + struct acpi_pptt_processor *cpu_node;
>> + struct acpi_table_header *table_hdr;
>> + struct acpi_subtable_header *entry;
>> + unsigned long table_end;
>> + acpi_status status;
>> + bool leaf_flag;
>> + u32 proc_sz;
>> +
>> + cpumask_clear(cpus);
>> +
>> + status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
>> + if (ACPI_FAILURE(status))
>> + return;
> Is acpi_get_pptt() applicable here?
Oh, that is new, and would let me chuck the reference counting.
I guess this replaces Jonathan's magic table-freeing cleanup thing!
> (That function is not thread-safe, but then, perhaps most/all of these
> functions are not thread safe. If we are still on the boot CPU at this
> point (?) then this wouldn't be a concern.)
I think that relies on the first caller being from somewhere that can't race.
In this case it's the architecture's smp_prepare_cpus() call to set up the ACPI topology.
That is sufficiently early that it's not a concern.
>> +
>> + table_end = (unsigned long)table_hdr + table_hdr->length;
>> + entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> + sizeof(struct acpi_table_pptt));
>> + proc_sz = sizeof(struct acpi_pptt_processor);
>> + while ((unsigned long)entry + proc_sz <= table_end) {
>
> Ack that this matches the bounds check in functions that are already
> present.
>
>> + cpu_node = (struct acpi_pptt_processor *)entry;
>> + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>> + cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>> + leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
>> + if (!leaf_flag) {
>> + if (cpu_node->acpi_processor_id == acpi_cpu_id)
> Is there any need to distinguish processor containers from (leaf) CPU
> nodes, here? If not, dropping the distinction might simplify the code
> here (even if callers do not care).
In the namespace the object types are different, so I assumed they have their own UID
space. The PPTT holds both - hence the check for which kind of thing it is. The risk is
looking for processor-container-4 and finding CPU-4 instead...
The relevant ACPI bit is "8.4.2.1 Processor Container Device", it says:
| A processor container declaration must supply a _UID method returning an ID that is
| unique in the processor container hierarchy.
Which doesn't quite let me combine them here.
> Otherwise, maybe eliminate leaf_flag and collapse these into a single
> if(), as suggested by Ben [1].
>
>> + acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
>
> Can there ever be multiple matches?
>
> The possibility of duplicate processor IDs in the PPTT sounds weird to
> me, but then I'm not an ACPI expert.
Multiple processor-containers with the same ID? That would be a corrupt table.
acpi_pptt_get_child_cpus() then walks the tree again to find the CPUs below this
processor-container - those have a different kind of id.
> If there can only be a single match, though, then we may as well break
> out of the loop here, unless we want to be paranoid and report
> duplicates as an error -- but that would require extra implementation,
> so I'm not sure that would be worth it.
Hmmm, the PPTT node should map to only one processor or processor-container.
I'll chuck the break in.
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
2025-08-27 10:49 ` Dave Martin
@ 2025-08-28 15:57 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi Dave,
On 27/08/2025 11:49, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:45PM +0000, James Morse wrote:
>> acpi_count_levels() passes the number of levels back via a pointer argument.
>> It also passes this to acpi_find_cache_level() as the starting_level, and
>> preserves this value as it walks up the cpu_node tree counting the levels.
>>
>> This means the caller must initialise 'levels' due to acpi_count_levels()
>> internals. The only caller acpi_get_cache_info() happens to have already
>> initialised levels to zero, which acpi_count_levels() depends on to get the
>> correct result.
>>
>> Two results are passed back from acpi_count_levels(), unlike split_levels,
>> levels is not optional.
>>
>> Split these two results up. The mandatory 'levels' is always returned,
>> which hides the internal details from the caller, and avoids having
>> duplicated initialisation in all callers. split_levels remains an
>> optional argument passed back.
>
> Nit: I found all this a bit hard to follow.
>
> This seems to boil down to:
>
> --8<--
>
> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter. The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
>
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
>
> Get rid of the levels parameter, which has no remaining purpose.
>
> Fix acpi_get_cache_info() to match.
>
> -->8--
I've taken this instead,
> split_levels is orthogonal to this refactoring (as evinced by the diff).
> I think mentioning it in the commit message at all may just add to the
> confusion...
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 4791ca2bdfac..8f9b9508acba 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>> * levels and split cache levels (data/instruction).
>> * @table_hdr: Pointer to the head of the PPTT table
>> * @cpu_node: processor node we wish to count caches for
>> - * @levels: Number of levels if success.
>> * @split_levels: Number of split cache levels (data/instruction) if
>> - * success. Can by NULL.
>> + * success. Can be NULL.
>> *
>> + * Returns number of levels.
>
> Nit: the prevailing convention in this file would be
>
> Return: number of levels
>
> (I don't know whether kerneldoc cares.)
>
> Maybe also say "total number of levels" in place of "level", to make it
> clearer that the split levels (if any) are included in this count.
Sure,
>> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>> if (!cpu_node)
>> return -ENOENT;
>>
>> - acpi_count_levels(table, cpu_node, levels, split_levels);
>> + *levels = acpi_count_levels(table, cpu_node, split_levels);
>>
>> pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>> *levels, split_levels ? *split_levels : -1);
>
> Otherwise, looks reasonable to me.
>
> (But see my comments on the next patches re whether we really need this.)
It was enough fun to debug that I'd like to save anyone else the trouble!
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-23 12:14 ` Markus Elfring
@ 2025-08-28 15:57 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
To: Markus Elfring, linux-arm-kernel, linux-acpi, devicetree
Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
Shanker Donthineni, Shaopeng Tan, Sudeep Holla, Will Deacon,
Xin Hao
Hi Markus,
On 23/08/2025 13:14, Markus Elfring wrote:
> …
>> +++ b/include/linux/acpi.h
> …
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>> void acpi_table_init_complete (void);
>> int acpi_table_init (void);
> …
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
>> +
>> int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
> …
>
> How do you think about to offer the addition of such a special macro call
> by another separate update step?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81
As it goes via the same tree I don't think there is a strong reason either way.
Dave points out on an earlier patch that the PPTT code doesn't care about the reference
counting anyway, so this stuff can go.
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-27 9:25 ` Ben Horgan
@ 2025-08-28 15:57 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Dave Martin,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 10:25, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>> void acpi_table_init_complete (void);
>> int acpi_table_init (void);
>>
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> + struct acpi_table_header *table;
>> + int status = acpi_get_table(signature, instance, &table);
>> +
>> + if (ACPI_FAILURE(status))
>> + return ERR_PTR(-ENOENT);
>> + return table;
>> +}
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> nit: Is it useful to change the condition from !IS_ERR(_T) to
> !IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
> acpi_put_table() can take NULL, so there is no real danger.
If it's the common pattern, sure.
But this code got dropped as Dave pointed out the PPTT doesn't care about the reference
counting anyway, its acpi_get_pptt() helper just uses the same reference for everything.
This might come back for the MPAM driver..
>> +
>> int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>> int __init_or_acpilib acpi_table_parse_entries(char *id,
>> unsigned long table_size, int entry_id,
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
2025-08-27 10:50 ` Dave Martin
@ 2025-08-28 15:58 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:50, Dave Martin wrote:
> Hi,
>
> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().
> Does this mean that the early secondaries must be spread out across the
> whole topology so that everything can be probed?
>
> (i.e., a random subset is no good?)
For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
associated with each cache needs to be online. Random is fine - provided you get lucky.
> If so, is this documented somewhere, such as in booting.rst?
booting.rst is for the bootloader.
Late secondaries are a bit of a niche sport - I've only commonly seen it done in VMs.
Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
make much of a difference.
The concern is that if resctrl gets probed after user-space has started, whatever
user-space service is supposed to set it up will have concluded it's not supported. Working
with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
enough so that every piece of hardware is reachable.
> Maybe this is not a new requirement -- it's not an area that I'm very
> familiar with.
Hard to say - it's a potentially surprising side effect of glomming OS-accessible registers
onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
change while they are offline. (presumably due to RAS or a firmware update)
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 8f9b9508acba..660457644a5b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>> return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>> ACPI_PPTT_ACPI_IDENTICAL);
>> }
>> +
>> +/**
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>
> Nit: doesn't exist or its revision is too old.
... it's not old, but there is no published spec for that revision... unsupported?
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> + u32 acpi_cpu_id;
>> + int level, cpu, num_levels;
>> + struct acpi_pptt_cache *cache;
>> + struct acpi_pptt_cache_v1 *cache_v1;
>> + struct acpi_pptt_processor *cpu_node;
>> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> acpi_get_pptt() ? (See comment on patch 3.)
Yup,
> Comments there also suggest that the acpi_put_table() may be
> unnecessary, at least on some paths.
>
> I haven't tried to understand the ins and outs of this.
It's grabbing one reference and using it for everything, because it needs to 'map' the
table in atomic context due to cpuhp, but can't.
Given how frequently it's used, there is no problem just leaving it mapped.
>> +
>> + if (IS_ERR(table))
>> + return PTR_ERR(table);
>> +
>> + if (table->revision < 3)
>> + return -ENOENT;
>> +
>> + /*
>> + * If we found the cache first, we'd still need to walk from each CPU
>> + * to find the level...
>> + */
> ^ Possibly confusing comment? The cache id is the starting point for
> calling this function. Is there a world in which we are at this point
> without first having found the cache node?
>
> (If the comment is just a restatement of part of the kerneldoc
> description, maybe just drop it.)
It's describing the alternate world where the table is searched to find the cache first,
but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
I'll drop it - it was justifying why it's done this way round...
>> + for_each_possible_cpu(cpu) {
>> + acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> + cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> + if (!cpu_node)
>> + return -ENOENT;
>> + num_levels = acpi_count_levels(table, cpu_node, NULL);
>
> Is the initial call to acpi_count_levels() really needed here?
>
> It feels a bit like we end up enumerating the whole topology two or
> three times here; once to count how many levels there are, and then
> again to examine the nodes, and once more inside acpi_find_cache_node().
>
> Why can't we just walk until we run out of levels?
This is looking for a unified cache - and we don't know where those start.
We could walk the caches in order and stop once we start getting unified caches, then again
once they stop ... but this seemed simpler.
> I may be missing some details of how these functions interact -- if
> this is only run at probe time, compact, well-factored code is
> more important than making things as fast as possible.
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>
> [...]
>
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>> void acpi_table_init_complete (void);
>> int acpi_table_init (void);
>>
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> + struct acpi_table_header *table;
>> + int status = acpi_get_table(signature, instance, &table);
>> +
>> + if (ACPI_FAILURE(status))
>> + return ERR_PTR(-ENOENT);
>> + return table;
>> +}
> This feels like something that ought to exist already. If not, why
> not? If so, are there open-coded versions of this spread around the
> ACPI tree that should be ported to use it?
It's a cleanup idiom helper that lets the compiler do this automagically - but it's moot as
it's not going to be needed in the PPTT code because of the acpi_get_pptt() thing.
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
2025-08-27 10:53 ` Dave Martin
@ 2025-08-28 15:58 ` James Morse
0 siblings, 0 replies; 130+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
To: Dave Martin
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi Dave,
On 27/08/2025 11:53, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>
>> The driver needs to know which CPUs are associated with the cache,
>> the CPUs may not all be online, so cacheinfo does not have the
>> information.
>
> Nit: cacheinfo lacking the information is not a consequence of the
> driver needing it.
>
> Maybe split the sentence:
>
> -> "[...] associated with the cache. The CPUs may not [...]"
Sure,
>> Add a helper to pull this information out of the PPTT.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 660457644a5b..cb93a9a7f9b6 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>>
>> return -ENOENT;
>> }
>> +
>> +/**
>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>> + * specified cache
>> + * @cache_id: The id field of the unified cache
>> + * @cpus: Where to build the cpumask
>> + *
>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>> + * to be found even if the CPUs are offline.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>> + */
>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>> +{
>> + u32 acpi_cpu_id;
>> + int level, cpu, num_levels;
>> + struct acpi_pptt_cache *cache;
>> + struct acpi_pptt_cache_v1 *cache_v1;
>> + struct acpi_pptt_processor *cpu_node;
>> + struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
>> +
>> + cpumask_clear(cpus);
>> +
>> + if (IS_ERR(table))
>> + return -ENOENT;
>> +
>> + if (table->revision < 3)
>> + return -ENOENT;
>> +
>> + /*
>> + * If we found the cache first, we'd still need to walk from each cpu.
>> + */
>> + for_each_possible_cpu(cpu) {
>> + acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> + cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> + if (!cpu_node)
>> + return 0;
>> + num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> + /* Start at 1 for L1 */
>> + for (level = 1; level <= num_levels; level++) {
>> + cache = acpi_find_cache_node(table, acpi_cpu_id,
>> + ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> + level, &cpu_node);
>> + if (!cache)
>> + continue;
>> +
>> + cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> + cache,
>> + sizeof(struct acpi_pptt_cache));
>> +
>> + if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> + cache_v1->cache_id == cache_id)
>> + cpumask_set_cpu(cpu, cpus);
> Again, it feels like we are repeating the same walk multiple times to
> determine how deep the table is (on which point the table is self-
> describing anyway), and then again to derive some static property, and
> then we are then doing all of that work multiple times to derive
> different static properties, etc.
>
> Can we not just walk over the tables once and stash the derived
> properties somewhere?
That is possible - but it's a more invasive change to the PPTT parsing code.
Before the introduction of the leaf flag, the search for a processor also included a
search to check if the discovered node was a leaf.
I think this is trading time (walking over the table multiple times) against the memory
you'd need to de-serialise the tree to find the necessary properties quickly. I think the
reason Jeremy L went this way was because there may never be another request into this
code, so being ready with a quick answer was a waste of memory.
MPAM doesn't change this - all these things are done up front during driver probing, and
the values are cached by the driver.
> I'm still getting my head around this parsing code, so I'm not saying
> that the approach is incorrect here -- just wondering whether there is
> a way to make it simpler.
It's walked at boot, and on cpu-hotplug. Neither are particularly performance critical.
I agree that as platforms get bigger, there will be a tipping point ... I don't think
anyone has complained yet!
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-27 8:53 ` Ben Horgan
@ 2025-08-28 15:58 ` James Morse
2025-08-29 8:20 ` Ben Horgan
0 siblings, 1 reply; 130+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 09:53, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> The bulk of the MPAM driver lives outside the arch code because it
>> largely manages MMIO devices that generate interrupts. The driver
>> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
>> platforms, that is where the Kconfig option makes the most sense.
>>
>> This Kconfig option will later be used by the arch code to enable
>> or disable the MPAM context-switch code, and registering the CPUs
>> properties with the MPAM driver.
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index e9bbfacc35a6..658e47fc0c5a 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>> range of input addresses.
>>
>> +config ARM64_MPAM
>> + bool "Enable support for MPAM"
>> + help
>> + Memory Partitioning and Monitoring is an optional extension
>> + that allows the CPUs to mark load and store transactions with
>> + labels for partition-id and performance-monitoring-group.
>> + System components, such as the caches, can use the partition-id
>> + to apply a performance policy. MPAM monitors can use the
>> + partition-id and performance-monitoring-group to measure the
>> + cache occupancy or data throughput.
>> +
>> + Use of this extension requires CPU support, support in the
>> + memory system components (MSC), and a description from firmware
>> + of where the MSC are in the address space.
>> +
>> + MPAM is exposed to user-space via the resctrl pseudo filesystem.
>> +
>> endmenu # "ARMv8.4 architectural features"
> Should this be moved to "ARMv8.2 architectural features" rather than the
> 8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
> listed in the section A2.2.3.1 Features added to the Armv8.2 extension
> in later releases.
Hmmm, I don't think we've done that anywhere else. I'm only aware of one v8.2 platform
that had it, and those are not widely available. As it was a headline v8.4 feature I'd
prefer to keep it there.
I think it's more confusing to put it under v8.2!
Thanks,
James
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-28 16:13 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 16:13 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Resetting RIS entries from the cpuhp callback is easy as the
> callback occurs on the correct CPU. This won't be true for any other
> caller that wants to reset or configure an MSC.
>
> Add a helper that schedules the provided function if necessary.
> Prevent the cpuhp callbacks from changing the MSC state by taking the
> cpuhp lock.
At first, I thought this was referring to something done in the patch.
Consider changing to something like:
Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.
Regardless, this looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index c1f01dd748ad..759244966736 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> mutex_unlock(&msc->part_sel_lock);
> }
>
> -static void mpam_reset_ris(struct mpam_msc_ris *ris)
> +/*
> + * Called via smp_call_on_cpu() to prevent migration, while still being
> + * pre-emptible.
> + */
> +static int mpam_reset_ris(void *arg)
> {
> u16 partid, partid_max;
> + struct mpam_msc_ris *ris = arg;
>
> mpam_assert_srcu_read_lock_held();
>
> if (ris->in_reset_state)
> - return;
> + return 0;
>
> spin_lock(&partid_max_lock);
> partid_max = mpam_partid_max;
> spin_unlock(&partid_max_lock);
> for (partid = 0; partid < partid_max; partid++)
> mpam_reset_ris_partid(ris, partid);
> +
> + return 0;
> +}
> +
> +/*
> + * Get the preferred CPU for this MSC. If it is accessible from this CPU,
> + * this CPU is preferred. This can be preempted/migrated, it will only result
> + * in more work.
> + */
> +static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
> +{
> + int cpu = raw_smp_processor_id();
> +
> + if (cpumask_test_cpu(cpu, &msc->accessibility))
> + return cpu;
> +
> + return cpumask_first_and(&msc->accessibility, cpu_online_mask);
> +}
> +
> +static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
> +{
> + lockdep_assert_irqs_enabled();
> + lockdep_assert_cpus_held();
> + mpam_assert_srcu_read_lock_held();
> +
> + return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
> }
>
> static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> @@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> mpam_mon_sel_outer_lock(msc);
> idx = srcu_read_lock(&mpam_srcu);
> list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> - mpam_reset_ris(ris);
> + mpam_touch_msc(msc, &mpam_reset_ris, ris);
>
> /*
> * Set in_reset_state when coming online. The reset state
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-28 16:13 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 16:13 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> When CPUs come online the original configuration should be restored.
> Once the maximum partid is known, allocate a configuration array for
> each component, and reprogram each RIS configuration from this.
>
> The MPAM spec describes how multiple controls can interact. To prevent
> this happening by accident, always reset controls that don't have a
> valid configuration. This allows the same helper to be used for
> configuration and reset.
What in particular are you worried about here? It does seem a bit
wasteful that, to update a single control in a RIS, all the controls in
that RIS are updated. This is needed for reset and restore, but do we
really want it when we are just changing one control, e.g. the cache
portion bitmap?
>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Added a comment about the ordering around max_partid.
> * Allocate configurations after interrupts are registered to reduce churn.
> * Added mpam_assert_partid_sizes_fixed();
> ---
> drivers/resctrl/mpam_devices.c | 253 +++++++++++++++++++++++++++++---
> drivers/resctrl/mpam_internal.h | 26 +++-
> 2 files changed, 251 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index b424af666b1e..8f6df2406c22 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
> /* List of all objects that can be free()d after synchronise_srcu() */
> static LLIST_HEAD(mpam_garbage);
>
> +/*
> + * Once mpam is enabled, new requestors cannot further reduce the available
> + * partid. Assert that the size is fixed, and new requestors will be turned
> + * away.
> + */
> +static void mpam_assert_partid_sizes_fixed(void)
> +{
> + WARN_ON_ONCE(!partid_max_published);
> +}
> +
> static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
> {
> WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> @@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
> add_to_garbage(class);
> }
>
> +static void __destroy_component_cfg(struct mpam_component *comp);
> +
> static void mpam_comp_destroy(struct mpam_component *comp)
> {
> struct mpam_class *class = comp->class;
>
> lockdep_assert_held(&mpam_list_lock);
>
> + __destroy_component_cfg(comp);
> +
> list_del_rcu(&comp->class_list);
> add_to_garbage(comp);
>
> @@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> __mpam_write_reg(msc, reg, bm);
> }
>
> -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +/* Called via IPI. Call while holding an SRCU reference */
> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> + struct mpam_config *cfg)
> {
> u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
> struct mpam_msc *msc = ris->vmsc->msc;
> struct mpam_props *rprops = &ris->props;
>
> - mpam_assert_srcu_read_lock_held();
> -
> mutex_lock(&msc->part_sel_lock);
> __mpam_part_sel(ris->ris_idx, partid, msc);
>
> - if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> - mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> + if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
> + if (mpam_has_feature(mpam_feat_cpor_part, cfg))
> + mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> + else
> + mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
> + rprops->cpbm_wd);
> + }
>
> - if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> - mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> + if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
> + if (mpam_has_feature(mpam_feat_mbw_part, cfg))
> + mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
> + else
> + mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
> + rprops->mbw_pbm_bits);
> + }
>
> if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> mpam_write_partsel_reg(msc, MBW_MIN, 0);
>
> - if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> - mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
> + if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
> + if (mpam_has_feature(mpam_feat_mbw_max, cfg))
> + mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
> + else
> + mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
> + }
>
> if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
> mutex_unlock(&msc->part_sel_lock);
> }
>
> +struct reprogram_ris {
> + struct mpam_msc_ris *ris;
> + struct mpam_config *cfg;
> +};
> +
> +/* Call with MSC lock held */
> +static int mpam_reprogram_ris(void *_arg)
> +{
> + u16 partid, partid_max;
> + struct reprogram_ris *arg = _arg;
> + struct mpam_msc_ris *ris = arg->ris;
> + struct mpam_config *cfg = arg->cfg;
> +
> + if (ris->in_reset_state)
> + return 0;
> +
> + spin_lock(&partid_max_lock);
> + partid_max = mpam_partid_max;
> + spin_unlock(&partid_max_lock);
> + for (partid = 0; partid <= partid_max; partid++)
> + mpam_reprogram_ris_partid(ris, partid, cfg);
> +
> + return 0;
> +}
> +
> /*
> * Called via smp_call_on_cpu() to prevent migration, while still being
> * pre-emptible.
> */
> static int mpam_reset_ris(void *arg)
> {
> - u16 partid, partid_max;
> struct mpam_msc_ris *ris = arg;
> + struct reprogram_ris reprogram_arg;
> + struct mpam_config empty_cfg = { 0 };
>
> if (ris->in_reset_state)
> return 0;
>
> - spin_lock(&partid_max_lock);
> - partid_max = mpam_partid_max;
> - spin_unlock(&partid_max_lock);
> - for (partid = 0; partid < partid_max; partid++)
> - mpam_reset_ris_partid(ris, partid);
> + reprogram_arg.ris = ris;
> + reprogram_arg.cfg = &empty_cfg;
> +
> + mpam_reprogram_ris(&reprogram_arg);
>
> return 0;
> }
> @@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
>
> static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> {
> - int idx;
> struct mpam_msc_ris *ris;
>
> mpam_assert_srcu_read_lock_held();
>
> mpam_mon_sel_outer_lock(msc);
> - idx = srcu_read_lock(&mpam_srcu);
> list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> mpam_touch_msc(msc, &mpam_reset_ris, ris);
>
> @@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> */
> ris->in_reset_state = online;
> }
> - srcu_read_unlock(&mpam_srcu, idx);
> mpam_mon_sel_outer_unlock(msc);
> }
>
> +static void mpam_reprogram_msc(struct mpam_msc *msc)
> +{
> + u16 partid;
> + bool reset;
> + struct mpam_config *cfg;
> + struct mpam_msc_ris *ris;
> +
> + /*
> + * No lock for mpam_partid_max as partid_max_published has been
> + * set by mpam_enabled(), so the values can no longer change.
> + */
> + mpam_assert_partid_sizes_fixed();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
> + if (!mpam_is_enabled() && !ris->in_reset_state) {
> + mpam_touch_msc(msc, &mpam_reset_ris, ris);
> + ris->in_reset_state = true;
> + continue;
> + }
> +
> + reset = true;
> + for (partid = 0; partid <= mpam_partid_max; partid++) {
> + cfg = &ris->vmsc->comp->cfg[partid];
> + if (cfg->features)
> + reset = false;
> +
> + mpam_reprogram_ris_partid(ris, partid, cfg);
> + }
> + ris->in_reset_state = reset;
> + }
> +}
> +
> static void _enable_percpu_irq(void *_irq)
> {
> int *irq = _irq;
> @@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
> _enable_percpu_irq(&msc->reenable_error_ppi);
>
> if (atomic_fetch_inc(&msc->online_refs) == 0)
> - mpam_reset_msc(msc, true);
> + mpam_reprogram_msc(msc);
> }
> srcu_read_unlock(&mpam_srcu, idx);
>
> @@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
> cpus_read_unlock();
> }
>
> +static void __destroy_component_cfg(struct mpam_component *comp)
> +{
> + add_to_garbage(comp->cfg);
> +}
> +
> +static int __allocate_component_cfg(struct mpam_component *comp)
> +{
> + mpam_assert_partid_sizes_fixed();
> +
> + if (comp->cfg)
> + return 0;
> +
> + comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
> + if (!comp->cfg)
> + return -ENOMEM;
> + init_garbage(comp->cfg);
> +
> + return 0;
> +}
> +
> +static int mpam_allocate_config(void)
> +{
> + int err = 0;
> + struct mpam_class *class;
> + struct mpam_component *comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_for_each_entry(class, &mpam_classes, classes_list) {
> + list_for_each_entry(comp, &class->components, class_list) {
> + err = __allocate_component_cfg(comp);
> + if (err)
> + return err;
> + }
> + }
> +
> + return 0;
> +}
> +
> static void mpam_enable_once(void)
> {
> int err;
> @@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
> */
> cpus_read_lock();
> mutex_lock(&mpam_list_lock);
> - mpam_enable_merge_features(&mpam_classes);
> + do {
> + mpam_enable_merge_features(&mpam_classes);
>
> - err = mpam_register_irqs();
> - if (err)
> - pr_warn("Failed to register irqs: %d\n", err);
> + err = mpam_register_irqs();
> + if (err) {
> + pr_warn("Failed to register irqs: %d\n", err);
> + break;
> + }
>
> + err = mpam_allocate_config();
> + if (err) {
> + pr_err("Failed to allocate configuration arrays.\n");
> + break;
> + }
> + } while (0);
> mutex_unlock(&mpam_list_lock);
> cpus_read_unlock();
>
> @@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
>
> might_sleep();
> lockdep_assert_cpus_held();
> + mpam_assert_partid_sizes_fixed();
> +
> + memset(comp->cfg, 0, (mpam_partid_max * sizeof(*comp->cfg)));
>
> idx = srcu_read_lock(&mpam_srcu);
> list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> @@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
> mpam_enable_once();
> }
>
> +struct mpam_write_config_arg {
> + struct mpam_msc_ris *ris;
> + struct mpam_component *comp;
> + u16 partid;
> +};
> +
> +static int __write_config(void *arg)
> +{
> + struct mpam_write_config_arg *c = arg;
> +
> + mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
> +
> + return 0;
> +}
> +
> +#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
> + if (mpam_has_feature(feature, newcfg) && \
> + (newcfg)->member != (cfg)->member) { \
> + (cfg)->member = (newcfg)->member; \
> + cfg->features |= (1 << feature); \
> + \
> + (changes) |= (1 << feature); \
> + } \
> +} while (0)
> +
> +static mpam_features_t mpam_update_config(struct mpam_config *cfg,
> + const struct mpam_config *newcfg)
> +{
> + mpam_features_t changes = 0;
> +
> + maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
> + maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
> + maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
> +
> + return changes;
> +}
> +
> +/* TODO: split into write_config/sync_config */
> +/* TODO: add config_dirty bitmap to drive sync_config */
Any changes to come for these TODO comments?
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> + struct mpam_config *cfg)
> +{
> + struct mpam_write_config_arg arg;
> + struct mpam_msc_ris *ris;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc *msc;
> + int idx;
> +
> + lockdep_assert_cpus_held();
> +
> + /* Don't pass in the current config! */
> + WARN_ON_ONCE(&comp->cfg[partid] == cfg);
> +
> + if (!mpam_update_config(&comp->cfg[partid], cfg))
> + return 0;
> +
> + arg.comp = comp;
> + arg.partid = partid;
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> + msc = vmsc->msc;
> +
> + list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> + arg.ris = ris;
> + mpam_touch_msc(msc, __write_config, &arg);
> + }
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> +
> + return 0;
> +}
> +
> /*
> * MSC that are hidden under caches are not created as platform devices
> * as there is no cache driver. Caches are also special-cased in
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 3476ee97f8ac..70cba9f22746 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -191,11 +191,7 @@ struct mpam_props {
> u16 num_mbwu_mon;
> };
>
> -static inline bool mpam_has_feature(enum mpam_device_features feat,
> - struct mpam_props *props)
> -{
> - return (1 << feat) & props->features;
> -}
> +#define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
>
> static inline void mpam_set_feature(enum mpam_device_features feat,
> struct mpam_props *props)
> @@ -226,6 +222,17 @@ struct mpam_class {
> struct mpam_garbage garbage;
> };
>
> +struct mpam_config {
> + /* Which configuration values are valid. 0 is used for reset */
> + mpam_features_t features;
> +
> + u32 cpbm;
> + u32 mbw_pbm;
> + u16 mbw_max;
> +
> + struct mpam_garbage garbage;
> +};
> +
> struct mpam_component {
> u32 comp_id;
>
> @@ -234,6 +241,12 @@ struct mpam_component {
>
> cpumask_t affinity;
>
> + /*
> + * Array of configuration values, indexed by partid.
> + * Read from cpuhp callbacks, hold the cpuhp lock when writing.
> + */
> + struct mpam_config *cfg;
> +
> /* member of mpam_class:components */
> struct list_head class_list;
>
> @@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
> void mpam_enable(struct work_struct *work);
> void mpam_disable(struct work_struct *work);
>
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> + struct mpam_config *cfg);
> +
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-28 16:14 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-28 16:14 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
>
> mpam v0.1 and versions above v1.0 support optional long counter for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register have fields
> indicating support for long counters. As of now, a 44 bit counter
> represented by HAS_LONG field (bit 30) and a 63 bit counter represented
> by LWD (bit 29) can be optionally integrated. Probe for these counters
> and set corresponding feature bits if any of these counters are present.
>
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 23 ++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 8 ++++++++
> 2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 11be34b54643..2ab7f127baaa 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
> }
> if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> - bool hw_managed;
> + bool has_long, hw_managed;
> u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
nit: the variable name would be more readable with an underscore,
mbwumon_idr.
>
> props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
> @@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
> mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
>
> + /*
> + * Treat long counter and its extension, lwd as mutually
> + * exclusive feature bits. Though these are dependent
> + * fields at the implementation level, there would never
> + * be a need for mpam_feat_msmon_mbwu_44counter (long
> + * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
> + * bits to be set together.
> + *
> + * mpam_feat_msmon_mbwu isn't treated as an exclusive
> + * bit as this feature bit would be used as the "front
> + * facing feature bit" for any checks related to mbwu
> + * monitors.
> + */
> + has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
> + if (props->num_mbwu_mon && has_long) {
> + if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
> + mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
> + else
> + mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
> + }
> +
> /* Is NRDY hardware managed? */
> mpam_mon_sel_outer_lock(msc);
> hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 9a50a5432f4a..9f627b5f72a1 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -178,7 +178,15 @@ enum mpam_device_features {
> mpam_feat_msmon_csu,
> mpam_feat_msmon_csu_capture,
> mpam_feat_msmon_csu_hw_nrdy,
> +
> + /*
> + * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
> + * counter would be used. The exact counter used is decided based on the
> + * status of mpam_feat_msmon_mbwu_l/mpam_feat_msmon_mbwu_lwd as well.
mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter
> + */
> mpam_feat_msmon_mbwu,
> + mpam_feat_msmon_mbwu_44counter,
> + mpam_feat_msmon_mbwu_63counter,
> mpam_feat_msmon_mbwu_capture,
> mpam_feat_msmon_mbwu_rwbw,
> mpam_feat_msmon_mbwu_hw_nrdy,
Other than the two nits, the change looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-28 17:07 ` Fenghua Yu
0 siblings, 0 replies; 130+ messages in thread
From: Fenghua Yu @ 2025-08-28 17:07 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:29, James Morse wrote:
> The MSC MON_SEL register needs to be accessed from hardirq context by the
> PMU drivers, making an irqsave spinlock the obvious lock to protect these
> registers. On systems with SCMI mailboxes it must be able to sleep, meaning
> a mutex must be used.
>
> Clearly these two can't exist at the same time.
>
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> the PMU driver to fail without having to check the interface type of
> each MSC.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
> 1 file changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a623f405ddd8..c6f087f9fa7d 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -68,10 +68,19 @@ struct mpam_msc {
>
> /*
> * mon_sel_lock protects access to the MSC hardware registers that are
> - * affeted by MPAMCFG_MON_SEL.
> + * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> + * Both the 'inner' and 'outer' must be taken.
> + * For real MMIO MSC, the outer lock is unnecessary - but keeps the
> + * code common with:
> + * Firmware backed MSC need to sleep when accessing the MSC, which
> + * means some code-paths will always fail. For these MSC the outer
> + * lock is providing the protection, and the inner lock fails to
> + * be taken if the task is unable to sleep.
> + *
> * If needed, take msc->probe_lock first.
> */
> struct mutex outer_mon_sel_lock;
> + bool outer_lock_held;
Is it better to define outer_lock_held as atomic_t?
> raw_spinlock_t inner_mon_sel_lock;
> unsigned long inner_mon_sel_flags;
>
> @@ -81,6 +90,52 @@ struct mpam_msc {
> struct mpam_garbage garbage;
> };
>
> +static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
> +{
> + /*
> + * The outer lock may be taken by a CPU that then issues an IPI to run
> + * a helper that takes the inner lock. lockdep can't help us here.
> + */
> + WARN_ON_ONCE(!msc->outer_lock_held);
At this point, msc->outer_lock_held might not yet be visible as true,
because there is no memory barrier on it on this CPU. If it were an
atomic_t set to true on another CPU with smp_store_release(), it would
be guaranteed to be visible as true on this CPU. Without atomic
accesses we may see a false warning here, which would make debugging
difficult.
> +
> + if (msc->iface == MPAM_IFACE_MMIO) {
> + raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> + return true;
> + }
> +
> + /* Accesses must fail if we are not pre-emptible */
> + return !!preemptible();
> +}
> +
> +static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
> +{
> + WARN_ON_ONCE(!msc->outer_lock_held);
> +
> + if (msc->iface == MPAM_IFACE_MMIO)
> + raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> +}
> +
> +static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
> +{
> + mutex_lock(&msc->outer_mon_sel_lock);
> + msc->outer_lock_held = true;
> +}
> +
> +static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
> +{
> + msc->outer_lock_held = false;
> + mutex_unlock(&msc->outer_mon_sel_lock);
> +}
> +
> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> +{
> + WARN_ON_ONCE(!msc->outer_lock_held);
> + if (msc->iface == MPAM_IFACE_MMIO)
> + lockdep_assert_held_once(&msc->inner_mon_sel_lock);
> + else
> + lockdep_assert_preemption_enabled();
> +}
> +
> struct mpam_class {
> /* mpam_components in this class */
> struct list_head components;
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
2025-08-28 15:58 ` James Morse
@ 2025-08-29 8:20 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 8:20 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/28/25 16:58, James Morse wrote:
> Hi Ben,
>
> On 27/08/2025 09:53, Ben Horgan wrote:
>> On 8/22/25 16:29, James Morse wrote:
>>> The bulk of the MPAM driver lives outside the arch code because it
>>> largely manages MMIO devices that generate interrupts. The driver
>>> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
>>> platforms, that is where the Kconfig option makes the most sense.
>>>
>>> This Kconfig option will later be used by the arch code to enable
>>> or disable the MPAM context-switch code, and registering the CPUs
>>> properties with the MPAM driver.
>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index e9bbfacc35a6..658e47fc0c5a 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>>> ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>>> range of input addresses.
>>>
>>> +config ARM64_MPAM
>>> + bool "Enable support for MPAM"
>>> + help
>>> + Memory Partitioning and Monitoring is an optional extension
>>> + that allows the CPUs to mark load and store transactions with
>>> + labels for partition-id and performance-monitoring-group.
>>> + System components, such as the caches, can use the partition-id
>>> + to apply a performance policy. MPAM monitors can use the
>>> + partition-id and performance-monitoring-group to measure the
>>> + cache occupancy or data throughput.
>>> +
>>> + Use of this extension requires CPU support, support in the
>>> + memory system components (MSC), and a description from firmware
>>> + of where the MSC are in the address space.
>>> +
>>> + MPAM is exposed to user-space via the resctrl pseudo filesystem.
>>> +
>>> endmenu # "ARMv8.4 architectural features"
>
>> Should this be moved to "ARMv8.2 architectural features" rather than the
>> 8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
>> listed in the section A2.2.3.1 Features added to the Armv8.2 extension
>> in later releases.
>
> Hmmm, I don't think we've done that anywhere else. I'm only aware of one v8.2 platform
> that had it, and those are not widely available. As it was a headline v8.4 feature I'd
> prefer to keep it there.
>
> I think it's more confusing to put it under v8.2!
Ok, always best to minimise confusion. Keep it in v8.4.
>
> Thanks,
>
> James
--
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-29 8:42 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 8:42 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
>
> Add the definitions for these registers as offset within the page(s).
>
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
> * Whitespace churn.
> * Cite a more recent document.
> * Removed some stale feature, fixed some names etc.
> ---
> drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
> 1 file changed, 266 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d49bb884b433..6e0982a1a9ac 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
>
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1 0x10
> +
> +/* Memory mapped control pages: */
> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR 0x0000 /* features id register */
> +#define MPAMF_MSMON_IDR 0x0080 /* performance monitoring features */
> +#define MPAMF_IMPL_IDR 0x0028 /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR 0x0030 /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR 0x0038 /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR 0x0040 /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR 0x0048 /* priority partitioning */
> +#define MPAMF_CSUMON_IDR 0x0088 /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR 0x0090 /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR 0x0050 /* partid-narrowing */
> +#define MPAMF_IIDR 0x0018 /* implementer id register */
> +#define MPAMF_AIDR 0x0020 /* architectural id register */
> +
> +/* Configuration and Status Register offsets in the memory mapped page */
> +#define MPAMCFG_PART_SEL 0x0100 /* partid to configure: */
> +#define MPAMCFG_CPBM 0x1000 /* cache-portion config */
> +#define MPAMCFG_CMAX 0x0108 /* cache-capacity config */
> +#define MPAMCFG_CMIN 0x0110 /* cache-capacity config */
> +#define MPAMCFG_MBW_MIN 0x0200 /* min mem-bw config */
> +#define MPAMCFG_MBW_MAX 0x0208 /* max mem-bw config */
> +#define MPAMCFG_MBW_WINWD 0x0220 /* mem-bw accounting window config */
> +#define MPAMCFG_MBW_PBM 0x2000 /* mem-bw portion bitmap config */
> +#define MPAMCFG_PRI 0x0400 /* priority partitioning config */
> +#define MPAMCFG_MBW_PROP 0x0500 /* mem-bw stride config */
> +#define MPAMCFG_INTPARTID 0x0600 /* partid-narrowing config */
> +
> +#define MSMON_CFG_MON_SEL 0x0800 /* monitor selector */
> +#define MSMON_CFG_CSU_FLT 0x0810 /* cache-usage monitor filter */
> +#define MSMON_CFG_CSU_CTL 0x0818 /* cache-usage monitor config */
> +#define MSMON_CFG_MBWU_FLT 0x0820 /* mem-bw monitor filter */
> +#define MSMON_CFG_MBWU_CTL 0x0828 /* mem-bw monitor config */
> +#define MSMON_CSU 0x0840 /* current cache-usage */
> +#define MSMON_CSU_CAPTURE 0x0848 /* last cache-usage value captured */
> +#define MSMON_MBWU 0x0860 /* current mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE 0x0868 /* last mem-bw value captured */
> +#define MSMON_MBWU_L 0x0880 /* current long mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE_L 0x0890 /* last long mem-bw value captured */
> +#define MSMON_CAPT_EVNT 0x0808 /* signal a capture event */
> +#define MPAMF_ESR 0x00F8 /* error status register */
> +#define MPAMF_ECR 0x00F0 /* error control register */
> +
> +/* MPAMF_IDR - MPAM features ID register */
> +#define MPAMF_IDR_PARTID_MAX GENMASK(15, 0)
> +#define MPAMF_IDR_PMG_MAX GENMASK(23, 16)
> +#define MPAMF_IDR_HAS_CCAP_PART BIT(24)
> +#define MPAMF_IDR_HAS_CPOR_PART BIT(25)
> +#define MPAMF_IDR_HAS_MBW_PART BIT(26)
> +#define MPAMF_IDR_HAS_PRI_PART BIT(27)
> +#define MPAMF_IDR_EXT BIT(28)
> +#define MPAMF_IDR_HAS_IMPL_IDR BIT(29)
> +#define MPAMF_IDR_HAS_MSMON BIT(30)
> +#define MPAMF_IDR_HAS_PARTID_NRW BIT(31)
> +#define MPAMF_IDR_HAS_RIS BIT(32)
> +#define MPAMF_IDR_HAS_EXTD_ESR BIT(38)
> +#define MPAMF_IDR_HAS_ESR BIT(39)
> +#define MPAMF_IDR_RIS_MAX GENMASK(59, 56)
> +
> +/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
> +#define MPAMF_MSMON_IDR_MSMON_CSU BIT(16)
> +#define MPAMF_MSMON_IDR_MSMON_MBWU BIT(17)
> +#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT BIT(31)
> +
> +/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
> +#define MPAMF_CPOR_IDR_CPBM_WD GENMASK(15, 0)
> +
> +/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
> +#define MPAMF_CCAP_IDR_CMAX_WD GENMASK(5, 0)
> +#define MPAMF_CCAP_IDR_CASSOC_WD GENMASK(12, 8)
> +#define MPAMF_CCAP_IDR_HAS_CASSOC BIT(28)
> +#define MPAMF_CCAP_IDR_HAS_CMIN BIT(29)
> +#define MPAMF_CCAP_IDR_NO_CMAX BIT(30)
> +#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM BIT(31)
> +
> +/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
> +#define MPAMF_MBW_IDR_BWA_WD GENMASK(5, 0)
> +#define MPAMF_MBW_IDR_HAS_MIN BIT(10)
> +#define MPAMF_MBW_IDR_HAS_MAX BIT(11)
> +#define MPAMF_MBW_IDR_HAS_PBM BIT(12)
> +#define MPAMF_MBW_IDR_HAS_PROP BIT(13)
> +#define MPAMF_MBW_IDR_WINDWR BIT(14)
> +#define MPAMF_MBW_IDR_BWPBM_WD GENMASK(28, 16)
> +
> +/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
> +#define MPAMF_PRI_IDR_HAS_INTPRI BIT(0)
> +#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW BIT(1)
> +#define MPAMF_PRI_IDR_INTPRI_WD GENMASK(9, 4)
> +#define MPAMF_PRI_IDR_HAS_DSPRI BIT(16)
> +#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW BIT(17)
> +#define MPAMF_PRI_IDR_DSPRI_WD GENMASK(25, 20)
> +
> +/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
> +#define MPAMF_CSUMON_IDR_NUM_MON GENMASK(15, 0)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT BIT(24)
> +#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW BIT(25)
> +#define MPAMF_CSUMON_IDR_HAS_OFSR BIT(26)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG BIT(27)
> +#define MPAMF_CSUMON_IDR_HAS_XCL BIT(29)
> +#define MPAMF_CSUMON_IDR_CSU_RO BIT(30)
> +#define MPAMF_CSUMON_IDR_HAS_CAPTURE BIT(31)
> +
> +/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
> +#define MPAMF_MBWUMON_IDR_NUM_MON GENMASK(15, 0)
> +#define MPAMF_MBWUMON_IDR_HAS_RWBW BIT(28)
> +#define MPAMF_MBWUMON_IDR_LWD BIT(29)
> +#define MPAMF_MBWUMON_IDR_HAS_LONG BIT(30)
> +#define MPAMF_MBWUMON_IDR_HAS_CAPTURE BIT(31)
> +
> +/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
> +#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX GENMASK(15, 0)
nit: spaces used instead of tabs
> +
> +/* MPAMF_IIDR - MPAM implementation ID register */
> +#define MPAMF_IIDR_PRODUCTID GENMASK(31, 20)
> +#define MPAMF_IIDR_PRODUCTID_SHIFT 20
> +#define MPAMF_IIDR_VARIANT GENMASK(19, 16)
> +#define MPAMF_IIDR_VARIANT_SHIFT 16
> +#define MPAMF_IIDR_REVISON GENMASK(15, 12)
> +#define MPAMF_IIDR_REVISON_SHIFT 12
> +#define MPAMF_IIDR_IMPLEMENTER GENMASK(11, 0)
> +#define MPAMF_IIDR_IMPLEMENTER_SHIFT 0
> +
> +/* MPAMF_AIDR - MPAM architecture ID register */
> +#define MPAMF_AIDR_ARCH_MAJOR_REV GENMASK(7, 4)
> +#define MPAMF_AIDR_ARCH_MINOR_REV GENMASK(3, 0)
> +
> +/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
> +#define MPAMCFG_PART_SEL_PARTID_SEL GENMASK(15, 0)
> +#define MPAMCFG_PART_SEL_INTERNAL BIT(16)
> +#define MPAMCFG_PART_SEL_RIS GENMASK(27, 24)
> +
> +/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMAX_SOFTLIM BIT(31)
> +#define MPAMCFG_CMAX_CMAX GENMASK(15, 0)
> +
> +/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMIN_CMIN GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
> + * register
> + */
> +#define MPAMCFG_MBW_MIN_MIN GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
> + * register
> + */
> +#define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
> +#define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
> +
> +/*
> + * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
> + * register
> + */
> +#define MPAMCFG_MBW_WINWD_US_FRAC GENMASK(7, 0)
> +#define MPAMCFG_MBW_WINWD_US_INT GENMASK(23, 8)
> +
> +/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
> +#define MPAMCFG_PRI_INTPRI GENMASK(15, 0)
> +#define MPAMCFG_PRI_DSPRI GENMASK(31, 16)
> +
> +/*
> + * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
> + * configuration register
> + */
> +#define MPAMCFG_MBW_PROP_STRIDEM1 GENMASK(15, 0)
> +#define MPAMCFG_MBW_PROP_EN BIT(31)
> +
> +/*
> + * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
> + */
> +#define MPAMCFG_INTPARTID_INTPARTID GENMASK(15, 0)
> +#define MPAMCFG_INTPARTID_INTERNAL BIT(16)
> +
> +/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
> +#define MSMON_CFG_MON_SEL_MON_SEL GENMASK(15, 0)
> +#define MSMON_CFG_MON_SEL_RIS GENMASK(27, 24)
> +
> +/* MPAMF_ESR - MPAM Error Status Register */
> +#define MPAMF_ESR_PARTID_MON GENMASK(15, 0)
> +#define MPAMF_ESR_PMG GENMASK(23, 16)
> +#define MPAMF_ESR_ERRCODE GENMASK(27, 24)
> +#define MPAMF_ESR_OVRWR BIT(31)
> +#define MPAMF_ESR_RIS GENMASK(35, 32)
> +
> +/* MPAMF_ECR - MPAM Error Control Register */
> +#define MPAMF_ECR_INTEN BIT(0)
> +
> +/* Error conditions in accessing memory mapped registers */
> +#define MPAM_ERRCODE_NONE 0
> +#define MPAM_ERRCODE_PARTID_SEL_RANGE 1
> +#define MPAM_ERRCODE_REQ_PARTID_RANGE 2
> +#define MPAM_ERRCODE_MSMONCFG_ID_RANGE 3
> +#define MPAM_ERRCODE_REQ_PMG_RANGE 4
> +#define MPAM_ERRCODE_MONITOR_RANGE 5
> +#define MPAM_ERRCODE_INTPARTID_RANGE 6
> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL 7
> +
> +/*
> + * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
> + * usage monitor filter register
> + */
> +#define MSMON_CFG_CSU_FLT_PARTID GENMASK(15, 0)
> +#define MSMON_CFG_CSU_FLT_PMG GENMASK(23, 16)
> +
> +/*
> + * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
> + * usage monitor control register
> + * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
> + * bandwidth usage monitor control register
> + */
> +#define MSMON_CFG_x_CTL_TYPE GENMASK(7, 0)
> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L BIT(15)
> +#define MSMON_CFG_x_CTL_MATCH_PARTID BIT(16)
> +#define MSMON_CFG_x_CTL_MATCH_PMG BIT(17)
> +#define MSMON_CFG_x_CTL_SCLEN BIT(19)
> +#define MSMON_CFG_x_CTL_SUBTYPE GENMASK(22, 20)
> +#define MSMON_CFG_x_CTL_OFLOW_FRZ BIT(24)
> +#define MSMON_CFG_x_CTL_OFLOW_INTR BIT(25)
> +#define MSMON_CFG_x_CTL_OFLOW_STATUS BIT(26)
> +#define MSMON_CFG_x_CTL_CAPT_RESET BIT(27)
> +#define MSMON_CFG_x_CTL_CAPT_EVNT GENMASK(30, 28)
> +#define MSMON_CFG_x_CTL_EN BIT(31)
> +
> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU 0x42
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU 0
> +
> +/*
> + * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
> + * bandwidth usage monitor filter register
> + */
> +#define MSMON_CFG_MBWU_FLT_PARTID GENMASK(15, 0)
> +#define MSMON_CFG_MBWU_FLT_PMG GENMASK(23, 16)
> +#define MSMON_CFG_MBWU_FLT_RWBW GENMASK(31, 30)
> +
> +/*
> + * MSMON_CSU - Memory system performance monitor cache storage usage monitor
> + * register
> + * MSMON_CSU_CAPTURE - Memory system performance monitor cache storage usage
> + * capture register
> + * MSMON_MBWU - Memory system performance monitor memory bandwidth usage
> + * monitor register
> + * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
> + * capture register
> + */
> +#define MSMON___VALUE GENMASK(30, 0)
> +#define MSMON___NRDY BIT(31)
> +#define MSMON___NRDY_L BIT(63)
> +#define MSMON___L_VALUE GENMASK(43, 0)
> +#define MSMON___LWD_VALUE GENMASK(62, 0)
> +
> +/*
> + * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
> + * generation register
> + */
> +#define MSMON_CAPT_EVNT_NOW BIT(0)
> +
> #endif /* MPAM_INTERNAL_H */
The names and values match the specification.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-29 12:41 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 12:41 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
>
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping: a class is a set of components, which
> are visible to user-space because there are likely to be multiple
> instances of the L2 cache (e.g. one per cluster or package).
>
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
>
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible: each L2 cache may be composed of individual slices that need
> to be configured identically, as the hardware is not able to distribute
> the configuration itself.
>
> Add support for creating and destroying these structures.
>
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered while probing the MSC.
>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * removed a pr_err() debug message that crept in.
> ---
> drivers/resctrl/mpam_devices.c | 488 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 91 ++++++
> include/linux/arm_mpam.h | 8 +-
> 3 files changed, 574 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -20,7 +20,6 @@
> #include <linux/printk.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> -#include <linux/srcu.h>
> #include <linux/types.h>
>
> #include <acpi/pcc.h>
> @@ -35,11 +34,483 @@
> static DEFINE_MUTEX(mpam_list_lock);
> static LIST_HEAD(mpam_all_msc);
>
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
>
> /* MPAM isn't available until all the MSC have been probed. */
> static u32 mpam_num_msc;
>
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set of components of the same type.
> + *
> + * The features of a vMSC are the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
> +{
> + struct mpam_vmsc *vmsc;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + vmsc = kzalloc(sizeof(*vmsc), gfp);
> + if (!vmsc)
> + return ERR_PTR(-ENOMEM);
> + init_garbage(vmsc);
> +
> + INIT_LIST_HEAD_RCU(&vmsc->ris);
> + INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> + vmsc->comp = comp;
> + vmsc->msc = msc;
> +
> + list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> + return vmsc;
> +}
> +
> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
> + struct mpam_msc *msc, bool alloc,
> + gfp_t gfp)
> +{
> + struct mpam_vmsc *vmsc;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> + if (vmsc->msc->id == msc->id)
> + return vmsc;
> + }
> +
> + if (!alloc)
> + return ERR_PTR(-ENOENT);
> +
> + return mpam_vmsc_alloc(comp, msc, gfp);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
> +{
> + struct mpam_component *comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + comp = kzalloc(sizeof(*comp), gfp);
> + if (!comp)
> + return ERR_PTR(-ENOMEM);
> + init_garbage(comp);
> +
> + comp->comp_id = id;
> + INIT_LIST_HEAD_RCU(&comp->vmsc);
> + /* affinity is updated when ris are added */
> + INIT_LIST_HEAD_RCU(&comp->class_list);
> + comp->class = class;
> +
> + list_add_rcu(&comp->class_list, &class->components);
> +
> + return comp;
> +}
> +
> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
> +{
> + struct mpam_component *comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_for_each_entry(comp, &class->components, class_list) {
> + if (comp->comp_id == id)
> + return comp;
> + }
> +
> + if (!alloc)
> + return ERR_PTR(-ENOENT);
> +
> + return mpam_component_alloc(class, id, gfp);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
> +{
> + struct mpam_class *class;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + class = kzalloc(sizeof(*class), gfp);
> + if (!class)
> + return ERR_PTR(-ENOMEM);
> + init_garbage(class);
> +
> + INIT_LIST_HEAD_RCU(&class->components);
> + /* affinity is updated when ris are added */
> + class->level = level_idx;
> + class->type = type;
> + INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> + list_add_rcu(&class->classes_list, &mpam_classes);
> +
> + return class;
> +}
> +
> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
> +{
> + bool found = false;
> + struct mpam_class *class;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_for_each_entry(class, &mpam_classes, classes_list) {
> + if (class->type == type && class->level == level_idx) {
> + found = true;
> + break;
> + }
> + }
> +
> + if (found)
> + return class;
> +
> + if (!alloc)
> + return ERR_PTR(-ENOENT);
> +
> + return mpam_class_alloc(level_idx, type, gfp);
> +}
> +
> +#define add_to_garbage(x) \
> +do { \
> + __typeof__(x) _x = x; \
> + (_x)->garbage.to_free = (_x); \
> + llist_add(&(_x)->garbage.llist, &mpam_garbage); \
> +} while (0)
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_del_rcu(&class->classes_list);
> + add_to_garbage(class);
> +}
> +
> +static void mpam_comp_destroy(struct mpam_component *comp)
> +{
> + struct mpam_class *class = comp->class;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_del_rcu(&comp->class_list);
> + add_to_garbage(comp);
> +
> + if (list_empty(&class->components))
> + mpam_class_destroy(class);
> +}
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> + struct mpam_component *comp = vmsc->comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_del_rcu(&vmsc->comp_list);
> + add_to_garbage(vmsc);
> +
> + if (list_empty(&comp->vmsc))
> + mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> + struct mpam_vmsc *vmsc = ris->vmsc;
> + struct mpam_msc *msc = vmsc->msc;
> + struct platform_device *pdev = msc->pdev;
> + struct mpam_component *comp = vmsc->comp;
> + struct mpam_class *class = comp->class;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> + cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> + clear_bit(ris->ris_idx, msc->ris_idxs);
> + list_del_rcu(&ris->vmsc_list);
> + list_del_rcu(&ris->msc_list);
> + add_to_garbage(ris);
> + ris->garbage.pdev = pdev;
> +
> + if (list_empty(&vmsc->ris))
> + mpam_vmsc_destroy(vmsc);
> +}
> +
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> + struct platform_device *pdev = msc->pdev;
> + struct mpam_msc_ris *ris, *tmp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + list_del_rcu(&msc->glbl_list);
> + platform_set_drvdata(pdev, NULL);
> +
> + list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> + mpam_ris_destroy(ris);
> +
> + add_to_garbage(msc);
> + msc->garbage.pdev = pdev;
> +}
> +
> +static void mpam_free_garbage(void)
> +{
> + struct mpam_garbage *iter, *tmp;
> + struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> + if (!to_free)
> + return;
> +
> + synchronize_srcu(&mpam_srcu);
> +
> + llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> + if (iter->pdev)
> + devm_kfree(&iter->pdev->dev, iter->to_free);
> + else
> + kfree(iter->to_free);
> + }
> +}
> +
> +/* Called recursively to walk the list of caches from a particular CPU */
> +static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
> + unsigned long cache_id,
> + u32 cache_level,
> + cpumask_t *affinity)
> +{
> + int err;
> + u32 iter_level;
> + unsigned long iter_cache_id;
> + struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
> +
> + if (!iter_node)
> + return;
> +
> + err = of_property_read_u32(iter_node, "cache-level", &iter_level);
> + if (err)
> + return;
> +
> + /*
> + * get_cpu_cacheinfo_id() isn't ready until sometime
> + * during device_initcall(). Use cache_of_calculate_id().
> + */
> + iter_cache_id = cache_of_calculate_id(iter_node);
> + if (iter_cache_id == ~0UL)
> + return;
> +
> + if (iter_level == cache_level && iter_cache_id == cache_id)
> + cpumask_set_cpu(cpu, affinity);
> +
> + __mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
> + affinity);
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + * This helper walks the device tree to include offline CPUs too.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> + cpumask_t *affinity)
> +{
> + int cpu;
> +
> + if (!acpi_disabled)
> + return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +
> + for_each_possible_cpu(cpu) {
> + struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
> + if (!cpu_node) {
> + pr_err("Failed to find cpu%d device node\n", cpu);
> + return -ENOENT;
> + }
> +
> + __mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
> + cache_level, affinity);
> + continue;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + if (node_id == cpu_to_node(cpu))
> + cpumask_set_cpu(cpu, affinity);
> + }
> +}
> +
> +static int get_cpumask_from_cache(struct device_node *cache,
> + cpumask_t *affinity)
> +{
> + int err;
> + u32 cache_level;
> + unsigned long cache_id;
> +
> + err = of_property_read_u32(cache, "cache-level", &cache_level);
> + if (err) {
> + pr_err("Failed to read cache-level from cache node\n");
> + return -ENOENT;
> + }
> +
> + cache_id = cache_of_calculate_id(cache);
> + if (cache_id == ~0UL) {
> + pr_err("Failed to calculate cache-id from cache node\n");
> + return -ENOENT;
> + }
> +
> + return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> + enum mpam_class_types type,
> + struct mpam_class *class,
> + struct mpam_component *comp)
> +{
> + int err;
> +
> + switch (type) {
> + case MPAM_CLASS_CACHE:
> + err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> + affinity);
> + if (err)
> + return err;
> +
> + if (cpumask_empty(affinity))
> + pr_warn_once("%s no CPUs associated with cache node\n",
> + dev_name(&msc->pdev->dev));
> +
> + break;
> + case MPAM_CLASS_MEMORY:
> + get_cpumask_from_node_id(comp->comp_id, affinity);
> + /* affinity may be empty for CPU-less memory nodes */
> + break;
> + case MPAM_CLASS_UNKNOWN:
> + return 0;
> + }
> +
> + cpumask_and(affinity, affinity, &msc->accessibility);
> +
> + return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> + enum mpam_class_types type, u8 class_id,
> + int component_id, gfp_t gfp)
> +{
> + int err;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> + struct mpam_class *class;
> + struct mpam_component *comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + if (test_and_set_bit(ris_idx, msc->ris_idxs))
> + return -EBUSY;
> +
> + ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
> + if (!ris)
> + return -ENOMEM;
> + init_garbage(ris);
> +
> + class = mpam_class_get(class_id, type, true, gfp);
> + if (IS_ERR(class))
> + return PTR_ERR(class);
> +
> + comp = mpam_component_get(class, component_id, true, gfp);
> + if (IS_ERR(comp)) {
> + if (list_empty(&class->components))
> + mpam_class_destroy(class);
> + return PTR_ERR(comp);
> + }
> +
> + vmsc = mpam_vmsc_get(comp, msc, true, gfp);
> + if (IS_ERR(vmsc)) {
> + if (list_empty(&comp->vmsc))
> + mpam_comp_destroy(comp);
> + return PTR_ERR(vmsc);
> + }
> +
> + err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> + if (err) {
> + if (list_empty(&vmsc->ris))
> + mpam_vmsc_destroy(vmsc);
> + return err;
> + }
> +
> + ris->ris_idx = ris_idx;
> + INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> + ris->vmsc = vmsc;
> +
> + cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> + cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> + list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
> + return 0;
> +}
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> + enum mpam_class_types type, u8 class_id, int component_id)
> +{
> + int err;
> +
> + mutex_lock(&mpam_list_lock);
> + err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> + component_id, GFP_KERNEL);
> + mutex_unlock(&mpam_list_lock);
> + if (err)
> + mpam_free_garbage();
> +
> + return err;
> +}
> +
> static void mpam_discovery_complete(void)
> {
> pr_err("Discovered all MSC\n");
> @@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
> cpumask_copy(&msc->accessibility, cpu_possible_mask);
> err = 0;
> } else {
> - if (of_device_is_compatible(parent, "memory")) {
> + if (of_device_is_compatible(parent, "cache")) {
> + err = get_cpumask_from_cache(parent,
> + &msc->accessibility);
> + } else if (of_device_is_compatible(parent, "memory")) {
The determination of the accessibility for the h/w msc doesn't fit with
the subject of this patch. Could this hunk and the supporting functions
be split into a precursor patch?
> cpumask_copy(&msc->accessibility, cpu_possible_mask);
> err = 0;
> } else {
> @@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>
> mutex_lock(&mpam_list_lock);
> mpam_num_msc--;
> - platform_set_drvdata(pdev, NULL);
> - list_del_rcu(&msc->glbl_list);
> - synchronize_srcu(&mpam_srcu);
> - devm_kfree(&pdev->dev, msc);
> + mpam_msc_destroy(msc);
> mutex_unlock(&mpam_list_lock);
> +
> + mpam_free_garbage();
> }
>
> static int mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
> err = -ENOMEM;
> break;
> }
> + init_garbage(msc);
>
> mutex_init(&msc->probe_lock);
> mutex_init(&msc->part_sel_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 07e0f240eaca..d49bb884b433 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,10 +7,27 @@
> #include <linux/arm_mpam.h>
> #include <linux/cpumask.h>
> #include <linux/io.h>
> +#include <linux/llist.h>
> #include <linux/mailbox_client.h>
> #include <linux/mutex.h>
> #include <linux/resctrl.h>
> #include <linux/sizes.h>
> +#include <linux/srcu.h>
> +
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be gargbage
nit: s/gargbage/garbage/
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> + /* member of mpam_garbage */
> + struct llist_node llist;
> +
> + void *to_free;
> + struct platform_device *pdev;
> +};
>
> struct mpam_msc {
> /* member of mpam_all_msc */
> @@ -57,6 +74,80 @@ struct mpam_msc {
>
> void __iomem *mapped_hwpage;
> size_t mapped_hwpage_sz;
> +
> + struct mpam_garbage garbage;
> +};
> +
> +struct mpam_class {
> + /* mpam_components in this class */
> + struct list_head components;
> +
> + cpumask_t affinity;
> +
> + u8 level;
> + enum mpam_class_types type;
> +
> + /* member of mpam_classes */
> + struct list_head classes_list;
> +
> + struct mpam_garbage garbage;
> +};
> +
> +struct mpam_component {
> + u32 comp_id;
> +
> + /* mpam_vmsc in this component */
> + struct list_head vmsc;
> +
> + cpumask_t affinity;
> +
> + /* member of mpam_class:components */
> + struct list_head class_list;
> +
> + /* parent: */
> + struct mpam_class *class;
> +
> + struct mpam_garbage garbage;
> };
>
> +struct mpam_vmsc {
> + /* member of mpam_component:vmsc_list */
> + struct list_head comp_list;
> +
> + /* mpam_msc_ris in this vmsc */
> + struct list_head ris;
> +
> + /* All RIS in this vMSC are members of this MSC */
> + struct mpam_msc *msc;
> +
> + /* parent: */
> + struct mpam_component *comp;
> +
> + struct mpam_garbage garbage;
> +};
> +
> +struct mpam_msc_ris {
> + u8 ris_idx;
> +
> + cpumask_t affinity;
> +
> + /* member of mpam_vmsc:ris */
> + struct list_head vmsc_list;
> +
> + /* member of mpam_msc:ris */
> + struct list_head msc_list;
> +
> + /* parent: */
> + struct mpam_vmsc *vmsc;
> +
> + struct mpam_garbage garbage;
> +};
> +
> +/* List of all classes - protected by srcu */
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> + cpumask_t *affinity);
> +
> #endif /* MPAM_INTERNAL_H */
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 0edefa6ba019..406a77be68cb 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> #endif
>
> -static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> - enum mpam_class_types type, u8 class_id,
> - int component_id)
> -{
> - return -EINVAL;
> -}
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> + enum mpam_class_types type, u8 class_id, int component_id);
>
> #endif /* __LINUX_ARM_MPAM_H */
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-29 13:54 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 13:54 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> To make a decision about whether to expose an mpam class as
> a resctrl resource we need to know its overall supported
> features and properties.
>
> Once we've probed all the resources, we can walk the tree
> and produce overall values by merging the bitmaps. This
> eliminates features that are only supported by some MSC
> that make up a component or class.
>
> If bitmap properties are mismatched within a component we
> cannot support the mismatched feature.
>
> Care has to be taken as vMSC may hold mismatched RIS.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 215 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 8 ++
> 2 files changed, 223 insertions(+)
Intricate but, as far as I can tell, all correct.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
* Re: [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-29 14:30 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 14:30 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
>
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 62 +++++++++++++++++++++++++++++++--
> drivers/resctrl/mpam_internal.h | 1 +
> 2 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 759244966736..3516cbe8623e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
> u16 partid, partid_max;
> struct mpam_msc_ris *ris = arg;
>
> - mpam_assert_srcu_read_lock_held();
> -
> if (ris->in_reset_state)
> return 0;
>
> @@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
> mpam_partid_max + 1, mpam_pmg_max + 1);
> }
>
> +static void mpam_reset_component_locked(struct mpam_component *comp)
> +{
> + int idx;
> + struct mpam_msc *msc;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> +
> + might_sleep();
> + lockdep_assert_cpus_held();
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> + msc = vmsc->msc;
> +
> + list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> + if (!ris->in_reset_state)
> + mpam_touch_msc(msc, mpam_reset_ris, ris);
> + ris->in_reset_state = true;
> + }
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
> +static void mpam_reset_class_locked(struct mpam_class *class)
> +{
> + int idx;
> + struct mpam_component *comp;
> +
> + lockdep_assert_cpus_held();
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_rcu(comp, &class->components, class_list)
> + mpam_reset_component_locked(comp);
> + srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
> +static void mpam_reset_class(struct mpam_class *class)
> +{
> + cpus_read_lock();
> + mpam_reset_class_locked(class);
> + cpus_read_unlock();
> +}
> +
> +/*
> + * Called in response to an error IRQ.
> + * All of MPAMs errors indicate a software bug, restore any modified
> + * controls to their reset values.
> + */
> +void mpam_disable(void)
> +{
> + int idx;
> + struct mpam_class *class;
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> + srcu_read_lock_held(&mpam_srcu))
Why do you use list_for_each_entry_srcu() here when in other places you
use list_for_each_entry_rcu()?
> + mpam_reset_class(class);
> + srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
> /*
> * Enable mpam once all devices have been probed.
> * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 466d670a01eb..b30fee2b7674 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
>
> /* Scheduled work callback to enable mpam once all MSC have been probed */
> void mpam_enable(struct work_struct *work);
> +void mpam_disable(void);
>
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
Thanks,
Ben
* Re: [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-29 15:47 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 15:47 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> MPAM's MSC support a number of monitors, each of which supports
> bandwidth counters, or cache-storage-utilisation counters. To use
> a counter, a monitor needs to be configured. Add helpers to allocate
> and free CSU or MBWU monitors.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 2 ++
> drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
> 2 files changed, 37 insertions(+)
This looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
* Re: [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-29 15:55 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 15:55 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
>
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
>
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 222 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 18 +++
> 2 files changed, 240 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index e7e00c632512..9ce771aaf671 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> return 0;
> }
>
> +struct mon_read {
> + struct mpam_msc_ris *ris;
> + struct mon_cfg *ctx;
> + enum mpam_device_features type;
> + u64 *val;
> + int err;
> +};
> +
> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> + u32 *flt_val)
> +{
> + struct mon_cfg *ctx = m->ctx;
> +
> + switch (m->type) {
> + case mpam_feat_msmon_csu:
> + *ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
> + break;
> + case mpam_feat_msmon_mbwu:
> + *ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> + break;
> + default:
> + return;
> + }
> +
> + /*
> + * For CSU counters its implementation-defined what happens when not
> + * filtering by partid.
> + */
> + *ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
> +
> + *flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
> + if (m->ctx->match_pmg) {
> + *ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> + *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
> + }
As we are using MSMON_CFG_MBWU_FLT_{PMG,PARTID} for both CSU and MBWU,
how about renaming them to MSMON_CFG_x_FLT_{PMG,PARTID}?
> +
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
> + *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
This needs to be conditional on the type of the monitor being
configured. There is an XCL bit here for CSU monitors.
> +}
> +
> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> + u32 *flt_val)
> +{
> + struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> + switch (m->type) {
> + case mpam_feat_msmon_csu:
> + *ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
> + *flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
> + break;
> + case mpam_feat_msmon_mbwu:
> + *ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> + *flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> + break;
> + default:
> + return;
> + }
> +}
> +
> +/* Remove values set by the hardware to prevent apparent mismatches. */
> +static void clean_msmon_ctl_val(u32 *cur_ctl)
> +{
> + *cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +}
> +
> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> + u32 flt_val)
> +{
> + struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> + /*
> + * Write the ctl_val with the enable bit cleared, reset the counter,
> + * then enable counter.
> + */
> + switch (m->type) {
> + case mpam_feat_msmon_csu:
> + mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> + mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> + mpam_write_monsel_reg(msc, CSU, 0);
> + mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> + break;
> + case mpam_feat_msmon_mbwu:
> + mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> + mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> + mpam_write_monsel_reg(msc, MBWU, 0);
> + mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> + break;
> + default:
> + return;
> + }
> +}
> +
> +/* Call with MSC lock held */
> +static void __ris_msmon_read(void *arg)
> +{
> + u64 now;
> + bool nrdy = false;
> + struct mon_read *m = arg;
> + struct mon_cfg *ctx = m->ctx;
> + struct mpam_msc_ris *ris = m->ris;
> + struct mpam_props *rprops = &ris->props;
> + struct mpam_msc *msc = m->ris->vmsc->msc;
> + u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> +
> + if (!mpam_mon_sel_inner_lock(msc)) {
> + m->err = -EIO;
> + return;
> + }
> + mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
> + FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> + mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> + /*
> + * Read the existing configuration to avoid re-writing the same values.
> + * This saves waiting for 'nrdy' on subsequent reads.
> + */
> + read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> + clean_msmon_ctl_val(&cur_ctl);
> + gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> + if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> + write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +
> + switch (m->type) {
> + case mpam_feat_msmon_csu:
> + now = mpam_read_monsel_reg(msc, CSU);
> + if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> + nrdy = now & MSMON___NRDY;
> + break;
> + case mpam_feat_msmon_mbwu:
> + now = mpam_read_monsel_reg(msc, MBWU);
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> + nrdy = now & MSMON___NRDY;
> + break;
> + default:
> + m->err = -EINVAL;
> + break;
> + }
> + mpam_mon_sel_inner_unlock(msc);
> +
> + if (nrdy) {
> + m->err = -EBUSY;
> + return;
> + }
> +
> + now = FIELD_GET(MSMON___VALUE, now);
> + *m->val += now;
> +}
> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> + int err, idx;
> + struct mpam_msc *msc;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> +
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> + msc = vmsc->msc;
> +
> + mpam_mon_sel_outer_lock(msc);
> + list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> + arg->ris = ris;
> +
> + err = smp_call_function_any(&msc->accessibility,
> + __ris_msmon_read, arg,
> + true);
> + if (!err && arg->err)
> + err = arg->err;
> + if (err)
> + break;
> + }
> + mpam_mon_sel_outer_unlock(msc);
> + if (err)
> + break;
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> +
> + return err;
> +}
> +
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> + enum mpam_device_features type, u64 *val)
> +{
> + int err;
> + struct mon_read arg;
> + u64 wait_jiffies = 0;
> + struct mpam_props *cprops = &comp->class->props;
> +
> + might_sleep();
> +
> + if (!mpam_is_enabled())
> + return -EIO;
> +
> + if (!mpam_has_feature(type, cprops))
> + return -EOPNOTSUPP;
> +
> + memset(&arg, 0, sizeof(arg));
> + arg.ctx = ctx;
> + arg.type = type;
> + arg.val = val;
> + *val = 0;
> +
> + err = _msmon_read(comp, &arg);
> + if (err == -EBUSY && comp->class->nrdy_usec)
> + wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
> +
> + while (wait_jiffies)
> + wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
> +
> + if (err == -EBUSY) {
> + memset(&arg, 0, sizeof(arg));
> + arg.ctx = ctx;
> + arg.type = type;
> + arg.val = val;
> + *val = 0;
> +
> + err = _msmon_read(comp, &arg);
> + }
> +
> + return err;
> +}
> +
> static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> {
> u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4981de120869..76e406a2b0d1 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -309,6 +309,21 @@ struct mpam_msc_ris {
> struct mpam_garbage garbage;
> };
>
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> + COUNT_BOTH = 0,
> + COUNT_WRITE = 1,
> + COUNT_READ = 2,
> +};
> +
> +struct mon_cfg {
> + u16 mon;
> + u8 pmg;
> + bool match_pmg;
> + u32 partid;
> + enum mon_filter_options opts;
> +};
> +
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> @@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
> int mpam_apply_config(struct mpam_component *comp, u16 partid,
> struct mpam_config *cfg);
>
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> + enum mpam_device_features, u64 *val);
> +
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
>
Thanks,
Ben
* Re: [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-29 16:09 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 16:09 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
>
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
>
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 163 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 54 ++++++++---
> 2 files changed, 200 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 9ce771aaf671..11be34b54643 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> *ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>
> *flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
> + *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
> if (m->ctx->match_pmg) {
> *ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
> @@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
> static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> u32 flt_val)
> {
> + struct msmon_mbwu_state *mbwu_state;
> struct mpam_msc *msc = m->ris->vmsc->msc;
>
> /*
> @@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> mpam_write_monsel_reg(msc, MBWU, 0);
> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +
> + mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> + if (mbwu_state)
> + mbwu_state->prev_val = 0;
> +
> break;
> default:
> return;
> }
> }
>
> +static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
> +{
> + /* TODO: scaling, and long counters */
> + return GENMASK_ULL(30, 0);
> +}
> +
> /* Call with MSC lock held */
> static void __ris_msmon_read(void *arg)
> {
> - u64 now;
> bool nrdy = false;
> struct mon_read *m = arg;
> + u64 now, overflow_val = 0;
> struct mon_cfg *ctx = m->ctx;
> struct mpam_msc_ris *ris = m->ris;
> + struct msmon_mbwu_state *mbwu_state;
> struct mpam_props *rprops = &ris->props;
> struct mpam_msc *msc = m->ris->vmsc->msc;
> u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> @@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
> now = mpam_read_monsel_reg(msc, CSU);
> if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> nrdy = now & MSMON___NRDY;
> + now = FIELD_GET(MSMON___VALUE, now);
> break;
> case mpam_feat_msmon_mbwu:
> now = mpam_read_monsel_reg(msc, MBWU);
> if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> nrdy = now & MSMON___NRDY;
> + now = FIELD_GET(MSMON___VALUE, now);
> +
> + if (nrdy)
> + break;
> +
> + mbwu_state = &ris->mbwu_state[ctx->mon];
> + if (!mbwu_state)
> + break;
> +
> + /* Add any pre-overflow value to the mbwu_state->val */
> + if (mbwu_state->prev_val > now)
> + overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
> +
> + mbwu_state->prev_val = now;
> + mbwu_state->correction += overflow_val;
> +
> + /* Include bandwidth consumed before the last hardware reset */
> + now += mbwu_state->correction;
> break;
> default:
> m->err = -EINVAL;
> @@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
> return;
> }
>
> - now = FIELD_GET(MSMON___VALUE, now);
> *m->val += now;
> }
>
> @@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
> return 0;
> }
>
> +/* Call with MSC lock and outer mon_sel lock held */
> +static int mpam_restore_mbwu_state(void *_ris)
> +{
> + int i;
> + struct mon_read mwbu_arg;
> + struct mpam_msc_ris *ris = _ris;
> + struct mpam_msc *msc = ris->vmsc->msc;
> +
> + mpam_mon_sel_outer_lock(msc);
> +
> + for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> + if (ris->mbwu_state[i].enabled) {
> + mwbu_arg.ris = ris;
> + mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
> + mwbu_arg.type = mpam_feat_msmon_mbwu;
> +
> + __ris_msmon_read(&mwbu_arg);
> + }
> + }
> +
> + mpam_mon_sel_outer_unlock(msc);
> +
> + return 0;
> +}
> +
> +/* Call with MSC lock and outer mon_sel lock held */
> +static int mpam_save_mbwu_state(void *arg)
> +{
> + int i;
> + u64 val;
> + struct mon_cfg *cfg;
> + u32 cur_flt, cur_ctl, mon_sel;
> + struct mpam_msc_ris *ris = arg;
> + struct msmon_mbwu_state *mbwu_state;
> + struct mpam_msc *msc = ris->vmsc->msc;
> +
> + for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> + mbwu_state = &ris->mbwu_state[i];
> + cfg = &mbwu_state->cfg;
> +
> + if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
> + return -EIO;
> +
> + mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
> + FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> + mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> + cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> + cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> + mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
> +
> + val = mpam_read_monsel_reg(msc, MBWU);
> + mpam_write_monsel_reg(msc, MBWU, 0);
> +
> + cfg->mon = i;
> + cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
> + cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
> + cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
> + mbwu_state->correction += val;
> + mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
> + mpam_mon_sel_inner_unlock(msc);
> + }
> +
> + return 0;
> +}
> +
> /*
> * Called via smp_call_on_cpu() to prevent migration, while still being
> * pre-emptible.
> @@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> * for non-zero partid may be lost while the CPUs are offline.
> */
> ris->in_reset_state = online;
> +
> + if (mpam_is_enabled() && !online)
> + mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
> }
> mpam_mon_sel_outer_unlock(msc);
> }
> @@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
> mpam_reprogram_ris_partid(ris, partid, cfg);
> }
> ris->in_reset_state = reset;
> +
> + if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> + mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
> }
> }
>
> @@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
>
> static void __destroy_component_cfg(struct mpam_component *comp)
> {
> + struct mpam_msc *msc;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> add_to_garbage(comp->cfg);
> + list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> + msc = vmsc->msc;
> +
> + mpam_mon_sel_outer_lock(msc);
> + if (mpam_mon_sel_inner_lock(msc)) {
> + list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> + add_to_garbage(ris->mbwu_state);
> + mpam_mon_sel_inner_unlock(msc);
> + }
> + mpam_mon_sel_outer_unlock(msc);
> + }
> }
>
> static int __allocate_component_cfg(struct mpam_component *comp)
> {
> + int err = 0;
> + struct mpam_msc *msc;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> + struct msmon_mbwu_state *mbwu_state;
> +
> + lockdep_assert_held(&mpam_list_lock);
> mpam_assert_partid_sizes_fixed();
>
> if (comp->cfg)
> @@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
> return -ENOMEM;
> init_garbage(comp->cfg);
>
> + list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> + if (!vmsc->props.num_mbwu_mon)
> + continue;
> +
> + msc = vmsc->msc;
> + mpam_mon_sel_outer_lock(msc);
> + list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
> + if (!ris->props.num_mbwu_mon)
> + continue;
> +
> + mbwu_state = kcalloc(ris->props.num_mbwu_mon,
> + sizeof(*ris->mbwu_state),
> + GFP_KERNEL);
> + if (!mbwu_state) {
> + __destroy_component_cfg(comp);
> + err = -ENOMEM;
> + break;
> + }
> +
> + if (mpam_mon_sel_inner_lock(msc)) {
> + init_garbage(mbwu_state);
> + ris->mbwu_state = mbwu_state;
> + mpam_mon_sel_inner_unlock(msc);
> + }
> + }
> + mpam_mon_sel_outer_unlock(msc);
> +
> + if (err)
> + break;
> + }
> +
> return 0;
> }
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 76e406a2b0d1..9a50a5432f4a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -271,6 +271,42 @@ struct mpam_component {
> struct mpam_garbage garbage;
> };
>
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> + COUNT_BOTH = 0,
> + COUNT_WRITE = 1,
> + COUNT_READ = 2,
> +};
> +
> +struct mon_cfg {
> + /* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
> + u32 mon;
> + u8 pmg;
> + bool match_pmg;
> + u32 partid;
> + enum mon_filter_options opts;
> +};
> +
> +/*
> + * Changes to enabled and cfg are protected by the msc->lock.
> + * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + */
> +struct msmon_mbwu_state {
> + bool enabled;
> + struct mon_cfg cfg;
> +
> + /* The value last read from the hardware. Used to detect overflow. */
> + u64 prev_val;
> +
> + /*
> + * The value to add to the new reading to account for power management,
> + * and shifts to trigger the overflow interrupt.
> + */
> + u64 correction;
> +
> + struct mpam_garbage garbage;
> +};
> +
These structures have ended up between struct mpam_component and struct
mpam_vmsc. Move them somewhere more natural.
> struct mpam_vmsc {
> /* member of mpam_component:vmsc_list */
> struct list_head comp_list;
> @@ -306,22 +342,10 @@ struct mpam_msc_ris {
> /* parent: */
> struct mpam_vmsc *vmsc;
>
> - struct mpam_garbage garbage;
> -};
> + /* msmon mbwu configuration is preserved over reset */
> + struct msmon_mbwu_state *mbwu_state;
>
> -/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> -enum mon_filter_options {
> - COUNT_BOTH = 0,
> - COUNT_WRITE = 1,
> - COUNT_READ = 2,
> -};
> -
> -struct mon_cfg {
> - u16 mon;
> - u8 pmg;
> - bool match_pmg;
> - u32 partid;
> - enum mon_filter_options opts;
Choose where this enum and structure should live in the previous patch, so
they don't need to move here.
> + struct mpam_garbage garbage;
> };
>
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
Thanks,
Ben
* Re: [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-29 16:39 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 16:39 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
>
> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
> counter.
>
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
>
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Commit message wrangling.
> * Refer to 31 bit counters as opposed to 32 bit (registers).
> ---
> drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
> 1 file changed, 80 insertions(+), 9 deletions(-)
>
Looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 2ab7f127baaa..8fbcf6eb946a 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1002,6 +1002,48 @@ struct mon_read {
> int err;
> };
>
> +static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
> +{
> + return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
> + mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
> +}
> +
> +static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
> +{
> + int retry = 3;
> + u32 mbwu_l_low;
> + u64 mbwu_l_high1, mbwu_l_high2;
> +
> + mpam_mon_sel_lock_held(msc);
> +
> + WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> + WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> + mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> + do {
> + mbwu_l_high1 = mbwu_l_high2;
> + mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
> + mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +
> + retry--;
> + } while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
> +
> + if (mbwu_l_high1 == mbwu_l_high2)
> + return (mbwu_l_high1 << 32) | mbwu_l_low;
> + return MSMON___NRDY_L;
> +}
> +
> +static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
> +{
> + mpam_mon_sel_lock_held(msc);
> +
> + WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> + WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> + __mpam_write_reg(msc, MSMON_MBWU_L, 0);
> + __mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
> +}
> +
> static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> u32 *flt_val)
> {
> @@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> static void clean_msmon_ctl_val(u32 *cur_ctl)
> {
> *cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> + *cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
I observe that this bit is res0 in the CSU case, so clearing it is ok.
> }
>
> static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> @@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> case mpam_feat_msmon_mbwu:
> mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> - mpam_write_monsel_reg(msc, MBWU, 0);
> + if (mpam_ris_has_mbwu_long_counter(m->ris))
> + mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
> + else
> + mpam_write_monsel_reg(msc, MBWU, 0);
> +
> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>
> mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> @@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>
> static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
> {
> - /* TODO: scaling, and long counters */
> - return GENMASK_ULL(30, 0);
> + /* TODO: implement scaling counters */
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
> + return GENMASK_ULL(62, 0);
> + else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
> + return GENMASK_ULL(43, 0);
> + else
> + return GENMASK_ULL(30, 0);
> }
>
> /* Call with MSC lock held */
> @@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
> now = FIELD_GET(MSMON___VALUE, now);
> break;
> case mpam_feat_msmon_mbwu:
> - now = mpam_read_monsel_reg(msc, MBWU);
> - if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> - nrdy = now & MSMON___NRDY;
> - now = FIELD_GET(MSMON___VALUE, now);
> + /*
> + * If long or lwd counters are supported, use them, else revert
> + * to the 31 bit counter.
> + */
> + if (mpam_ris_has_mbwu_long_counter(ris)) {
> + now = mpam_msc_read_mbwu_l(msc);
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> + nrdy = now & MSMON___NRDY_L;
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
> + now = FIELD_GET(MSMON___LWD_VALUE, now);
> + else
> + now = FIELD_GET(MSMON___L_VALUE, now);
> + } else {
> + now = mpam_read_monsel_reg(msc, MBWU);
> + if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> + nrdy = now & MSMON___NRDY;
> + now = FIELD_GET(MSMON___VALUE, now);
> + }
>
> if (nrdy)
> break;
> @@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
> cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
>
> - val = mpam_read_monsel_reg(msc, MBWU);
> - mpam_write_monsel_reg(msc, MBWU, 0);
> + if (mpam_ris_has_mbwu_long_counter(ris)) {
> + val = mpam_msc_read_mbwu_l(msc);
> + mpam_msc_zero_mbwu_l(msc);
> + } else {
> + val = mpam_read_monsel_reg(msc, MBWU);
> + mpam_write_monsel_reg(msc, MBWU, 0);
> + }
>
> cfg->mon = i;
> cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
--
Thanks,
Ben
* Re: [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-29 16:56 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 16:56 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> The bitmap reset code has been a source of bugs. Add a unit test.
>
> This currently has to be built in, as the rest of the driver is
> builtin.
>
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/Kconfig | 13 ++++++
> drivers/resctrl/mpam_devices.c | 4 ++
> drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
> 3 files changed, 85 insertions(+)
> create mode 100644 drivers/resctrl/test_mpam_devices.c
>
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> index dff7b87280ab..f5e0609975e4 100644
> --- a/drivers/resctrl/Kconfig
> +++ b/drivers/resctrl/Kconfig
> @@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
> bool "MPAM driver for System IP, e,g. caches and memory controllers"
> depends on ARM64_MPAM && EXPERT
>
> +menu "ARM64 MPAM driver options"
> +
> config ARM64_MPAM_DRIVER_DEBUG
> bool "Enable debug messages from the MPAM driver."
> depends on ARM64_MPAM_DRIVER
> help
> Say yes here to enable debug messages from the MPAM driver.
> +
> +config MPAM_KUNIT_TEST
> + bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
> + depends on KUNIT=y
This also needs to depend on ARM64_MPAM_DRIVER.
> + default KUNIT_ALL_TESTS
> + help
> + Enable this option to run tests in the MPAM driver.
> +
> + If unsure, say N.
> +
> +endmenu
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 65c30ebfe001..4cf5aae88c53 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
> }
> /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
> subsys_initcall(mpam_msc_driver_init);
> +
> +#ifdef CONFIG_MPAM_KUNIT_TEST
> +#include "test_mpam_devices.c"
> +#endif
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> new file mode 100644
> index 000000000000..8e9d6c88171c
> --- /dev/null
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -0,0 +1,68 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2024 Arm Ltd.
> +/* This file is intended to be included into mpam_devices.c */
> +
> +#include <kunit/test.h>
> +
> +static void test_mpam_reset_msc_bitmap(struct kunit *test)
> +{
> + char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> + struct mpam_msc fake_msc;
> + u32 *test_result;
> +
> + if (!buf)
> + return;
> +
> + fake_msc.mapped_hwpage = buf;
> + fake_msc.mapped_hwpage_sz = SZ_16K;
> + cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
> +
> + mutex_init(&fake_msc.part_sel_lock);
> + mutex_lock(&fake_msc.part_sel_lock);
> +
> + test_result = (u32 *)(buf + MPAMCFG_CPBM);
> +
> + mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
> + KUNIT_EXPECT_EQ(test, test_result[0], 0);
> + KUNIT_EXPECT_EQ(test, test_result[1], 0);
> + test_result[0] = 0;
> + test_result[1] = 0;
> +
> + mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
> + KUNIT_EXPECT_EQ(test, test_result[0], 1);
> + KUNIT_EXPECT_EQ(test, test_result[1], 0);
> + test_result[0] = 0;
> + test_result[1] = 0;
> +
> + mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
> + KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
> + KUNIT_EXPECT_EQ(test, test_result[1], 0);
> + test_result[0] = 0;
> + test_result[1] = 0;
> +
> + mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
> + KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> + KUNIT_EXPECT_EQ(test, test_result[1], 0);
> + test_result[0] = 0;
> + test_result[1] = 0;
> +
> + mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
> + KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> + KUNIT_EXPECT_EQ(test, test_result[1], 1);
> + test_result[0] = 0;
> + test_result[1] = 0;
> +
> + mutex_unlock(&fake_msc.part_sel_lock);
> +}
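For anyone puzzling over the expected values: the test encodes that a bitmap of wd set bits is packed low-word-first into consecutive u32 registers. A stand-alone model of the expected register contents (illustrative only, valid for 0 <= wd < 64):

```c
#include <stdint.h>
#include <assert.h> /* for quick self-checks */

/*
 * Model of what the test expects mpam_reset_msc_bitmap() to write:
 * wd set bits, filled from bit 0 of the first u32 register upwards.
 */
static void expected_cpbm(unsigned int wd, uint32_t out[2])
{
	out[0] = wd >= 32 ? 0xffffffff : (1u << wd) - 1;
	out[1] = wd <= 32 ? 0 : (1u << (wd - 32)) - 1;
}
```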
> +
> +static struct kunit_case mpam_devices_test_cases[] = {
> + KUNIT_CASE(test_mpam_reset_msc_bitmap),
> + {}
> +};
> +
> +static struct kunit_suite mpam_devices_test_suite = {
> + .name = "mpam_devices_test_suite",
> + .test_cases = mpam_devices_test_cases,
> +};
> +
> +kunit_test_suites(&mpam_devices_test_suite);
Thanks,
Ben
* Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-29 17:11 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-08-29 17:11 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: (same list as above)
Hi James,
The tests seem reasonable. Just some comments on the comments.
On 8/22/25 16:30, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
>
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_internal.h | 8 +-
> drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
> 2 files changed, 329 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index bbf0306abc82..6e973be095f8 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -18,6 +18,12 @@
>
> DECLARE_STATIC_KEY_FALSE(mpam_enabled);
>
> +#ifdef CONFIG_MPAM_KUNIT_TEST
> +#define PACKED_FOR_KUNIT __packed
> +#else
> +#define PACKED_FOR_KUNIT
> +#endif
> +
> static inline bool mpam_is_enabled(void)
> {
> return static_branch_likely(&mpam_enabled);
> @@ -209,7 +215,7 @@ struct mpam_props {
> u16 dspri_wd;
> u16 num_csu_mon;
> u16 num_mbwu_mon;
> -};
> +} PACKED_FOR_KUNIT;
>
> #define mpam_has_feature(_feat, x) ((1 << (_feat)) & (x)->features)
>
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 8e9d6c88171c..ef39696e7ff8 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -4,6 +4,326 @@
>
> #include <kunit/test.h>
>
> +/*
> + * This test catches fields that aren't being sanitised - but can't tell you
> + * which one...
> + */
> +static void test__props_mismatch(struct kunit *test)
> +{
> + struct mpam_props parent = { 0 };
> + struct mpam_props child;
> +
> + memset(&child, 0xff, sizeof(child));
> + __props_mismatch(&parent, &child, false);
> +
> + memset(&child, 0, sizeof(child));
> + KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +
> + memset(&child, 0xff, sizeof(child));
> + __props_mismatch(&parent, &child, true);
> +
> + KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +}
> +
> +static void test_mpam_enable_merge_features(struct kunit *test)
> +{
> + /* o/` How deep is your stack? o/` */
> + struct list_head fake_classes_list;
> + struct mpam_class fake_class = { 0 };
> + struct mpam_component fake_comp1 = { 0 };
> + struct mpam_component fake_comp2 = { 0 };
> + struct mpam_vmsc fake_vmsc1 = { 0 };
> + struct mpam_vmsc fake_vmsc2 = { 0 };
> + struct mpam_msc fake_msc1 = { 0 };
> + struct mpam_msc fake_msc2 = { 0 };
> + struct mpam_msc_ris fake_ris1 = { 0 };
> + struct mpam_msc_ris fake_ris2 = { 0 };
> + struct platform_device fake_pdev = { 0 };
> +
> +#define RESET_FAKE_HIEARCHY() do { \
> + INIT_LIST_HEAD(&fake_classes_list); \
> + \
> + memset(&fake_class, 0, sizeof(fake_class)); \
> + fake_class.level = 3; \
> + fake_class.type = MPAM_CLASS_CACHE; \
> + INIT_LIST_HEAD_RCU(&fake_class.components); \
> + INIT_LIST_HEAD(&fake_class.classes_list); \
> + \
> + memset(&fake_comp1, 0, sizeof(fake_comp1)); \
> + memset(&fake_comp2, 0, sizeof(fake_comp2)); \
> + fake_comp1.comp_id = 1; \
> + fake_comp2.comp_id = 2; \
> + INIT_LIST_HEAD(&fake_comp1.vmsc); \
> + INIT_LIST_HEAD(&fake_comp1.class_list); \
> + INIT_LIST_HEAD(&fake_comp2.vmsc); \
> + INIT_LIST_HEAD(&fake_comp2.class_list); \
> + \
> + memset(&fake_vmsc1, 0, sizeof(fake_vmsc1)); \
> + memset(&fake_vmsc2, 0, sizeof(fake_vmsc2)); \
> + INIT_LIST_HEAD(&fake_vmsc1.ris); \
> + INIT_LIST_HEAD(&fake_vmsc1.comp_list); \
> + fake_vmsc1.msc = &fake_msc1; \
> + INIT_LIST_HEAD(&fake_vmsc2.ris); \
> + INIT_LIST_HEAD(&fake_vmsc2.comp_list); \
> + fake_vmsc2.msc = &fake_msc2; \
> + \
> + memset(&fake_ris1, 0, sizeof(fake_ris1)); \
> + memset(&fake_ris2, 0, sizeof(fake_ris2)); \
> + fake_ris1.ris_idx = 1; \
> + INIT_LIST_HEAD(&fake_ris1.msc_list); \
> + fake_ris2.ris_idx = 2; \
> + INIT_LIST_HEAD(&fake_ris2.msc_list); \
> + \
> + fake_msc1.pdev = &fake_pdev; \
> + fake_msc2.pdev = &fake_pdev; \
> + \
> + list_add(&fake_class.classes_list, &fake_classes_list); \
> +} while (0)
> +
> + RESET_FAKE_HIEARCHY();
> +
> + mutex_lock(&mpam_list_lock);
> +
> + /* One Class+Comp, two RIS in one vMSC with common features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = NULL;
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc1;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two RIS in one vMSC with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = NULL;
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc1;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /* Multiple RIS within one MSC controlling the same resource can be mismatched */
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> + KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple RIS in different MSC can't the same resource, mismatched
s/can't the same/can't control the same/
> + * features can not be supported.
> + */
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with incompatible overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 5;
> + fake_ris2.props.cpbm_wd = 3;
> + fake_ris1.props.mbw_pbm_bits = 5;
> + fake_ris2.props.mbw_pbm_bits = 3;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple RIS in different MSC can't the same resource, mismatched
> + * features can not be supported.
> + */
Missing the word "control" again.
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with overlapping features that need tweaking */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
> + mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
> + fake_ris1.props.bwa_wd = 5;
> + fake_ris2.props.bwa_wd = 3;
> + fake_ris1.props.cmax_wd = 5;
> + fake_ris2.props.cmax_wd = 3;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple RIS in different MSC can't the same resource, mismatched
> + * features can not be supported.
> + */
This comment looks copied from a different case; here the features are kept and the widths are reduced to the minimum.
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class Two Comp with overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = &fake_class;
> + list_add(&fake_comp2.class_list, &fake_class.components);
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp2;
> + list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class Two Comp with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = &fake_class;
> + list_add(&fake_comp2.class_list, &fake_class.components);
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp2;
> + list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple components can't control the same resource, mismatched features can
> + * not be supported.
> + */
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> + mutex_unlock(&mpam_list_lock);
> +
> +#undef RESET_FAKE_HIEARCHY
> +}
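Summarising what these cases pin down (a toy model, with invented names, of the merge rules the tests encode): a bitmap-style width such as cpbm_wd must match exactly across MSCs or the feature is dropped and its width zeroed, while a min/max-style width such as bwa_wd is merged by taking the smaller value:

```c
#include <stdbool.h>
#include <assert.h> /* for quick self-checks */

/* Invented stand-ins for two fields of struct mpam_props. */
struct toy_props {
	bool has_cpor;          /* bitmap-style: widths must match exactly */
	unsigned int cpbm_wd;
	bool has_mbw_min;       /* bandwidth-style: take the minimum width */
	unsigned int bwa_wd;
};

/* Merge two per-MSC property sets into the class-level view. */
static void toy_merge(struct toy_props *to, const struct toy_props *a,
		      const struct toy_props *b)
{
	to->has_cpor = a->has_cpor && b->has_cpor && a->cpbm_wd == b->cpbm_wd;
	to->cpbm_wd = to->has_cpor ? a->cpbm_wd : 0;

	to->has_mbw_min = a->has_mbw_min && b->has_mbw_min;
	to->bwa_wd = to->has_mbw_min ?
		(a->bwa_wd < b->bwa_wd ? a->bwa_wd : b->bwa_wd) : 0;
}
```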
> +
> static void test_mpam_reset_msc_bitmap(struct kunit *test)
> {
> char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> @@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
>
> static struct kunit_case mpam_devices_test_cases[] = {
> KUNIT_CASE(test_mpam_reset_msc_bitmap),
> + KUNIT_CASE(test_mpam_enable_merge_features),
> + KUNIT_CASE(test__props_mismatch),
> {}
> };
>
Thanks,
Ben
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
` (3 preceding siblings ...)
2025-08-27 15:39 ` Rob Herring
@ 2025-09-01 9:11 ` Ben Horgan
2025-09-01 11:21 ` Dave Martin
5 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-09-01 9:11 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: (same list as above)
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
>
> Start with driver probe/remove and mapping the MSC.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Check for status=broken DT devices.
> * Moved all the files around.
> * Made Kconfig symbols depend on EXPERT
> ---
> arch/arm64/Kconfig | 1 +
> drivers/Kconfig | 2 +
> drivers/Makefile | 1 +
> drivers/resctrl/Kconfig | 11 ++
> drivers/resctrl/Makefile | 4 +
> drivers/resctrl/mpam_devices.c | 336 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 62 ++++++
> 7 files changed, 417 insertions(+)
> create mode 100644 drivers/resctrl/Kconfig
> create mode 100644 drivers/resctrl/Makefile
> create mode 100644 drivers/resctrl/mpam_devices.c
> create mode 100644 drivers/resctrl/mpam_internal.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>
> config ARM64_MPAM
> bool "Enable support for MPAM"
> + select ARM64_MPAM_DRIVER
> select ACPI_MPAM if ACPI
> help
> Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>
> source "drivers/cdx/Kconfig"
>
> +source "drivers/resctrl/Kconfig"
> +
> endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE) += hte/
> obj-$(CONFIG_DRM_ACCEL) += accel/
> obj-$(CONFIG_CDX_BUS) += cdx/
> obj-$(CONFIG_DPLL) += dpll/
> +obj-y += resctrl/
>
> obj-$(CONFIG_S390) += s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> + bool "MPAM driver for System IP, e,g. caches and memory controllers"
> + depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> + bool "Enable debug messages from the MPAM driver."
> + depends on ARM64_MPAM_DRIVER
> + help
> + Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
> +mpam-y += mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> + pr_err("Discovered all MSC\n");
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> + int count = 0;
> + struct device_node *np;
> +
> + for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> + if (of_device_is_available(np))
> + count++;
> + }
> +
> + return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> + u32 ris_idx)
> +{
> + int err = 0;
> + u32 level = 0;
> + unsigned long cache_id;
> + struct device_node *cache;
> +
> + do {
> + if (of_device_is_compatible(np, "arm,mpam-cache")) {
> + cache = of_parse_phandle(np, "arm,mpam-device", 0);
> + if (!cache) {
> + pr_err("Failed to read phandle\n");
> + break;
> + }
> + } else if (of_device_is_compatible(np->parent, "cache")) {
> + cache = of_node_get(np->parent);
> + } else {
> + /* For now, only caches are supported */
> + cache = NULL;
> + break;
> + }
> +
> + err = of_property_read_u32(cache, "cache-level", &level);
> + if (err) {
> + pr_err("Failed to read cache-level\n");
> + break;
> + }
> +
> + cache_id = cache_of_calculate_id(cache);
> + if (cache_id == ~0UL) {
> + err = -ENOENT;
> + break;
> + }
> +
> + err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> + cache_id);
> + } while (0);
> + of_node_put(cache);
> +
> + return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> + int err, num_ris = 0;
> + const u32 *ris_idx_p;
> + struct device_node *iter, *np;
> +
> + np = msc->pdev->dev.of_node;
> + for_each_child_of_node(np, iter) {
> + ris_idx_p = of_get_property(iter, "reg", NULL);
> + if (ris_idx_p) {
> + num_ris++;
> + err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> + if (err) {
> + of_node_put(iter);
> + return err;
> + }
> + }
> + }
> +
> + if (!num_ris)
> + mpam_dt_parse_resource(msc, np, 0);
> +
> + return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> + struct device_node *parent;
> + u32 affinity_id;
> + int err;
> +
> + if (!acpi_disabled) {
> + err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> + &affinity_id);
> + if (err)
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + else
> + acpi_pptt_get_cpus_from_container(affinity_id,
> + &msc->accessibility);
> +
> + return 0;
> + }
> +
> + /* This depends on the path to of_node */
> + parent = of_get_parent(msc->pdev->dev.of_node);
> + if (parent == of_root) {
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + err = 0;
> + } else {
> + err = -EINVAL;
> + pr_err("Cannot determine accessibility of MSC: %s\n",
> + dev_name(&msc->pdev->dev));
> + }
> + of_node_put(parent);
> +
> + return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> + /* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> + struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> + if (!msc)
> + return;
> +
> + mutex_lock(&mpam_list_lock);
> + mpam_num_msc--;
> + platform_set_drvdata(pdev, NULL);
> + list_del_rcu(&msc->glbl_list);
> + synchronize_srcu(&mpam_srcu);
> + devm_kfree(&pdev->dev, msc);
> + mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> + int err;
> + struct mpam_msc *msc;
> + struct resource *msc_res;
> + void *plat_data = pdev->dev.platform_data;
> +
> + mutex_lock(&mpam_list_lock);
> + do {
> + msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> + if (!msc) {
> + err = -ENOMEM;
> + break;
> + }
> +
> + mutex_init(&msc->probe_lock);
> + mutex_init(&msc->part_sel_lock);
> + mutex_init(&msc->outer_mon_sel_lock);
> + raw_spin_lock_init(&msc->inner_mon_sel_lock);
> + msc->id = mpam_num_msc++;
> + msc->pdev = pdev;
> + INIT_LIST_HEAD_RCU(&msc->glbl_list);
> + INIT_LIST_HEAD_RCU(&msc->ris);
> +
> + err = update_msc_accessibility(msc);
> + if (err)
> + break;
> + if (cpumask_empty(&msc->accessibility)) {
> + pr_err_once("msc:%u is not accessible from any CPU!",
> + msc->id);
> + err = -EINVAL;
> + break;
> + }
> +
> + if (device_property_read_u32(&pdev->dev, "pcc-channel",
> + &msc->pcc_subspace_id))
> + msc->iface = MPAM_IFACE_MMIO;
> + else
> + msc->iface = MPAM_IFACE_PCC;
> +
> + if (msc->iface == MPAM_IFACE_MMIO) {
> + void __iomem *io;
> +
> + io = devm_platform_get_and_ioremap_resource(pdev, 0,
> + &msc_res);
> + if (IS_ERR(io)) {
> + pr_err("Failed to map MSC base address\n");
> + err = PTR_ERR(io);
> + break;
> + }
> + msc->mapped_hwpage_sz = resource_size(msc_res);
> + msc->mapped_hwpage = io;
> + } else if (msc->iface == MPAM_IFACE_PCC) {
> + msc->pcc_cl.dev = &pdev->dev;
> + msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> + msc->pcc_cl.tx_block = false;
> + msc->pcc_cl.tx_tout = 1000; /* 1s */
> + msc->pcc_cl.knows_txdone = false;
> +
> + msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> + msc->pcc_subspace_id);
> + if (IS_ERR(msc->pcc_chan)) {
> + pr_err("Failed to request MSC PCC channel\n");
> + err = PTR_ERR(msc->pcc_chan);
> + break;
> + }
> + }
> +
> + list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> + platform_set_drvdata(pdev, msc);
> + } while (0);
> + mutex_unlock(&mpam_list_lock);
> +
> + if (!err) {
> + /* Create RIS entries described by firmware */
> + if (!acpi_disabled)
> + err = acpi_mpam_parse_resources(msc, plat_data);
> + else
> + err = mpam_dt_parse_resources(msc, plat_data);
> + }
> +
> + if (!err && fw_num_msc == mpam_num_msc)
> + mpam_discovery_complete();
> +
> + if (err && msc)
> + mpam_msc_drv_remove(pdev);
> +
> + return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> + { .compatible = "arm,mpam-msc", },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> + .driver = {
> + .name = "mpam_msc",
> + .of_match_table = of_match_ptr(mpam_of_match),
> + },
> + .probe = mpam_msc_drv_probe,
> + .remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> + int err;
> + struct device_node *cache;
> +
> + for_each_compatible_node(cache, NULL, "cache") {
> + err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> + if (err)
> + pr_err("Failed to create MSC devices under caches\n");
> + }
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> + if (!system_supports_mpam())
> + return -EOPNOTSUPP;
> +
> + init_srcu_struct(&mpam_srcu);
> +
> + if (!acpi_disabled)
> + fw_num_msc = acpi_mpam_count_msc();
> + else
> + fw_num_msc = mpam_dt_count_msc();
> +
> + if (fw_num_msc <= 0) {
> + pr_err("No MSC devices found in firmware\n");
> + return -EINVAL;
> + }
> +
> + if (acpi_disabled)
> + mpam_dt_create_foundling_msc();
> +
> + return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> + /* member of mpam_all_msc */
> + struct list_head glbl_list;
> +
> + int id;
> + struct platform_device *pdev;
> +
> + /* Not modified after mpam_is_enabled() becomes true */
> + enum mpam_msc_iface iface;
> + u32 pcc_subspace_id;
> + struct mbox_client pcc_cl;
> + struct pcc_mbox_chan *pcc_chan;
> + u32 nrdy_usec;
> + cpumask_t accessibility;
> +
> + /*
> + * probe_lock is only taken during discovery. After discovery these
> + * properties become read-only and the lists are protected by SRCU.
> + */
> + struct mutex probe_lock;
> + unsigned long ris_idxs[128 / BITS_PER_LONG];
Why is this sized this way? RIS_MAX is 4 bits and so there are at most
16 RIS per msc.
> + u32 ris_max;
> +
> + /* mpam_msc_ris of this component */
> + struct list_head ris;
> +
> + /*
> + * part_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> + * by RIS).
> + * If needed, take msc->lock first.
> + */
> + struct mutex part_sel_lock;
> +
> + /*
> + * mon_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_MON_SEL.
> + * If needed, take msc->lock first.
> + */
> + struct mutex outer_mon_sel_lock;
> + raw_spinlock_t inner_mon_sel_lock;
> + unsigned long inner_mon_sel_flags;
> +
> + void __iomem *mapped_hwpage;
> + size_t mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
Thanks,
Ben
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 22/33] arm_mpam: Register and enable IRQs
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-09-01 10:05 ` Ben Horgan
0 siblings, 0 replies; 130+ messages in thread
From: Ben Horgan @ 2025-09-01 10:05 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
>
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen at process
> context, use a threaded interrupt.
>
> There is no support for percpu threaded interrupts, so for now schedule
> the work to be done from the irq handler.
>
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
>
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
>
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Use guard macro when walking srcu list.
> * Use INTEN macro for enabling interrupts.
> * Move partid_max_published up earlier in mpam_enable_once().
> ---
> drivers/resctrl/mpam_devices.c | 311 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 9 +-
> 2 files changed, 312 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3516cbe8623e..210d64fad0b1 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -14,6 +14,9 @@
> #include <linux/device.h>
> #include <linux/errno.h>
> #include <linux/gfp.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
> #include <linux/list.h>
> #include <linux/lockdep.h>
> #include <linux/mutex.h>
> @@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
> */
> static DECLARE_WORK(mpam_enable_work, &mpam_enable);
>
> +/*
> + * All mpam error interrupts indicate a software bug. On receipt, disable the
> + * driver.
> + */
> +static DECLARE_WORK(mpam_broken_work, &mpam_disable);
> +
> /*
> * An MSC is a physical container for controls and monitors, each identified by
> * their RIS index. These share a base-address, interrupts and some MMIO
> @@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> return (idr_high << 32) | idr_low;
> }
>
> +static void mpam_msc_zero_esr(struct mpam_msc *msc)
> +{
> + __mpam_write_reg(msc, MPAMF_ESR, 0);
> + if (msc->has_extd_esr)
> + __mpam_write_reg(msc, MPAMF_ESR + 4, 0);
> +}
> +
> +static u64 mpam_msc_read_esr(struct mpam_msc *msc)
> +{
> + u64 esr_high = 0, esr_low;
> +
> + esr_low = __mpam_read_reg(msc, MPAMF_ESR);
> + if (msc->has_extd_esr)
> + esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
> +
> + return (esr_high << 32) | esr_low;
> +}
> +
> static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
> {
> lockdep_assert_held(&msc->part_sel_lock);
> @@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
>
> lockdep_assert_held(&mpam_list_lock);
>
> - list_del_rcu(&msc->glbl_list);
> - platform_set_drvdata(pdev, NULL);
> -
> list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> mpam_ris_destroy(ris);
>
> + list_del_rcu(&msc->glbl_list);
> + platform_set_drvdata(pdev, NULL);
> +
Reordering can be done when introduced.
> add_to_garbage(msc);
> msc->garbage.pdev = pdev;
> }
> @@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> msc->partid_max = min(msc->partid_max, partid_max);
> msc->pmg_max = min(msc->pmg_max, pmg_max);
> + msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
>
> ris = mpam_get_or_create_ris(msc, ris_idx);
> if (IS_ERR(ris))
> @@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> mutex_unlock(&msc->part_sel_lock);
> }
>
> + /* Clear any stale errors */
> + mpam_msc_zero_esr(msc);
> +
> spin_lock(&partid_max_lock);
> mpam_partid_max = min(mpam_partid_max, msc->partid_max);
> mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> @@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> mpam_mon_sel_outer_unlock(msc);
> }
>
> +static void _enable_percpu_irq(void *_irq)
> +{
> + int *irq = _irq;
> +
> + enable_percpu_irq(*irq, IRQ_TYPE_NONE);
> +}
> +
> static int mpam_cpu_online(unsigned int cpu)
> {
> int idx;
> @@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
> if (!cpumask_test_cpu(cpu, &msc->accessibility))
> continue;
>
> + if (msc->reenable_error_ppi)
> + _enable_percpu_irq(&msc->reenable_error_ppi);
> +
> if (atomic_fetch_inc(&msc->online_refs) == 0)
> mpam_reset_msc(msc, true);
> }
> @@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
> if (!cpumask_test_cpu(cpu, &msc->accessibility))
> continue;
>
> + if (msc->reenable_error_ppi)
> + disable_percpu_irq(msc->reenable_error_ppi);
> +
> if (atomic_dec_and_test(&msc->online_refs))
> mpam_reset_msc(msc, false);
> }
> @@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
> mutex_unlock(&mpam_cpuhp_state_lock);
> }
>
> +static int __setup_ppi(struct mpam_msc *msc)
> +{
> + int cpu;
> +
> + msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
Simpler to use alloc_percpu().
> + if (!msc->error_dev_id)
> + return -ENOMEM;
> +
> + for_each_cpu(cpu, &msc->accessibility) {
> + struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
> +
> + if (empty) {
> + pr_err_once("%s shares PPI with %s!\n",
> + dev_name(&msc->pdev->dev),
> + dev_name(&empty->pdev->dev));
> + return -EBUSY;
> + }
> + *per_cpu_ptr(msc->error_dev_id, cpu) = msc;
> + }
> +
> + return 0;
> +}
> +
> +static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
> +{
> + int irq;
> +
> + irq = platform_get_irq_byname_optional(msc->pdev, "error");
> + if (irq <= 0)
> + return 0;
> +
> + /* Allocate and initialise the percpu device pointer for PPI */
> + if (irq_is_percpu(irq))
> + return __setup_ppi(msc);
> +
> + /* sanity check: shared interrupts can be routed anywhere? */
> + if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
> + pr_err_once("msc:%u is a private resource with a shared error interrupt\n",
> + msc->id);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> static int mpam_dt_count_msc(void)
> {
> int count = 0;
> @@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
> break;
> }
>
> + err = mpam_msc_setup_error_irq(msc);
> + if (err)
> + break;
> +
> if (device_property_read_u32(&pdev->dev, "pcc-channel",
> &msc->pcc_subspace_id))
> msc->iface = MPAM_IFACE_MMIO;
> @@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
> }
> }
>
> +static char *mpam_errcode_names[16] = {
> + [0] = "No error",
> + [1] = "PARTID_SEL_Range",
> + [2] = "Req_PARTID_Range",
> + [3] = "MSMONCFG_ID_RANGE",
> + [4] = "Req_PMG_Range",
> + [5] = "Monitor_Range",
> + [6] = "intPARTID_Range",
> + [7] = "Unexpected_INTERNAL",
> + [8] = "Undefined_RIS_PART_SEL",
> + [9] = "RIS_No_Control",
> + [10] = "Undefined_RIS_MON_SEL",
> + [11] = "RIS_No_Monitor",
> + [12 ... 15] = "Reserved"
> +};
These names match the spec.
> +
> +static int mpam_enable_msc_ecr(void *_msc)
> +{
> + struct mpam_msc *msc = _msc;
> +
> + __mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
> +
> + return 0;
> +}
> +
> +static int mpam_disable_msc_ecr(void *_msc)
> +{
> + struct mpam_msc *msc = _msc;
> +
> + __mpam_write_reg(msc, MPAMF_ECR, 0);
> +
> + return 0;
> +}
> +
> +static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
> +{
> + u64 reg;
> + u16 partid;
> + u8 errcode, pmg, ris;
> +
> + if (WARN_ON_ONCE(!msc) ||
> + WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> + &msc->accessibility)))
> + return IRQ_NONE;
> +
> + reg = mpam_msc_read_esr(msc);
> +
> + errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
> + if (!errcode)
> + return IRQ_NONE;
> +
> + /* Clear level triggered irq */
> + mpam_msc_zero_esr(msc);
> +
> + partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
> + pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
> + ris = FIELD_GET(MPAMF_ESR_RIS, reg);
> +
> + pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
> + msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
> +
> + if (irq_is_percpu(irq)) {
> + mpam_disable_msc_ecr(msc);
> + schedule_work(&mpam_broken_work);
> + return IRQ_HANDLED;
> + }
> +
> + return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
> +{
> + struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
> +
> + return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
> +{
> + struct mpam_msc *msc = dev_id;
> +
> + return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
> +
> +static int mpam_register_irqs(void)
> +{
> + int err, irq;
> + struct mpam_msc *msc;
> +
> + lockdep_assert_cpus_held();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> + irq = platform_get_irq_byname_optional(msc->pdev, "error");
> + if (irq <= 0)
> + continue;
> +
> + /* The MPAM spec says the interrupt can be SPI, PPI or LPI */
> + /* We anticipate sharing the interrupt with other MSCs */
> + if (irq_is_percpu(irq)) {
> + err = request_percpu_irq(irq, &mpam_ppi_handler,
> + "mpam:msc:error",
> + msc->error_dev_id);
> + if (err)
> + return err;
> +
> + msc->reenable_error_ppi = irq;
> + smp_call_function_many(&msc->accessibility,
> + &_enable_percpu_irq, &irq,
> + true);
> + } else {
> + err = devm_request_threaded_irq(&msc->pdev->dev, irq,
> + &mpam_spi_handler,
> + &mpam_disable_thread,
> + IRQF_SHARED,
> + "mpam:msc:error", msc);
> + if (err)
> + return err;
> + }
> +
> + msc->error_irq_requested = true;
> + mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
> + msc->error_irq_hw_enabled = true;
> + }
> +
> + return 0;
> +}
> +
> +static void mpam_unregister_irqs(void)
> +{
> + int irq, idx;
> + struct mpam_msc *msc;
> +
> + cpus_read_lock();
> + /* take the lock as free_irq() can sleep */
> + idx = srcu_read_lock(&mpam_srcu);
> + list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> + irq = platform_get_irq_byname_optional(msc->pdev, "error");
> + if (irq <= 0)
> + continue;
> +
> + if (msc->error_irq_hw_enabled) {
> + mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> + msc->error_irq_hw_enabled = false;
> + }
> +
> + if (msc->error_irq_requested) {
> + if (irq_is_percpu(irq)) {
> + msc->reenable_error_ppi = 0;
> + free_percpu_irq(irq, msc->error_dev_id);
> + } else {
> + devm_free_irq(&msc->pdev->dev, irq, msc);
> + }
> + msc->error_irq_requested = false;
> + }
> + }
> + srcu_read_unlock(&mpam_srcu, idx);
> + cpus_read_unlock();
> +}
> +
> static void mpam_enable_once(void)
> {
> - mutex_lock(&mpam_list_lock);
> - mpam_enable_merge_features(&mpam_classes);
> - mutex_unlock(&mpam_list_lock);
> + int err;
>
> /*
> * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> @@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
> partid_max_published = true;
> spin_unlock(&partid_max_lock);
>
> + /*
> + * If all the MSC have been probed, enabling the IRQs happens next.
> + * That involves cross-calling to a CPU that can reach the MSC, and
> + * the locks must be taken in this order:
> + */
> + cpus_read_lock();
> + mutex_lock(&mpam_list_lock);
> + mpam_enable_merge_features(&mpam_classes);
> +
> + err = mpam_register_irqs();
> + if (err)
> + pr_warn("Failed to register irqs: %d\n", err);
> +
> + mutex_unlock(&mpam_list_lock);
> + cpus_read_unlock();
> +
> + if (err) {
> + schedule_work(&mpam_broken_work);
> + return;
> + }
> +
> mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>
> printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
> @@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
> * All of MPAMs errors indicate a software bug, restore any modified
> * controls to their reset values.
> */
> -void mpam_disable(void)
> +static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
> {
> int idx;
> struct mpam_class *class;
> + struct mpam_msc *msc, *tmp;
> +
> + mutex_lock(&mpam_cpuhp_state_lock);
> + if (mpam_cpuhp_state) {
> + cpuhp_remove_state(mpam_cpuhp_state);
> + mpam_cpuhp_state = 0;
> + }
> + mutex_unlock(&mpam_cpuhp_state_lock);
> +
> + mpam_unregister_irqs();
>
> idx = srcu_read_lock(&mpam_srcu);
> list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> srcu_read_lock_held(&mpam_srcu))
> mpam_reset_class(class);
> srcu_read_unlock(&mpam_srcu, idx);
> +
> + mutex_lock(&mpam_list_lock);
> + list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
> + mpam_msc_destroy(msc);
> + mutex_unlock(&mpam_list_lock);
> + mpam_free_garbage();
> +
> + return IRQ_HANDLED;
> +}
> +
> +void mpam_disable(struct work_struct *ignored)
> +{
> + mpam_disable_thread(0, NULL);
> }
>
> /*
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index b30fee2b7674..c9418c9cf9f2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -44,6 +44,11 @@ struct mpam_msc {
> struct pcc_mbox_chan *pcc_chan;
> u32 nrdy_usec;
> cpumask_t accessibility;
> + bool has_extd_esr;
> +
> + int reenable_error_ppi;
> + struct mpam_msc * __percpu *error_dev_id;
> +
> atomic_t online_refs;
>
> /*
> @@ -52,6 +57,8 @@ struct mpam_msc {
> */
> struct mutex probe_lock;
> bool probed;
> + bool error_irq_requested;
> + bool error_irq_hw_enabled;
> u16 partid_max;
> u8 pmg_max;
> unsigned long ris_idxs[128 / BITS_PER_LONG];
> @@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
>
> /* Scheduled work callback to enable mpam once all MSC have been probed */
> void mpam_enable(struct work_struct *work);
> -void mpam_disable(void);
> +void mpam_disable(struct work_struct *work);
>
> int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> cpumask_t *affinity);
Thanks,
Ben
* Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
2025-08-28 1:29 ` Fenghua Yu
@ 2025-09-01 11:09 ` Dave Martin
1 sibling, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-09-01 11:09 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
Hi,
> Subject: arm_mpam: Add the class and component structures for ris firmware described
Mangled subject line?
There is a fair intersection between the commit message and what the
patch does, but they don't quite seem to match up.
Some key issues like locking / object lifecycle management
and DT parsing (a bit of which, it appears, lives here too) are not
mentioned at all.
In lieu of a complete rewrite, it might be best to discard the
explanation of the various object types. The comment in the code
speaks for itself, and looks clearer.
[...]
On Fri, Aug 22, 2025 at 03:29:53PM +0000, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
>
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
>
> A struct mpam_component is a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
>
> A struct mpam_vmsc is then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need
> to be configured the same as the hardware is not able to distribute the
> configuration.
>
> Add support for creating and destroying these structures.
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered when probing the MSC.
>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * removed a pr_err() debug message that crept in.
> ---
> drivers/resctrl/mpam_devices.c | 488 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 91 ++++++
> include/linux/arm_mpam.h | 8 +-
> 3 files changed, 574 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -20,7 +20,6 @@
[...]
> @@ -35,11 +34,483 @@
> static DEFINE_MUTEX(mpam_list_lock);
> static LIST_HEAD(mpam_all_msc);
>
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
Why expose this here? This patch makes no use of the exposed symbol.
>
> /* MPAM isn't available until all the MSC have been probed. */
> static u32 mpam_num_msc;
>
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
This description of the structures and how they relate to each other
seems OK (bearing in mind that I am already familiar with this stuff --
I can't speak for other people).
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
[...]
> +#define add_to_garbage(x) \
> +do { \
> + __typeof__(x) _x = x; \
Nit:
= (x)
(for the paranoid)
> + (_x)->garbage.to_free = (_x); \
_x->garbage.to_free = _x;
(_x is an identifier, not a macro argument. It can't get re-parsed as
something else -- assuming that there is not a #define for _x, but then
all bets would be off anyway.)
> + llist_add(&(_x)->garbage.llist, &mpam_garbage); \
&_x->...
[...]
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> + struct mpam_vmsc *vmsc = ris->vmsc;
> + struct mpam_msc *msc = vmsc->msc;
> + struct platform_device *pdev = msc->pdev;
> + struct mpam_component *comp = vmsc->comp;
> + struct mpam_class *class = comp->class;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> + cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
This is not the inverse of the cpumask_or()s in mpam_ris_create_locked(),
unless the ris associated with each class and each component have
strictly disjoint affinity masks. Is that checked anywhere, or should
it be impossible by construction?
But, thinking about it:
I wonder why we ever really need to do the teardown. If we get an
error interrupt then we can just go into a sulk, spam dmesg a bit, put
the hardware into the most vanilla state that we can, and refuse to
manipulate it further. But this only happens in the case of a software
or hardware *bug* (or, in a future world where we might implement
virtualisation, an uncontainable MPAM error triggered by a guest -- for
which tearing down the host MPAM would be an overreaction).
Trying to cleanly tear the MPAM driver down after such an error seems a
bit futile.
The MPAM resctrl glue could eventually be made into a module (though
not needed from day 1) -- which would allow for unloading resctrlfs if
that is eventually module-ised. I think this wouldn't require the MPAM
devices backend to be torn down at any point, though (?)
If we can simplify or eliminate the teardown, does it simplify the
locking at all? The garbage collection logic can also be dispensed
with if there is never any garbage.
Since MSCs etc. never disappear from the hardware, it feels like it
ought not to be necessary ever to remove items from any of these lists
except when trying to do a teardown (?)
(Putting the hardware into a quiescent state is not the same thing as
tearing down the data structures -- we do want to quiesce MPAM when
shutting down the kernel, at least for the kexec scenario.)
> + clear_bit(ris->ris_idx, msc->ris_idxs);
> + list_del_rcu(&ris->vmsc_list);
> + list_del_rcu(&ris->msc_list);
> + add_to_garbage(ris);
> + ris->garbage.pdev = pdev;
> +
> + if (list_empty(&vmsc->ris))
> + mpam_vmsc_destroy(vmsc);
> +}
[...]
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> + enum mpam_class_types type, u8 class_id,
> + int component_id, gfp_t gfp)
> +{
> + int err;
> + struct mpam_vmsc *vmsc;
> + struct mpam_msc_ris *ris;
> + struct mpam_class *class;
> + struct mpam_component *comp;
> +
> + lockdep_assert_held(&mpam_list_lock);
> +
> + if (test_and_set_bit(ris_idx, msc->ris_idxs))
> + return -EBUSY;
Is it impossible by construction to get in here with an out-of-range
ris_idx?
To avoid the callers (i.e., ACPI) needing to understand the internal
limitations of this code, maybe it is worth having a check here (even
if technically redundant).
[...]
Cheers
---Dave
* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
` (4 preceding siblings ...)
2025-09-01 9:11 ` Ben Horgan
@ 2025-09-01 11:21 ` Dave Martin
5 siblings, 0 replies; 130+ messages in thread
From: Dave Martin @ 2025-09-01 11:21 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:51PM +0000, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
>
> Start with driver probe/remove and mapping the MSC.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
> * Check for status=broken DT devices.
> * Moved all the files around.
> * Made Kconfig symbols depend on EXPERT
> ---
[...]
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
Drop confusing comment?
CPUs are not mentioned other than in the comment -- I think the
descriptions are sufficiently self-explanatory that they don't read
onto CPUs.
> +config ARM64_MPAM_DRIVER
> + bool "MPAM driver for System IP, e,g. caches and memory controllers"
> + depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> + bool "Enable debug messages from the MPAM driver."
Nit: spurious full stop.
(i.e., people don't add one in these one-line descriptions.
They are title-like and self-delimiting, even when the text is a valid
sentence.)
> + depends on ARM64_MPAM_DRIVER
> + help
> + Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
> +mpam-y += mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
Comment doesn't really explain the variable.
Maybe something like "Number of MSCs that need to be probed for MPAM
to be usable" ?
> +static u32 mpam_num_msc;
Any particular reason this is u32 and not unsigned int?
How are accesses to this protected against data races?
If there are supposed to be locks to protect globals in the MPAM driver,
is it worth wrapping them in access functions with a lockdep assert?
Otherwise, it feels rather easy to get this wrong -- I think I've found
at least one bug (see mpam_msc_drv_probe().)
> +
> +static void mpam_discovery_complete(void)
> +{
> + pr_err("Discovered all MSC\n");
> +}
As others have commented, if this is non-functional code that gets
removed later on, it's probably best to drop this up-front?
[...]
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> + u32 ris_idx)
> +{
> + int err = 0;
> + u32 level = 0;
> + unsigned long cache_id;
> + struct device_node *cache;
> +
> + do {
> + if (of_device_is_compatible(np, "arm,mpam-cache")) {
> + cache = of_parse_phandle(np, "arm,mpam-device", 0);
> + if (!cache) {
> + pr_err("Failed to read phandle\n");
> + break;
> + }
> + } else if (of_device_is_compatible(np->parent, "cache")) {
> + cache = of_node_get(np->parent);
> + } else {
> + /* For now, only caches are supported */
> + cache = NULL;
> + break;
> + }
> +
> + err = of_property_read_u32(cache, "cache-level", &level);
> + if (err) {
> + pr_err("Failed to read cache-level\n");
> + break;
> + }
> +
> + cache_id = cache_of_calculate_id(cache);
> + if (cache_id == ~0UL) {
The type of cache_id may change if the return type of
cache_of_calculate_id() changes (see comments on patch 1).
Possible #define for the exceptional value.
> + err = -ENOENT;
> + break;
The lack of a diagnostic here is inconsistent with the level of
diagnostics in the rest of the loop.
> + }
> +
> + err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> + cache_id);
> + } while (0);
Abuse of do ... while () here?
There is no loop. The breaks are stealth "goto"s to this statement:
> + of_node_put(cache);
(It works either way, but maybe gotos to an explicit label would be
more readable, as well as avoiding an unnecessary level of indentation.)
> +
> + return err;
> +}
[...]
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
Nit: the the
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> + struct device_node *parent;
> + u32 affinity_id;
> + int err;
> +
> + if (!acpi_disabled) {
> + err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> + &affinity_id);
> + if (err)
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + else
> + acpi_pptt_get_cpus_from_container(affinity_id,
> + &msc->accessibility);
> +
> + return 0;
> + }
> +
> + /* This depends on the path to of_node */
> + parent = of_get_parent(msc->pdev->dev.of_node);
> + if (parent == of_root) {
> + cpumask_copy(&msc->accessibility, cpu_possible_mask);
> + err = 0;
> + } else {
> + err = -EINVAL;
> + pr_err("Cannot determine accessibility of MSC: %s\n",
> + dev_name(&msc->pdev->dev));
> + }
> + of_node_put(parent);
> +
> + return err;
> +}
> +
> +static int fw_num_msc;
Does this need to be protected against data races?
If individual mpam_msc_drv_probe() calls may execute on different CPUs
from mpam_msc_driver_init(), then seem to be potential races here.
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> + /* TODO: wake up tasks blocked on this MSC's PCC channel */
So, is this broken in this commit?
(If the series does not get broken up or applied piecemeal, that's not
such a concern, though.)
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
The MPAM driver cannot currently be built as a module.
Is it possible to exercise the driver remove paths, today?
> + struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> + if (!msc)
> + return;
> +
> + mutex_lock(&mpam_list_lock);
> + mpam_num_msc--;
> + platform_set_drvdata(pdev, NULL);
> + list_del_rcu(&msc->glbl_list);
> + synchronize_srcu(&mpam_srcu);
> + devm_kfree(&pdev->dev, msc);
> + mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> + int err;
> + struct mpam_msc *msc;
> + struct resource *msc_res;
> + void *plat_data = pdev->dev.platform_data;
> +
> + mutex_lock(&mpam_list_lock);
> + do {
> + msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> + if (!msc) {
> + err = -ENOMEM;
> + break;
> + }
> +
> + mutex_init(&msc->probe_lock);
> + mutex_init(&msc->part_sel_lock);
> + mutex_init(&msc->outer_mon_sel_lock);
> + raw_spin_lock_init(&msc->inner_mon_sel_lock);
> + msc->id = mpam_num_msc++;
> + msc->pdev = pdev;
> + INIT_LIST_HEAD_RCU(&msc->glbl_list);
> + INIT_LIST_HEAD_RCU(&msc->ris);
> +
> + err = update_msc_accessibility(msc);
> + if (err)
> + break;
> + if (cpumask_empty(&msc->accessibility)) {
> + pr_err_once("msc:%u is not accessible from any CPU!",
> + msc->id);
> + err = -EINVAL;
> + break;
> + }
> +
> + if (device_property_read_u32(&pdev->dev, "pcc-channel",
> + &msc->pcc_subspace_id))
> + msc->iface = MPAM_IFACE_MMIO;
> + else
> + msc->iface = MPAM_IFACE_PCC;
> +
> + if (msc->iface == MPAM_IFACE_MMIO) {
> + void __iomem *io;
> +
> + io = devm_platform_get_and_ioremap_resource(pdev, 0,
> + &msc_res);
> + if (IS_ERR(io)) {
> + pr_err("Failed to map MSC base address\n");
> + err = PTR_ERR(io);
> + break;
> + }
> + msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> + msc->mapped_hwpage = io;
> + } else if (msc->iface == MPAM_IFACE_PCC) {
> + msc->pcc_cl.dev = &pdev->dev;
> + msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> + msc->pcc_cl.tx_block = false;
> + msc->pcc_cl.tx_tout = 1000; /* 1s */
> + msc->pcc_cl.knows_txdone = false;
> +
> + msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> + msc->pcc_subspace_id);
> + if (IS_ERR(msc->pcc_chan)) {
> + pr_err("Failed to request MSC PCC channel\n");
> + err = PTR_ERR(msc->pcc_chan);
> + break;
> + }
> + }
Should the lock be held across initialisation of the msc fields?
list_add_rcu() might imply sufficient barriers to ensure that the
initialisations are visible to other threads that obtain the msc
pointer by iterating over mpam_all_msc.
It's probably cleaner to hold the lock explicitly, though.
What other ways of obtaining the msc pointer exist?
> +
> + list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> + platform_set_drvdata(pdev, msc);
> + } while (0);
> + mutex_unlock(&mpam_list_lock);
> +
> + if (!err) {
> + /* Create RIS entries described by firmware */
> + if (!acpi_disabled)
> + err = acpi_mpam_parse_resources(msc, plat_data);
> + else
> + err = mpam_dt_parse_resources(msc, plat_data);
> + }
> +
> + if (!err && fw_num_msc == mpam_num_msc)
Unlocked read of mpam_num_msc?
> + mpam_discovery_complete();
> +
> + if (err && msc)
> + mpam_msc_drv_remove(pdev);
> +
> + return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> + { .compatible = "arm,mpam-msc", },
> + {},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> + .driver = {
> + .name = "mpam_msc",
> + .of_match_table = of_match_ptr(mpam_of_match),
> + },
> + .probe = mpam_msc_drv_probe,
> + .remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
Can you elaborate? I don't understand quite what this is doing.
> +static void mpam_dt_create_foundling_msc(void)
> +{
> + int err;
> + struct device_node *cache;
> +
> + for_each_compatible_node(cache, NULL, "cache") {
> + err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> + if (err)
> + pr_err("Failed to create MSC devices under caches\n");
> + }
> +}
[...]
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> + /* member of mpam_all_msc */
> + struct list_head glbl_list;
Is it worth making these names less mismatched?
> +
> + int id;
> + struct platform_device *pdev;
> +
> + /* Not modified after mpam_is_enabled() becomes true */
> + enum mpam_msc_iface iface;
> + u32 pcc_subspace_id;
> + struct mbox_client pcc_cl;
> + struct pcc_mbox_chan *pcc_chan;
> + u32 nrdy_usec;
> + cpumask_t accessibility;
> +
> + /*
> + * probe_lock is only taken during discovery. After discovery these
> + * properties become read-only and the lists are protected by SRCU.
> + */
> + struct mutex probe_lock;
Can we have more clarity about the locking strategy, including details
of which things each lock is supposed to apply to and when, and how (if
at all) the locks are intended to nest?
(Similarly for the global locks.)
> + unsigned long ris_idxs[128 / BITS_PER_LONG];
> + u32 ris_max;
nrdy_usec, ris_idxs and ris_max appear unused in this patch (though I
suppose they get initialised by virtue of kzalloc()). Is this
intentional?
> +
> + /* mpam_msc_ris of this component */
> + struct list_head ris;
> +
> + /*
> + * part_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> + * by RIS).
> + * If needed, take msc->lock first.
> + */
What's msc->lock ?
> + struct mutex part_sel_lock;
> +
> + /*
> + * mon_sel_lock protects access to the MSC hardware registers that are
> + * affected by MPAMCFG_MON_SEL.
> + * If needed, take msc->lock first.
> + */
Same here.
> + struct mutex outer_mon_sel_lock;
> + raw_spinlock_t inner_mon_sel_lock;
> + unsigned long inner_mon_sel_flags;
> +
> + void __iomem *mapped_hwpage;
> + size_t mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
[...]
Cheers
---Dave
^ permalink raw reply [flat|nested] 130+ messages in thread
* Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-09-02 16:59 ` Fenghua Yu
0 siblings, 0 replies; 130+ messages in thread
From: Fenghua Yu @ 2025-09-02 16:59 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
devicetree
Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:30, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
>
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_internal.h | 8 +-
> drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
> 2 files changed, 329 insertions(+), 1 deletion(-)
[SNIP]
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 8e9d6c88171c..ef39696e7ff8 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -4,6 +4,326 @@
>
> #include <kunit/test.h>
>
> +/*
> + * This test catches fields that aren't being sanitised - but can't tell you
> + * which one...
> + */
> +static void test__props_mismatch(struct kunit *test)
> +{
> + struct mpam_props parent = { 0 };
> + struct mpam_props child;
> +
> + memset(&child, 0xff, sizeof(child));
> + __props_mismatch(&parent, &child, false);
> +
> + memset(&child, 0, sizeof(child));
> + KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +
> + memset(&child, 0xff, sizeof(child));
> + __props_mismatch(&parent, &child, true);
> +
> + KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +}
> +
> +static void test_mpam_enable_merge_features(struct kunit *test)
> +{
> + /* o/` How deep is your stack? o/` */
> + struct list_head fake_classes_list;
> + struct mpam_class fake_class = { 0 };
> + struct mpam_component fake_comp1 = { 0 };
> + struct mpam_component fake_comp2 = { 0 };
> + struct mpam_vmsc fake_vmsc1 = { 0 };
> + struct mpam_vmsc fake_vmsc2 = { 0 };
> + struct mpam_msc fake_msc1 = { 0 };
> + struct mpam_msc fake_msc2 = { 0 };
> + struct mpam_msc_ris fake_ris1 = { 0 };
> + struct mpam_msc_ris fake_ris2 = { 0 };
> + struct platform_device fake_pdev = { 0 };
> +
> +#define RESET_FAKE_HIEARCHY() do { \
> + INIT_LIST_HEAD(&fake_classes_list); \
> + \
> + memset(&fake_class, 0, sizeof(fake_class)); \
> + fake_class.level = 3; \
> + fake_class.type = MPAM_CLASS_CACHE; \
> + INIT_LIST_HEAD_RCU(&fake_class.components); \
> + INIT_LIST_HEAD(&fake_class.classes_list); \
> + \
> + memset(&fake_comp1, 0, sizeof(fake_comp1)); \
> + memset(&fake_comp2, 0, sizeof(fake_comp2)); \
> + fake_comp1.comp_id = 1; \
> + fake_comp2.comp_id = 2; \
> + INIT_LIST_HEAD(&fake_comp1.vmsc); \
> + INIT_LIST_HEAD(&fake_comp1.class_list); \
> + INIT_LIST_HEAD(&fake_comp2.vmsc); \
> + INIT_LIST_HEAD(&fake_comp2.class_list); \
> + \
> + memset(&fake_vmsc1, 0, sizeof(fake_vmsc1)); \
> + memset(&fake_vmsc2, 0, sizeof(fake_vmsc2)); \
> + INIT_LIST_HEAD(&fake_vmsc1.ris); \
> + INIT_LIST_HEAD(&fake_vmsc1.comp_list); \
> + fake_vmsc1.msc = &fake_msc1; \
> + INIT_LIST_HEAD(&fake_vmsc2.ris); \
> + INIT_LIST_HEAD(&fake_vmsc2.comp_list); \
> + fake_vmsc2.msc = &fake_msc2; \
> + \
> + memset(&fake_ris1, 0, sizeof(fake_ris1)); \
> + memset(&fake_ris2, 0, sizeof(fake_ris2)); \
> + fake_ris1.ris_idx = 1; \
> + INIT_LIST_HEAD(&fake_ris1.msc_list); \
> + fake_ris2.ris_idx = 2; \
> + INIT_LIST_HEAD(&fake_ris2.msc_list); \
> + \
> + fake_msc1.pdev = &fake_pdev; \
> + fake_msc2.pdev = &fake_pdev; \
> + \
> + list_add(&fake_class.classes_list, &fake_classes_list); \
> +} while (0)
> +
> + RESET_FAKE_HIEARCHY();
> +
> + mutex_lock(&mpam_list_lock);
> +
> + /* One Class+Comp, two RIS in one vMSC with common features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = NULL;
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc1;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two RIS in one vMSC with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = NULL;
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc1;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /* Multiple RIS within one MSC controlling the same resource can be mismatched */
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> + KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple RIS in different MSCs can't control the same resource;
> + * mismatched features cannot be supported.
> + */
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with incompatible overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 5;
> + fake_ris2.props.cpbm_wd = 3;
> + fake_ris1.props.mbw_pbm_bits = 5;
> + fake_ris2.props.mbw_pbm_bits = 3;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple RIS in different MSCs can't control the same resource;
> + * mismatched features cannot be supported.
> + */
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class+Comp, two MSC with overlapping features that need tweaking */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = NULL;
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp1;
> + list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
> + mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
> + fake_ris1.props.bwa_wd = 5;
> + fake_ris2.props.bwa_wd = 3;
> + fake_ris1.props.cmax_wd = 5;
> + fake_ris2.props.cmax_wd = 3;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * RIS in different MSCs overlap here, but the differing control
> + * widths can be reduced to the common minimum, so the merge succeeds.
> + */
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class Two Comp with overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = &fake_class;
> + list_add(&fake_comp2.class_list, &fake_class.components);
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp2;
> + list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cpbm_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> + RESET_FAKE_HIEARCHY();
> +
> + /* One Class Two Comp with non-overlapping features */
> + fake_comp1.class = &fake_class;
> + list_add(&fake_comp1.class_list, &fake_class.components);
> + fake_comp2.class = &fake_class;
> + list_add(&fake_comp2.class_list, &fake_class.components);
> + fake_vmsc1.comp = &fake_comp1;
> + list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> + fake_vmsc2.comp = &fake_comp2;
> + list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> + fake_ris1.vmsc = &fake_vmsc1;
> + list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> + fake_ris2.vmsc = &fake_vmsc2;
> + list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> + mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> + mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> + fake_ris1.props.cpbm_wd = 4;
> + fake_ris2.props.cmax_wd = 4;
> +
> + mpam_enable_merge_features(&fake_classes_list);
> +
> + /*
> + * Multiple components can't control the same resource; mismatched
> + * features cannot be supported.
> + */
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> + KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> + KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> + KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> + mutex_unlock(&mpam_list_lock);
> +
> +#undef RESET_FAKE_HIEARCHY
> +}
> +
In file included from drivers/resctrl/mpam_devices.c:2908:
drivers/resctrl/test_mpam_devices.c: In function
‘test_mpam_enable_merge_features’:
drivers/resctrl/test_mpam_devices.c:325:1: error: the frame size of 5520
bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
325 | }
| ^
It's better to split the big function into a few sub-tests. Each
sub-test defines and uses less variables to avoid big frame size issue.
[SNIP]
Thanks.
-Fenghua
end of thread, other threads:[~2025-09-02 21:44 UTC | newest]
Thread overview: 130+ messages
2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
2025-08-27 10:46 ` Dave Martin
2025-08-27 17:11 ` James Morse
2025-08-28 14:08 ` Dave Martin
2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
2025-08-24 17:25 ` Krzysztof Kozlowski
2025-08-27 17:11 ` James Morse
2025-08-27 10:46 ` Dave Martin
2025-08-27 17:11 ` James Morse
2025-08-28 14:10 ` Dave Martin
2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
2025-08-26 14:45 ` Ben Horgan
2025-08-28 15:56 ` James Morse
2025-08-27 10:48 ` Dave Martin
2025-08-28 15:57 ` James Morse
2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
2025-08-27 10:49 ` Dave Martin
2025-08-28 15:57 ` James Morse
2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
2025-08-23 12:14 ` Markus Elfring
2025-08-28 15:57 ` James Morse
2025-08-27 9:25 ` Ben Horgan
2025-08-28 15:57 ` James Morse
2025-08-27 10:50 ` Dave Martin
2025-08-28 15:58 ` James Morse
2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
2025-08-27 10:53 ` Dave Martin
2025-08-28 15:58 ` James Morse
2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
2025-08-27 8:53 ` Ben Horgan
2025-08-28 15:58 ` James Morse
2025-08-29 8:20 ` Ben Horgan
2025-08-27 11:01 ` Dave Martin
2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
2025-08-23 10:55 ` Markus Elfring
2025-08-27 16:05 ` Dave Martin
2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
2025-08-27 16:22 ` Dave Martin
2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-08-22 19:15 ` Markus Elfring
2025-08-22 19:55 ` Markus Elfring
2025-08-23 6:41 ` Greg Kroah-Hartman
2025-08-27 13:03 ` Ben Horgan
2025-08-27 15:39 ` Rob Herring
2025-08-27 16:16 ` Rob Herring
2025-09-01 9:11 ` Ben Horgan
2025-09-01 11:21 ` Dave Martin
2025-08-22 15:29 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
2025-08-28 1:29 ` Fenghua Yu
2025-09-01 11:09 ` Dave Martin
2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
2025-08-29 8:42 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
2025-08-27 16:08 ` Rob Herring
2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
2025-08-28 13:12 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
2025-08-28 17:07 ` Fenghua Yu
2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
2025-08-28 13:44 ` Ben Horgan
2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
2025-08-29 13:54 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
2025-08-27 16:19 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
2025-08-28 16:13 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
2025-08-29 14:30 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
2025-08-28 16:13 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
2025-08-28 10:11 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
2025-08-29 15:47 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
2025-08-29 15:55 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
2025-08-29 16:09 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
2025-08-28 16:14 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
2025-08-29 16:39 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
2025-08-29 16:56 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
2025-08-29 17:11 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
2025-08-22 15:30 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
2025-08-22 15:30 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
2025-08-22 15:30 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
2025-08-22 15:30 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
2025-08-22 15:30 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
2025-08-22 15:30 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
2025-08-22 15:30 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
2025-08-22 15:30 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
2025-08-29 12:41 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
2025-08-22 15:30 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
2025-08-22 15:30 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
2025-08-22 15:30 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
2025-08-22 15:30 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
2025-08-22 15:30 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
2025-09-01 10:05 ` Ben Horgan
2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
2025-08-28 0:58 ` Fenghua Yu
2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
2025-09-02 16:59 ` Fenghua Yu
2025-08-24 17:24 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Krzysztof Kozlowski
This is a public inbox; see mirroring instructions for how to clone and mirror all data and code used for this inbox, as well as URLs for NNTP newsgroup(s).