[PATCH 00/33] arm_mpam: Add basic mpam driver

* [PATCH 00/33] arm_mpam: Add basic mpam driver
@ 2025-11-07 12:34 Ben Horgan
  2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
                   ` (39 more replies)
  0 siblings, 40 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

Hi all,

This version of the series comes to you from me as James is otherwise
engaged. I hope I have done his work justice. I've made quite a few
changes: rework, bug fixes, typo fixes, all the usual. In order to aid review,
as Jonathan suggested, I've split out some patches and made an effort
to minimise the amount of churn between patches.

It would be great to get this taken up quickly. There are lots more
patches to come before we have a working MPAM story and this driver is
hidden behind the expert config. All reviews, comments and testing are
welcome, and thank you for all the feedback so far.

See below for a public branch. No public updated version of the
snapshot (the rest of the driver) I'm afraid.

Changelogs in the patches.

Previous cover letter from James:

This is just enough MPAM driver for ACPI. DT got ripped out. If you need DT
support - please share your DTS so the DT folk know the binding is what is
needed.
This doesn't contain any of the resctrl code, meaning you can't actually drive it
from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.

This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.

The locking got simplified, but is still strange - this is because of the 'mpam-fb'
firmware interface specification that is still alpha. That thing needs to wait for
an interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.

I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.

The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
 This document has the best overview)

The expectation is this will go via the arm64 tree.

This series is based on v6.18-rc4, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v4

The rest of the driver can be found here: (no updated version - based on v3)
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1

What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/

This series was previously posted here:
[v3] https://lore.kernel.org/linux-arm-kernel/20251017185645.26604-1-james.morse@arm.com/
[v2] lore.kernel.org/r/20250910204309.20751-1-james.morse@arm.com
[v1] lore.kernel.org/r/20250822153048.2287-1-james.morse@arm.com
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com

Ben Horgan (4):
  ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one
    structure
  platform: Define platform_device_put cleanup handler
  ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret()
    helper
  arm_mpam: Consider overflow in bandwidth counter state

James Morse (27):
  ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
    levels
  ACPI / PPTT: Find cache level by cache-id
  ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  arm64: kconfig: Add Kconfig entry for MPAM
  ACPI / MPAM: Parse the MPAM table
  arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  arm_mpam: Add the class and component structures for firmware
    described ris
  arm_mpam: Add MPAM MSC register layout definitions
  arm_mpam: Add cpuhp callbacks to probe MSC hardware
  arm_mpam: Probe hardware to find the supported partid/pmg values
  arm_mpam: Add helpers for managing the locking around the mon_sel
    registers
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Merge supported features during mpam_enable() into
    mpam_class
  arm_mpam: Reset MSC controls from cpuhp callbacks
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Register and enable IRQs
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Allow configuration to be applied and restored during cpu
    online
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Track bandwidth counter state for power management
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add kunit tests for props_mismatch()

Rohit Mathew (2):
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Use long MBWU counters if supported

 arch/arm64/Kconfig                  |   25 +
 drivers/Kconfig                     |    2 +
 drivers/Makefile                    |    1 +
 drivers/acpi/arm64/Kconfig          |    3 +
 drivers/acpi/arm64/Makefile         |    1 +
 drivers/acpi/arm64/mpam.c           |  403 ++++
 drivers/acpi/pptt.c                 |  334 +++-
 drivers/acpi/tables.c               |    2 +-
 drivers/resctrl/Kconfig             |   24 +
 drivers/resctrl/Makefile            |    4 +
 drivers/resctrl/mpam_devices.c      | 2729 +++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h     |  656 +++++++
 drivers/resctrl/test_mpam_devices.c |  389 ++++
 include/linux/acpi.h                |   26 +
 include/linux/arm_mpam.h            |   66 +
 include/linux/platform_device.h     |    1 +
 16 files changed, 4611 insertions(+), 55 deletions(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
 create mode 100644 drivers/resctrl/test_mpam_devices.c
 create mode 100644 include/linux/arm_mpam.h

-- 
2.43.0

* [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  4:31   ` Gavin Shan
  2025-11-12  5:45   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels Ben Horgan
                   ` (38 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

The ACPI MPAM table uses the UID of a processor container specified in
the PPTT to indicate the subset of CPUs and cache topology that can
access each MPAM System Component (MSC).

This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.

Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.

CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Refer to processor hierarchy in comments (Jonathan)
Fix indent (Jonathan)
---
 drivers/acpi/pptt.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  3 ++
 2 files changed, 88 insertions(+)
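
For readers new to the PPTT, the walk described in the commit message
can be modelled in user space. This is only a minimal sketch under
stated assumptions: struct node and cpus_below() are hypothetical
stand-ins, a 64-bit integer replaces the kernel's cpumask_t, and plain
parent pointers replace the table offsets that fetch_pptt_node()
resolves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PPTT processor hierarchy node. */
struct node {
	const struct node *parent;	/* NULL at the root */
};

/*
 * Return a bitmask with bit i set when leaves[i] has @container as an
 * ancestor (or is @container itself). Handles at most 64 leaves.
 */
uint64_t cpus_below(const struct node *const *leaves, size_t nr_leaves,
		    const struct node *container)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < nr_leaves && i < 64; i++) {
		/* Walk up from the leaf until the container or the root. */
		for (const struct node *n = leaves[i]; n; n = n->parent) {
			if (n == container) {
				mask |= UINT64_C(1) << i;
				break;
			}
		}
	}
	return mask;
}
```

As in the patch, the cost is one upward walk per possible CPU, which
is acceptable for a lookup that is expected to run rarely.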

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..69917cc6bd2f 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -817,3 +817,88 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT
+ * processor hierarchy node
+ *
+ * @table_hdr:		A reference to the PPTT table
+ * @parent_node:	A pointer to the processor hierarchy node in the
+ *			table_hdr
+ * @cpus:		A cpumask to fill with the CPUs below @parent_node
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+				     struct acpi_pptt_processor *parent_node,
+				     cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_id;
+	int cpu;
+
+	cpumask_clear(cpus);
+
+	for_each_possible_cpu(cpu) {
+		acpi_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+		while (cpu_node) {
+			if (cpu_node == parent_node) {
+				cpumask_set_cpu(cpu, cpus);
+				break;
+			}
+			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+		}
+	}
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ *                                       processor container
+ * @acpi_cpu_id:	The UID of the processor container
+ * @cpus:		The resulting CPU mask
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not all 'Processor Hierarchy' entries in the PPTT are either a CPU
+ * or a Processor Container, they may exist purely to describe a
+ * Private resource. CPUs have to be leaves, so a Processor Container
+ * is a non-leaf that has the 'ACPI Processor ID valid' flag set.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+	struct acpi_table_header *table_hdr;
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 proc_sz;
+
+	cpumask_clear(cpus);
+
+	table_hdr = acpi_get_pptt();
+	if (!table_hdr)
+		return;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+	proc_sz = sizeof(struct acpi_pptt_processor);
+	while ((unsigned long)entry + proc_sz <= table_end) {
+
+		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
+			struct acpi_pptt_processor *cpu_node;
+
+			cpu_node = (struct acpi_pptt_processor *)entry;
+			if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
+			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
+			    cpu_node->acpi_processor_id == acpi_cpu_id) {
+				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+				break;
+			}
+		}
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 5ff5d99f6ead..4752ebd48132 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 {
 	return -EINVAL;
 }
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+						     cpumask_t *cpus) { }
 #endif
 
 void acpi_arch_init(void);
-- 
2.43.0


* Re: [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
@ 2025-11-08  4:31   ` Gavin Shan
  2025-11-12 10:14     ` Ben Horgan
  2025-11-12  5:45   ` Shaopeng Tan (Fujitsu)
  1 sibling, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  4:31 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT to indicate the subset of CPUs and cache topology that can
> access each MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Refer to processor hierarchy in comments (Jonathan)
> Fix indent (Jonathan)
> ---
>   drivers/acpi/pptt.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  3 ++
>   2 files changed, 88 insertions(+)
> 

Two nitpicks below...

> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..69917cc6bd2f 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -817,3 +817,88 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>   	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>   					  ACPI_PPTT_ACPI_IDENTICAL);
>   }
> +
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT
> + * processor hierarchy node
> + *
> + * @table_hdr:		A reference to the PPTT table
> + * @parent_node:	A pointer to the processor hierarchy node in the
> + *			table_hdr
> + * @cpus:		A cpumask to fill with the CPUs below @parent_node
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> +				     struct acpi_pptt_processor *parent_node,
> +				     cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_id;
> +	int cpu;
> +
> +	cpumask_clear(cpus);
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> +		while (cpu_node) {
> +			if (cpu_node == parent_node) {
> +				cpumask_set_cpu(cpu, cpus);
> +				break;
> +			}
> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +		}
> +	}
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor container
> + * @acpi_cpu_id:	The UID of the processor container
> + * @cpus:		The resulting CPU mask
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor Hierarchy' entries in the PPTT are either a CPU
> + * or a Processor Container, they may exist purely to describe a
> + * Private resource. CPUs have to be leaves, so a Processor Container
> + * is a non-leaf that has the 'ACPI Processor ID valid' flag set.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	table_hdr = acpi_get_pptt();
> +	if (!table_hdr)
> +		return;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +	proc_sz = sizeof(struct acpi_pptt_processor);
> +	while ((unsigned long)entry + proc_sz <= table_end) {
> +

Unnecessary blank line here.

> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
> +			struct acpi_pptt_processor *cpu_node;
> +
> +			cpu_node = (struct acpi_pptt_processor *)entry;
> +			if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
> +			    !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
> +			    cpu_node->acpi_processor_id == acpi_cpu_id) {
> +				acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> +				break;
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);

Do we need to check whether @cpu_node crosses the boundary (@table_end), as
is done in acpi_find_processor_node()? Actually, a similar hunk of code from
that function could be reused here.

static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_header *table_hdr,
                                                             u32 acpi_cpu_id)
{
		:
	while ((unsigned long)entry + proc_sz <= table_end) {
                 cpu_node = (struct acpi_pptt_processor *)entry;

                 if (entry->length == 0) {
                         pr_warn("Invalid zero length subtable\n");
                         break;
                 }

                 /* entry->length may not equal proc_sz, revalidate the processor structure length */
                 if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
                     acpi_cpu_id == cpu_node->acpi_processor_id &&
                     (unsigned long)entry + entry->length <= table_end &&
                     entry->length == proc_sz + cpu_node->number_of_priv_resources * sizeof(u32) &&
                      acpi_pptt_leaf_node(table_hdr, cpu_node)) {
                         return (struct acpi_pptt_processor *)entry;
                 }

                 entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
                                      entry->length);
	}

		:
}


> +	}
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 5ff5d99f6ead..4752ebd48132 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>   int find_acpi_cpu_topology_cluster(unsigned int cpu);
>   int find_acpi_cpu_topology_package(unsigned int cpu);
>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>   #else
>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>   {
> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>   {
>   	return -EINVAL;
>   }
> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> +						     cpumask_t *cpus) { }
>   #endif
>   
>   void acpi_arch_init(void);

Thanks,
Gavin


* Re: [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-11-08  4:31   ` Gavin Shan
@ 2025-11-12 10:14     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 10:14 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/8/25 04:31, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> The ACPI MPAM table uses the UID of a processor container specified in
>> the PPTT to indicate the subset of CPUs and cache topology that can
>> access each MPAM System Component (MSC).
>>
>> This information is not directly useful to the kernel. The equivalent
>> cpumask is needed instead.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
>>
>> CC: Dave Martin <dave.martin@arm.com>
>> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Refer to processor hierarchy in comments (Jonathan)
>> Fix indent (Jonathan)
>> ---
>>   drivers/acpi/pptt.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/acpi.h |  3 ++
>>   2 files changed, 88 insertions(+)
>>
> 
> Two nitpicks below...
> 
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..69917cc6bd2f 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -817,3 +817,88 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>       return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>>                         ACPI_PPTT_ACPI_IDENTICAL);
>>   }
>> +
>> +/**
>> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT
>> + * processor hierarchy node
>> + *
>> + * @table_hdr:        A reference to the PPTT table
>> + * @parent_node:    A pointer to the processor hierarchy node in the
>> + *            table_hdr
>> + * @cpus:        A cpumask to fill with the CPUs below @parent_node
>> + *
>> + * Walks up the PPTT from every possible CPU to find if the provided
>> + * @parent_node is a parent of this CPU.
>> + */
>> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
>> +                     struct acpi_pptt_processor *parent_node,
>> +                     cpumask_t *cpus)
>> +{
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u32 acpi_id;
>> +    int cpu;
>> +
>> +    cpumask_clear(cpus);
>> +
>> +    for_each_possible_cpu(cpu) {
>> +        acpi_id = get_acpi_id_for_cpu(cpu);
>> +        cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
>> +
>> +        while (cpu_node) {
>> +            if (cpu_node == parent_node) {
>> +                cpumask_set_cpu(cpu, cpus);
>> +                break;
>> +            }
>> +            cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +        }
>> +    }
>> +}
>> +
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + *                                       processor container
>> + * @acpi_cpu_id:    The UID of the processor container
>> + * @cpus:        The resulting CPU mask
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor Hierarchy' entries in the PPTT are either a CPU
>> + * or a Processor Container, they may exist purely to describe a
>> + * Private resource. CPUs have to be leaves, so a Processor Container
>> + * is a non-leaf that has the 'ACPI Processor ID valid' flag set.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> +    struct acpi_table_header *table_hdr;
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    u32 proc_sz;
>> +
>> +    cpumask_clear(cpus);
>> +
>> +    table_hdr = acpi_get_pptt();
>> +    if (!table_hdr)
>> +        return;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +                 sizeof(struct acpi_table_pptt));
>> +    proc_sz = sizeof(struct acpi_pptt_processor);
>> +    while ((unsigned long)entry + proc_sz <= table_end) {
>> +
> 
> Unnecessary blank line here.

Ack

> 
>> +        if (entry->type == ACPI_PPTT_TYPE_PROCESSOR) {
>> +            struct acpi_pptt_processor *cpu_node;
>> +
>> +            cpu_node = (struct acpi_pptt_processor *)entry;
>> +            if (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID &&
>> +                !acpi_pptt_leaf_node(table_hdr, cpu_node) &&
>> +                cpu_node->acpi_processor_id == acpi_cpu_id) {
>> +                acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
>> +                break;
>> +            }
>> +        }
>> +        entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>> +                     entry->length);
> 
> Do we need to check whether @cpu_node crosses the boundary (@table_end), as
> is done in acpi_find_processor_node()? Actually, a similar hunk of code from
> that function could be reused here.

As acpi_find_processor_node() is called from acpi_pptt_get_child_cpus()
I don't think we need to duplicate the checking here.
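
For anyone following the thread, the two checks in the code Gavin
quotes (reject a zero-length subtable, and make sure the whole entry
fits before the end of the table) can be sketched outside the kernel.
This is a minimal illustration, not the pptt.c code: struct sub_hdr and
find_entry() are hypothetical names, and the two-byte header is an
assumption chosen to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subtable header: a type tag, then the total entry length. */
struct sub_hdr {
	uint8_t type;
	uint8_t length;
};

/*
 * Return the offset of the first entry of @type in buf[0..size), or -1
 * if there is none or a malformed entry is reached first.
 */
long find_entry(const uint8_t *buf, size_t size, uint8_t type)
{
	size_t off = 0;

	while (off + sizeof(struct sub_hdr) <= size) {
		struct sub_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));

		/* A zero length never advances; a short one overlaps the header. */
		if (hdr.length < sizeof(hdr))
			return -1;
		/* The whole entry, not just its header, must fit in the table. */
		if (off + hdr.length > size)
			return -1;
		if (hdr.type == type)
			return (long)off;

		off += hdr.length;
	}
	return -1;
}
```

Note that the sketch validates every entry it steps over, whereas the
quoted kernel loop only revalidates the length of the entry it is about
to return.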

> 
> static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_header *table_hdr,
>                                                             u32 acpi_cpu_id)
> {
>         :
>     while ((unsigned long)entry + proc_sz <= table_end) {
>                 cpu_node = (struct acpi_pptt_processor *)entry;
> 
>                 if (entry->length == 0) {
>                         pr_warn("Invalid zero length subtable\n");
>                         break;
>                 }
> 
>                 /* entry->length may not equal proc_sz, revalidate the processor structure length */
>                 if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>                     acpi_cpu_id == cpu_node->acpi_processor_id &&
>                     (unsigned long)entry + entry->length <= table_end &&
>                     entry->length == proc_sz + cpu_node->number_of_priv_resources * sizeof(u32) &&
>                      acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>                         return (struct acpi_pptt_processor *)entry;
>                 }
> 
>                 entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>                                      entry->length);
>     }
> 
>         :
> }
> 
> 
>> +    }
>> +}
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index 5ff5d99f6ead..4752ebd48132 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>>   int find_acpi_cpu_topology_cluster(unsigned int cpu);
>>   int find_acpi_cpu_topology_package(unsigned int cpu);
>>   int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>>   #else
>>   static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>>   {
>> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>   {
>>       return -EINVAL;
>>   }
>> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>> +                             cpumask_t *cpus) { }
>>   #endif
>>     void acpi_arch_init(void);
> 
> Thanks,
> Gavin
> 

Thanks,

Ben


* RE: [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
  2025-11-08  4:31   ` Gavin Shan
@ 2025-11-12  5:45   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  5:45 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> The ACPI MPAM table uses the UID of a processor container specified in the
> PPTT to indicate the subset of CPUs and cache topology that can access each
> MPAM System Component (MSC).
> 
> This information is not directly useful to the kernel. The equivalent cpumask is
> needed instead.
> 
> Add a helper to find the processor container by its id, then walk the possible
> CPUs to fill a cpumask with the CPUs that have this processor container as a
> parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>

* [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
  2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-11  7:34   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
                   ` (37 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter.  The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.

Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.

Get rid of the levels parameter, which has no remaining purpose.

Fix acpi_get_cache_info() to match.

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
s/starting_level/current_level/ (Jonathan)
---
 drivers/acpi/pptt.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 69917cc6bd2f..1027ca3566b1 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -177,14 +177,14 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
 }
 
 /**
- * acpi_count_levels() - Given a PPTT table, and a CPU node, count the cache
- * levels and split cache levels (data/instruction).
+ * acpi_count_levels() - Given a PPTT table, and a CPU node, count the
+ * total number of levels and split cache levels (data/instruction).
  * @table_hdr: Pointer to the head of the PPTT table
  * @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
  * @split_levels:	Number of split cache levels (data/instruction) if
  *			success. Can by NULL.
  *
+ * Return: number of levels.
  * Given a processor node containing a processing unit, walk into it and count
  * how many levels exist solely for it, and then walk up each level until we hit
  * the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * split cache levels (data/instruction) that exist at each level on the way
  * up.
  */
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
-			      struct acpi_pptt_processor *cpu_node,
-			      unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node,
+			     unsigned int *split_levels)
 {
+	int current_level = 0;
+
 	do {
-		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+		acpi_find_cache_level(table_hdr, cpu_node, &current_level, split_levels, 0, 0);
 		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
 	} while (cpu_node);
+
+	return current_level;
 }
 
 /**
@@ -645,7 +649,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 	if (!cpu_node)
 		return -ENOENT;
 
-	acpi_count_levels(table, cpu_node, levels, split_levels);
+	*levels = acpi_count_levels(table, cpu_node, split_levels);
 
 	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
 		 *levels, split_levels ? *split_levels : -1);
-- 
2.43.0
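
For readers without the tree to hand, the calling-convention change
described in the commit message above can be modelled in user space roughly
as follows. This is only a sketch: struct node, find_cache_level() and
count_levels() are hypothetical stand-ins for the PPTT processor node,
acpi_find_cache_level() and acpi_count_levels(), and the per-node level
count is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PPTT processor node: only what the walk needs. */
struct node {
	const struct node *parent;	/* NULL at the root of the hierarchy */
	unsigned int levels_here;	/* cache levels described at this node (assumed) */
};

/* Stand-in for acpi_find_cache_level(): adds this node's levels to the running total. */
static void find_cache_level(const struct node *n, unsigned int *starting_level)
{
	*starting_level += n->levels_here;
}

/*
 * Model of the reworked helper: the accumulator is a local that always
 * starts at 0, and the total is handed back as the return value instead
 * of through a caller-initialised pointer.
 */
static int count_levels(const struct node *cpu_node)
{
	unsigned int current_level = 0;

	assert(cpu_node);	/* the do/while dereferences it unconditionally */

	do {
		find_cache_level(cpu_node, &current_level);
		cpu_node = cpu_node->parent;
	} while (cpu_node);

	return (int)current_level;
}
```

With this shape a caller in the position of acpi_get_cache_info() does not
have to zero anything beforehand; it just assigns the return value.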


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* RE: [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-11-07 12:34 ` [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels Ben Horgan
@ 2025-11-11  7:34   ` Shaopeng Tan (Fujitsu)
  0 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-11  7:34 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hello Ben,

> From: James Morse <james.morse@arm.com>
> 
> In acpi_count_levels(), the initial value of *levels passed by the caller is really
> an implementation detail of acpi_count_levels(), so it is unreasonable to
> expect the callers of this function to know what to pass in for this parameter.
> The only sensible initial value is 0, which is what the only upstream caller
> (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(), and pass
> the result back to the caller via the function return value.
> 
> Get rid of the levels parameter, which has no remaining purpose.
> 
> Fix acpi_get_cache_info() to match.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
"Reviewed-by" and "Tested-by" are mixed together.
It would be better to group them.
(Not just in this patch, but in other patches as well.)

Best regards,
Shaopeng TAN


> Changes since v3:
> s/starting_level/current_level/ (Jonathan)
> ---
>  drivers/acpi/pptt.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 69917cc6bd2f..1027ca3566b1 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -177,14 +177,14 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>  }
> 
>  /**
> - * acpi_count_levels() - Given a PPTT table, and a CPU node, count the cache
> - * levels and split cache levels (data/instruction).
> + * acpi_count_levels() - Given a PPTT table, and a CPU node, count the
> + * total number of levels and split cache levels (data/instruction).
>   * @table_hdr: Pointer to the head of the PPTT table
>   * @cpu_node: processor node we wish to count caches for
> - * @levels: Number of levels if success.
>   * @split_levels:	Number of split cache levels (data/instruction) if
>   *			success. Can by NULL.
>   *
> + * Return: number of levels.
>   * Given a processor node containing a processing unit, walk into it and count
>   * how many levels exist solely for it, and then walk up each level until we hit
>   * the root node (ignore the package level because it may be possible to have
> @@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   * split cache levels (data/instruction) that exist at each level on the way
>   * up.
>   */
> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
> -			      struct acpi_pptt_processor *cpu_node,
> -			      unsigned int *levels, unsigned int *split_levels)
> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node,
> +			     unsigned int *split_levels)
>  {
> +	int current_level = 0;
> +
>  	do {
> -		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
> +		acpi_find_cache_level(table_hdr, cpu_node, &current_level, split_levels, 0, 0);
>  		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>  	} while (cpu_node);
> +
> +	return current_level;
>  }
> 
>  /**
> @@ -645,7 +649,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>  	if (!cpu_node)
>  		return -ENOENT;
> 
> -	acpi_count_levels(table, cpu_node, levels, split_levels);
> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
> 
>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>  		 *levels, split_levels ? *split_levels : -1);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
  2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
  2025-11-07 12:34 ` [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  4:54   ` Gavin Shan
                     ` (3 more replies)
  2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
                   ` (36 subsequent siblings)
  39 siblings, 4 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

In actbl2.h, struct acpi_pptt_cache describes the fields in the original
cache type structure. In PPTT table version 3 a new field was added at the
end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
the new, acpi_pptt_cache_v1_full to contain both these structures. Update
the existing code to use this new struct. This simplifies the code, removes
a non-standard use of ACPI_ADD_PTR and allows using the length in the
header to check if the cache_id is valid.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
New patch
---
 drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
 1 file changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 1027ca3566b1..1ed2099c0d1a 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -21,6 +21,11 @@
 #include <linux/cacheinfo.h>
 #include <acpi/processor.h>
 
+struct acpi_pptt_cache_v1_full {
+	struct acpi_pptt_cache		f;
+	struct acpi_pptt_cache_v1	extra;
+} __packed;
+
 static struct acpi_subtable_header *fetch_pptt_subtable(struct acpi_table_header *table_hdr,
 							u32 pptt_ref)
 {
@@ -50,10 +55,24 @@ static struct acpi_pptt_processor *fetch_pptt_node(struct acpi_table_header *tab
 	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
 }
 
-static struct acpi_pptt_cache *fetch_pptt_cache(struct acpi_table_header *table_hdr,
-						u32 pptt_ref)
+static struct acpi_pptt_cache_v1_full *fetch_pptt_cache(struct acpi_table_header *table_hdr,
+							u32 pptt_ref)
 {
-	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+	return (struct acpi_pptt_cache_v1_full *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
+
+/*
+ * From PPTT table version 3, a new field cache_id was added at the end of
+ * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
+ * containing the cache_id, everywhere but must check validity before accessing
+ * the cache_id.
+ */
+static bool acpi_pptt_cache_id_is_valid(struct acpi_pptt_cache_v1_full *cache)
+{
+	return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&
+		cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
 }
 
 static struct acpi_subtable_header *acpi_get_pptt_resource(struct acpi_table_header *table_hdr,
@@ -103,30 +122,30 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
 					 unsigned int local_level,
 					 unsigned int *split_levels,
 					 struct acpi_subtable_header *res,
-					 struct acpi_pptt_cache **found,
+					 struct acpi_pptt_cache_v1_full **found,
 					 unsigned int level, int type)
 {
-	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1_full *cache;
 
 	if (res->type != ACPI_PPTT_TYPE_CACHE)
 		return 0;
 
-	cache = (struct acpi_pptt_cache *) res;
+	cache = (struct acpi_pptt_cache_v1_full *)res;
 	while (cache) {
 		local_level++;
 
-		if (!(cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
-			cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+		if (!(cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
+			cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
 			continue;
 		}
 
 		if (split_levels &&
-		    (acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
-		     acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
+		    (acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
+		     acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
 			*split_levels = local_level;
 
 		if (local_level == level &&
-		    acpi_pptt_match_type(cache->attributes, type)) {
+		    acpi_pptt_match_type(cache->f.attributes, type)) {
 			if (*found != NULL && cache != *found)
 				pr_warn("Found duplicate cache level/type unable to determine uniqueness\n");
 
@@ -138,12 +157,12 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
 			 * cache node.
 			 */
 		}
-		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+		cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
 	}
 	return local_level;
 }
 
-static struct acpi_pptt_cache *
+static struct acpi_pptt_cache_v1_full *
 acpi_find_cache_level(struct acpi_table_header *table_hdr,
 		      struct acpi_pptt_processor *cpu_node,
 		      unsigned int *starting_level, unsigned int *split_levels,
@@ -152,7 +171,7 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
 	struct acpi_subtable_header *res;
 	unsigned int number_of_levels = *starting_level;
 	int resource = 0;
-	struct acpi_pptt_cache *ret = NULL;
+	struct acpi_pptt_cache_v1_full *ret = NULL;
 	unsigned int local_level;
 
 	/* walk down from processor node */
@@ -324,14 +343,14 @@ static u8 acpi_cache_type(enum cache_type type)
 	}
 }
 
-static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *table_hdr,
-						    u32 acpi_cpu_id,
-						    enum cache_type type,
-						    unsigned int level,
-						    struct acpi_pptt_processor **node)
+static struct acpi_pptt_cache_v1_full *acpi_find_cache_node(struct acpi_table_header *table_hdr,
+							    u32 acpi_cpu_id,
+							    enum cache_type type,
+							    unsigned int level,
+							    struct acpi_pptt_processor **node)
 {
 	unsigned int total_levels = 0;
-	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_cache_v1_full *found = NULL;
 	struct acpi_pptt_processor *cpu_node;
 	u8 acpi_type = acpi_cache_type(type);
 
@@ -355,7 +374,6 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
  * @this_leaf: Kernel cache info structure being updated
  * @found_cache: The PPTT node describing this cache instance
  * @cpu_node: A unique reference to describe this cache instance
- * @revision: The revision of the PPTT table
  *
  * The ACPI spec implies that the fields in the cache structures are used to
  * extend and correct the information probed from the hardware. Lets only
@@ -364,23 +382,20 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
  * Return: nothing. Side effect of updating the global cacheinfo
  */
 static void update_cache_properties(struct cacheinfo *this_leaf,
-				    struct acpi_pptt_cache *found_cache,
-				    struct acpi_pptt_processor *cpu_node,
-				    u8 revision)
+				    struct acpi_pptt_cache_v1_full *found_cache,
+				    struct acpi_pptt_processor *cpu_node)
 {
-	struct acpi_pptt_cache_v1* found_cache_v1;
-
 	this_leaf->fw_token = cpu_node;
-	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
-		this_leaf->size = found_cache->size;
-	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
-		this_leaf->coherency_line_size = found_cache->line_size;
-	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
-		this_leaf->number_of_sets = found_cache->number_of_sets;
-	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
-		this_leaf->ways_of_associativity = found_cache->associativity;
-	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
-		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+	if (found_cache->f.flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->f.size;
+	if (found_cache->f.flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->f.line_size;
+	if (found_cache->f.flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->f.number_of_sets;
+	if (found_cache->f.flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->f.associativity;
+	if (found_cache->f.flags & ACPI_PPTT_WRITE_POLICY_VALID) {
+		switch (found_cache->f.attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
 		case ACPI_PPTT_CACHE_POLICY_WT:
 			this_leaf->attributes = CACHE_WRITE_THROUGH;
 			break;
@@ -389,8 +404,8 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
 			break;
 		}
 	}
-	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
-		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+	if (found_cache->f.flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
+		switch (found_cache->f.attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
 		case ACPI_PPTT_CACHE_READ_ALLOCATE:
 			this_leaf->attributes |= CACHE_READ_ALLOCATE;
 			break;
@@ -415,13 +430,11 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
 	 * specified in PPTT.
 	 */
 	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
-	    found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)
+	    found_cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)
 		this_leaf->type = CACHE_TYPE_UNIFIED;
 
-	if (revision >= 3 && (found_cache->flags & ACPI_PPTT_CACHE_ID_VALID)) {
-		found_cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
-	                                      found_cache, sizeof(struct acpi_pptt_cache));
-		this_leaf->id = found_cache_v1->cache_id;
+	if (acpi_pptt_cache_id_is_valid(found_cache)) {
+		this_leaf->id = found_cache->extra.cache_id;
 		this_leaf->attributes |= CACHE_ID;
 	}
 }
@@ -429,7 +442,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
 static void cache_setup_acpi_cpu(struct acpi_table_header *table,
 				 unsigned int cpu)
 {
-	struct acpi_pptt_cache *found_cache;
+	struct acpi_pptt_cache_v1_full *found_cache;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
 	struct cacheinfo *this_leaf;
@@ -445,8 +458,7 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
 		pr_debug("found = %p %p\n", found_cache, cpu_node);
 		if (found_cache)
 			update_cache_properties(this_leaf, found_cache,
-						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)),
-						table->revision);
+						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)));
 
 		index++;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
@ 2025-11-08  4:54   ` Gavin Shan
  2025-11-10 15:51     ` Ben Horgan
  2025-11-10 15:46   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  4:54 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
> cache type structure. In PPTT table version 3 a new field was added at the
> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
> the existing code to use this new struct. This simplifies the code, removes
> a non-standard use of ACPI_ADD_PTR and allows using the length in the
> header to check if the cache_id is valid.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> New patch
> ---
>   drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>   1 file changed, 58 insertions(+), 46 deletions(-)
> 

Two nitpicks below. LGTM either way.

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 1027ca3566b1..1ed2099c0d1a 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -21,6 +21,11 @@
>   #include <linux/cacheinfo.h>
>   #include <acpi/processor.h>
>   
> +struct acpi_pptt_cache_v1_full {
> +	struct acpi_pptt_cache		f;
> +	struct acpi_pptt_cache_v1	extra;
> +} __packed;
> +
>   static struct acpi_subtable_header *fetch_pptt_subtable(struct acpi_table_header *table_hdr,
>   							u32 pptt_ref)
>   {
> @@ -50,10 +55,24 @@ static struct acpi_pptt_processor *fetch_pptt_node(struct acpi_table_header *tab
>   	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>   }
>   
> -static struct acpi_pptt_cache *fetch_pptt_cache(struct acpi_table_header *table_hdr,
> -						u32 pptt_ref)
> +static struct acpi_pptt_cache_v1_full *fetch_pptt_cache(struct acpi_table_header *table_hdr,
> +							u32 pptt_ref)
>   {
> -	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +	return (struct acpi_pptt_cache_v1_full *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
> +
> +/*
> + * From PPTT table version 3, a new field cache_id was added at the end of
> + * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
> + * containing the cache_id, everywhere but must check validity before accessing
> + * the cache_id.
> + */
> +static bool acpi_pptt_cache_id_is_valid(struct acpi_pptt_cache_v1_full *cache)
> +{
> +	return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&
> +		cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
>   }
>   

This function looks like a good candidate for 'inline'. Also, I wonder
whether we could simply use sizeof(*cache) instead of ACPI_PPTT_CACHE_V1_LEN,
which is used only once in pptt.c.
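
A minimal user-space sketch of that suggestion, for illustration only. The
structures below are simplified, hypothetical stand-ins (the real layouts
are in actbl2.h and carry more fields), and the value of CACHE_ID_VALID is
an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PACKED __attribute__((packed))

/* Simplified stand-ins, not the real ACPICA definitions. */
struct hdr {
	uint8_t type;
	uint8_t length;		/* length of the whole subtable in bytes */
} PACKED;

struct cache_base {		/* models struct acpi_pptt_cache */
	struct hdr header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
} PACKED;

struct cache_v1 {		/* models struct acpi_pptt_cache_v1 */
	uint32_t cache_id;
} PACKED;

struct cache_full {		/* models struct acpi_pptt_cache_v1_full */
	struct cache_base f;
	struct cache_v1 extra;
} PACKED;

#define CACHE_ID_VALID (1u << 7)	/* assumed bit position */

/* Packing keeps the combined struct exactly as long as its two parts. */
static_assert(sizeof(struct cache_full) ==
	      sizeof(struct cache_base) + sizeof(struct cache_v1),
	      "no padding between the base structure and the extension");

/* sizeof(*cache) replaces the single-use length macro. */
static inline bool cache_id_is_valid(const struct cache_full *cache)
{
	return cache->f.header.length >= sizeof(*cache) &&
	       (cache->f.flags & CACHE_ID_VALID);
}
```

The length test comes first, so an entry that is too short to hold cache_id
is rejected whatever its flags say.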

>   static struct acpi_subtable_header *acpi_get_pptt_resource(struct acpi_table_header *table_hdr,
> @@ -103,30 +122,30 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>   					 unsigned int local_level,
>   					 unsigned int *split_levels,
>   					 struct acpi_subtable_header *res,
> -					 struct acpi_pptt_cache **found,
> +					 struct acpi_pptt_cache_v1_full **found,
>   					 unsigned int level, int type)
>   {
> -	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1_full *cache;
>   
>   	if (res->type != ACPI_PPTT_TYPE_CACHE)
>   		return 0;
>   
> -	cache = (struct acpi_pptt_cache *) res;
> +	cache = (struct acpi_pptt_cache_v1_full *)res;
>   	while (cache) {
>   		local_level++;
>   
> -		if (!(cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
> -			cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +		if (!(cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
> +			cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
>   			continue;
>   		}
>   
>   		if (split_levels &&
> -		    (acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
> -		     acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
> +		    (acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
> +		     acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
>   			*split_levels = local_level;
>   
>   		if (local_level == level &&
> -		    acpi_pptt_match_type(cache->attributes, type)) {
> +		    acpi_pptt_match_type(cache->f.attributes, type)) {
>   			if (*found != NULL && cache != *found)
>   				pr_warn("Found duplicate cache level/type unable to determine uniqueness\n");
>   
> @@ -138,12 +157,12 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>   			 * cache node.
>   			 */
>   		}
> -		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +		cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
>   	}
>   	return local_level;
>   }
>   
> -static struct acpi_pptt_cache *
> +static struct acpi_pptt_cache_v1_full *
>   acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   		      struct acpi_pptt_processor *cpu_node,
>   		      unsigned int *starting_level, unsigned int *split_levels,
> @@ -152,7 +171,7 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   	struct acpi_subtable_header *res;
>   	unsigned int number_of_levels = *starting_level;
>   	int resource = 0;
> -	struct acpi_pptt_cache *ret = NULL;
> +	struct acpi_pptt_cache_v1_full *ret = NULL;
>   	unsigned int local_level;
>   
>   	/* walk down from processor node */
> @@ -324,14 +343,14 @@ static u8 acpi_cache_type(enum cache_type type)
>   	}
>   }
>   
> -static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *table_hdr,
> -						    u32 acpi_cpu_id,
> -						    enum cache_type type,
> -						    unsigned int level,
> -						    struct acpi_pptt_processor **node)
> +static struct acpi_pptt_cache_v1_full *acpi_find_cache_node(struct acpi_table_header *table_hdr,
> +							    u32 acpi_cpu_id,
> +							    enum cache_type type,
> +							    unsigned int level,
> +							    struct acpi_pptt_processor **node)
>   {
>   	unsigned int total_levels = 0;
> -	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_cache_v1_full *found = NULL;
>   	struct acpi_pptt_processor *cpu_node;
>   	u8 acpi_type = acpi_cache_type(type);
>   
> @@ -355,7 +374,6 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>    * @this_leaf: Kernel cache info structure being updated
>    * @found_cache: The PPTT node describing this cache instance
>    * @cpu_node: A unique reference to describe this cache instance
> - * @revision: The revision of the PPTT table
>    *
>    * The ACPI spec implies that the fields in the cache structures are used to
>    * extend and correct the information probed from the hardware. Lets only
> @@ -364,23 +382,20 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>    * Return: nothing. Side effect of updating the global cacheinfo
>    */
>   static void update_cache_properties(struct cacheinfo *this_leaf,
> -				    struct acpi_pptt_cache *found_cache,
> -				    struct acpi_pptt_processor *cpu_node,
> -				    u8 revision)
> +				    struct acpi_pptt_cache_v1_full *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
>   {
> -	struct acpi_pptt_cache_v1* found_cache_v1;
> -
>   	this_leaf->fw_token = cpu_node;
> -	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> -		this_leaf->size = found_cache->size;
> -	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> -		this_leaf->coherency_line_size = found_cache->line_size;
> -	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> -		this_leaf->number_of_sets = found_cache->number_of_sets;
> -	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> -		this_leaf->ways_of_associativity = found_cache->associativity;
> -	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +	if (found_cache->f.flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->f.size;
> +	if (found_cache->f.flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->f.line_size;
> +	if (found_cache->f.flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->f.number_of_sets;
> +	if (found_cache->f.flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->f.associativity;
> +	if (found_cache->f.flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>   		case ACPI_PPTT_CACHE_POLICY_WT:
>   			this_leaf->attributes = CACHE_WRITE_THROUGH;
>   			break;
> @@ -389,8 +404,8 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   			break;
>   		}
>   	}
> -	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +	if (found_cache->f.flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>   		case ACPI_PPTT_CACHE_READ_ALLOCATE:
>   			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>   			break;
> @@ -415,13 +430,11 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   	 * specified in PPTT.
>   	 */
>   	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
> -	    found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)
> +	    found_cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)
>   		this_leaf->type = CACHE_TYPE_UNIFIED;
>   
> -	if (revision >= 3 && (found_cache->flags & ACPI_PPTT_CACHE_ID_VALID)) {
> -		found_cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> -	                                      found_cache, sizeof(struct acpi_pptt_cache));
> -		this_leaf->id = found_cache_v1->cache_id;
> +	if (acpi_pptt_cache_id_is_valid(found_cache)) {
> +		this_leaf->id = found_cache->extra.cache_id;
>   		this_leaf->attributes |= CACHE_ID;
>   	}
>   }
> @@ -429,7 +442,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>   				 unsigned int cpu)
>   {
> -	struct acpi_pptt_cache *found_cache;
> +	struct acpi_pptt_cache_v1_full *found_cache;
>   	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>   	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>   	struct cacheinfo *this_leaf;
> @@ -445,8 +458,7 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>   		pr_debug("found = %p %p\n", found_cache, cpu_node);
>   		if (found_cache)
>   			update_cache_properties(this_leaf, found_cache,
> -						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)),
> -						table->revision);
> +						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)));
>   
>   		index++;
>   	}

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-08  4:54   ` Gavin Shan
@ 2025-11-10 15:51     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-10 15:51 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Gavin,

On 11/8/25 04:54, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
>> cache type structure. In PPTT table version 3 a new field was added at
>> the
>> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
>> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
>> the existing code to use this new struct. This simplifies the code,
>> removes
>> a non-standard use of ACPI_ADD_PTR and allows using the length in the
>> header to check if the cache_id is valid.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> New patch
>> ---
>>   drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>>   1 file changed, 58 insertions(+), 46 deletions(-)
>>
> 
> Two nitpicks below. LGTM either way.
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 1027ca3566b1..1ed2099c0d1a 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -21,6 +21,11 @@
>>   #include <linux/cacheinfo.h>
>>   #include <acpi/processor.h>
>>   +struct acpi_pptt_cache_v1_full {
>> +    struct acpi_pptt_cache        f;
>> +    struct acpi_pptt_cache_v1    extra;
>> +} __packed;
>> +
>>   static struct acpi_subtable_header *fetch_pptt_subtable(struct
>> acpi_table_header *table_hdr,
>>                               u32 pptt_ref)
>>   {
>> @@ -50,10 +55,24 @@ static struct acpi_pptt_processor
>> *fetch_pptt_node(struct acpi_table_header *tab
>>       return (struct acpi_pptt_processor
>> *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>   }
>>   -static struct acpi_pptt_cache *fetch_pptt_cache(struct
>> acpi_table_header *table_hdr,
>> -                        u32 pptt_ref)
>> +static struct acpi_pptt_cache_v1_full *fetch_pptt_cache(struct
>> acpi_table_header *table_hdr,
>> +                            u32 pptt_ref)
>>   {
>> -    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
>> pptt_ref);
>> +    return (struct acpi_pptt_cache_v1_full
>> *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
>> +
>> +/*
>> + * From PPTT table version 3, a new field cache_id was added at the
>> end of
>> + * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
>> + * containing the cache_id, everywhere but must check validity before
>> accessing
>> + * the cache_id.
>> + */
>> +static bool acpi_pptt_cache_id_is_valid(struct
>> acpi_pptt_cache_v1_full *cache)
>> +{
>> +    return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&
>> +        cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
>>   }
>>   
> 
> This function is a good fit for 'inline'. Besides, I'm not sure if we can just
> use sizeof(*cache) instead of ACPI_PPTT_CACHE_V1_LEN, which is used only
> once in pptt.c.

Yes, the define is unnecessary and the function can be inlined. Thanks
for pointing it out. I'm likely to rework this patch though.
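
A minimal userspace sketch of that suggestion, with the define dropped in
favour of sizeof(*cache) and the helper marked inline. The mock_* types and
MOCK_CACHE_ID_VALID are stand-ins invented for illustration; their layout and
bit position are assumptions, not the ACPICA definitions from actbl2.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ACPICA types; the layout is assumed. */
struct mock_subtable_header {
	uint8_t type;
	uint8_t length;
};

struct mock_pptt_cache {
	struct mock_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
	uint32_t size;
	uint32_t number_of_sets;
	uint8_t associativity;
	uint8_t attributes;
	uint16_t line_size;
} __attribute__((packed));

struct mock_pptt_cache_v1 {
	uint32_t cache_id;
} __attribute__((packed));

struct mock_pptt_cache_v1_full {
	struct mock_pptt_cache f;
	struct mock_pptt_cache_v1 extra;
} __attribute__((packed));

/* Assumed bit position of the "cache ID valid" flag. */
#define MOCK_CACHE_ID_VALID (1u << 7)

/* 24 bytes of original fields plus the 4-byte cache_id. */
static_assert(sizeof(struct mock_pptt_cache_v1_full) == 28,
	      "unexpected layout for the combined structure");

/*
 * The cache_id is usable only if the subtable is long enough to hold it
 * and the valid flag is set. sizeof(*cache) replaces the separate define.
 */
static inline bool
mock_cache_id_is_valid(const struct mock_pptt_cache_v1_full *cache)
{
	return cache->f.header.length >= sizeof(*cache) &&
	       (cache->f.flags & MOCK_CACHE_ID_VALID);
}
```

Using sizeof(*cache) keeps the length check tied to the pointer's type, so it
cannot drift if the structure is renamed.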

Thanks,

Ben



* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
  2025-11-08  4:54   ` Gavin Shan
@ 2025-11-10 15:46   ` Jonathan Cameron
  2025-11-10 16:28     ` Ben Horgan
  2025-11-10 17:00   ` Jeremy Linton
  2025-11-12 20:22   ` Fenghua Yu
  3 siblings, 1 reply; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 15:46 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:20 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
> cache type structure. In PPTT table version 3 a new field was added at the
> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
> the existing code to use this new struct. This simplifies the code, removes
> a non-standard use of ACPI_ADD_PTR and allows using the length in the
> header to check if the cache_id is valid.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Whilst I wish the ACPICA stuff did structures like this, I'm not sure
if the ACPI maintainers will feel it is appropriate to work around it
with generic-sounding structures like this one.

I'd also say that we should only cast it to your _full structure
if we know we have rev 3 of PPTT.  Otherwise we should continue manipulating
it as a struct acpi_pptt_cache.

> ---
> Changes since v3:
> New patch
> ---
>  drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>  1 file changed, 58 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 1027ca3566b1..1ed2099c0d1a 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -21,6 +21,11 @@
>  #include <linux/cacheinfo.h>
>  #include <acpi/processor.h>
>  
> +struct acpi_pptt_cache_v1_full {
> +	struct acpi_pptt_cache		f;
> +	struct acpi_pptt_cache_v1	extra;
> +} __packed;

> +#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
> +
> +/*
> + * From PPTT table version 3, a new field cache_id was added at the end of
> + * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
> + * containing the cache_id, everywhere but must check validity before accessing
> + * the cache_id.
> + */
> +static bool acpi_pptt_cache_id_is_valid(struct acpi_pptt_cache_v1_full *cache)
> +{
> +	return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&

Although I later say I don't think you should pass a v1_full structure in here (as
we don't know it is at least that large until after this check), if you do keep this,
why not use sizeof(*cache) and get rid of the V1_LEN definition, which provides no
obvious value here?

> +		cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
>  }

> @@ -355,7 +374,6 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>   * @this_leaf: Kernel cache info structure being updated
>   * @found_cache: The PPTT node describing this cache instance
>   * @cpu_node: A unique reference to describe this cache instance
> - * @revision: The revision of the PPTT table
>   *
>   * The ACPI spec implies that the fields in the cache structures are used to
>   * extend and correct the information probed from the hardware. Lets only
> @@ -364,23 +382,20 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>   * Return: nothing. Side effect of updating the global cacheinfo
>   */
>  static void update_cache_properties(struct cacheinfo *this_leaf,
> -				    struct acpi_pptt_cache *found_cache,
> -				    struct acpi_pptt_processor *cpu_node,
> -				    u8 revision)
> +				    struct acpi_pptt_cache_v1_full *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
>  {
> -	struct acpi_pptt_cache_v1* found_cache_v1;
> -
>  	this_leaf->fw_token = cpu_node;
> -	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> -		this_leaf->size = found_cache->size;
> -	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> -		this_leaf->coherency_line_size = found_cache->line_size;
> -	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> -		this_leaf->number_of_sets = found_cache->number_of_sets;
> -	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> -		this_leaf->ways_of_associativity = found_cache->associativity;
> -	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +	if (found_cache->f.flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->f.size;
> +	if (found_cache->f.flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->f.line_size;
> +	if (found_cache->f.flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->f.number_of_sets;
> +	if (found_cache->f.flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->f.associativity;
> +	if (found_cache->f.flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>  		case ACPI_PPTT_CACHE_POLICY_WT:
>  			this_leaf->attributes = CACHE_WRITE_THROUGH;
>  			break;
> @@ -389,8 +404,8 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  			break;
>  		}
>  	}
> -	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +	if (found_cache->f.flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>  		case ACPI_PPTT_CACHE_READ_ALLOCATE:
>  			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>  			break;
> @@ -415,13 +430,11 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  	 * specified in PPTT.
>  	 */
>  	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
> -	    found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)
> +	    found_cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)
>  		this_leaf->type = CACHE_TYPE_UNIFIED;
>  
> -	if (revision >= 3 && (found_cache->flags & ACPI_PPTT_CACHE_ID_VALID)) {
> -		found_cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> -	                                      found_cache, sizeof(struct acpi_pptt_cache));
> -		this_leaf->id = found_cache_v1->cache_id;
> +	if (acpi_pptt_cache_id_is_valid(found_cache)) {

Only here do we know that found_cache is the _full type. 

> +		this_leaf->id = found_cache->extra.cache_id;
>  		this_leaf->attributes |= CACHE_ID;
>  	}
>  }
> @@ -429,7 +442,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>  				 unsigned int cpu)
>  {
> -	struct acpi_pptt_cache *found_cache;
> +	struct acpi_pptt_cache_v1_full *found_cache;

This isn't necessarily valid. Until deep in update_cache_properties() we don't care about the ID
so this structure may be smaller than this implies.

>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>  	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>  	struct cacheinfo *this_leaf;
> @@ -445,8 +458,7 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>  		pr_debug("found = %p %p\n", found_cache, cpu_node);
>  		if (found_cache)
>  			update_cache_properties(this_leaf, found_cache,
> -						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)),
> -						table->revision);
> +						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)));
>  
>  		index++;
>  	}



* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-10 15:46   ` Jonathan Cameron
@ 2025-11-10 16:28     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-10 16:28 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jonathan,

On 11/10/25 15:46, Jonathan Cameron wrote:
> On Fri, 7 Nov 2025 12:34:20 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
>> cache type structure. In PPTT table version 3 a new field was added at the
>> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
>> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
>> the existing code to use this new struct. This simplifies the code, removes
>> a non-standard use of ACPI_ADD_PTR and allows using the length in the
>> header to check if the cache_id is valid.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> 
> Whilst I wish the ACPICA stuff did structures like this, I'm not sure
> if the ACPI maintainers will feel it is appropriate to work around it
> with generic sounding structures like this one.
> 
> I'd also say that we should only cast it to your _full structure
> if we know we have rev 3 of PPTT.  Otherwise we should continue manipulating
> it as a struct acpi_pptt_cache

Fair enough. My thinking was that you had to check the valid flag anyway
to use cache_id but it's less robust. I'll delay the casting to later
which IIUC is what Jeremy Linton suggested offline.
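
A rough userspace sketch of what delaying the cast could look like: callers
keep the base structure, and the wider view is handed out only once the
revision, length and flag checks have passed. All mock_* names, the assumed
layout and the flag bit are hypothetical stand-ins, not the ACPICA types or
the eventual rework of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ACPICA types; the layout is assumed. */
struct mock_subtable_header {
	uint8_t type;
	uint8_t length;
};

struct mock_pptt_cache {
	struct mock_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
	uint32_t size;
	uint32_t number_of_sets;
	uint8_t associativity;
	uint8_t attributes;
	uint16_t line_size;
} __attribute__((packed));

struct mock_pptt_cache_v1_full {
	struct mock_pptt_cache f;
	uint32_t cache_id;
} __attribute__((packed));

/* Assumed bit position of the "cache ID valid" flag. */
#define MOCK_CACHE_ID_VALID (1u << 7)

static_assert(offsetof(struct mock_pptt_cache_v1_full, cache_id) ==
	      sizeof(struct mock_pptt_cache),
	      "cache_id must directly follow the original fields");

/*
 * Return the wider view only when it is known to be safe to read:
 * the table is rev 3 or later, the subtable is long enough, and the
 * flag says the ID is valid. Otherwise return NULL and let the caller
 * carry on with the base structure.
 */
static inline const struct mock_pptt_cache_v1_full *
mock_cache_as_v1_full(const struct mock_pptt_cache *cache, uint8_t revision)
{
	if (revision < 3)
		return NULL;
	if (cache->header.length < sizeof(struct mock_pptt_cache_v1_full))
		return NULL;
	if (!(cache->flags & MOCK_CACHE_ID_VALID))
		return NULL;
	return (const struct mock_pptt_cache_v1_full *)cache;
}
```

With this shape only the code that reads the ID ever sees the wider type, so
the rest of the walk never implies more bytes than the subtable has.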

> 
>> ---
>> Changes since v3:
>> New patch
>> ---
>>  drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>>  1 file changed, 58 insertions(+), 46 deletions(-)
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 1027ca3566b1..1ed2099c0d1a 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -21,6 +21,11 @@
>>  #include <linux/cacheinfo.h>
>>  #include <acpi/processor.h>
>>  
>> +struct acpi_pptt_cache_v1_full {
>> +	struct acpi_pptt_cache		f;
>> +	struct acpi_pptt_cache_v1	extra;
>> +} __packed;
> 
>> +#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
>> +
>> +/*
>> + * From PPTT table version 3, a new field cache_id was added at the end of
>> + * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
>> + * containing the cache_id, everywhere but must check validity before accessing
>> + * the cache_id.
>> + */
>> +static bool acpi_pptt_cache_id_is_valid(struct acpi_pptt_cache_v1_full *cache)
>> +{
>> +	return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&
> 
> Although I later say I don't think you should pass a v1_full structure in here (as
> we don't know it is at least that large until after this check) if you do keep this
> why not use sizeof(*cache) and get rid of the V1_LEN definition as providing no obvious
> value here?

Yes, the define was never needed.

> 
>> +		cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
>>  }
> 
>> @@ -355,7 +374,6 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>>   * @this_leaf: Kernel cache info structure being updated
>>   * @found_cache: The PPTT node describing this cache instance
>>   * @cpu_node: A unique reference to describe this cache instance
>> - * @revision: The revision of the PPTT table
>>   *
>>   * The ACPI spec implies that the fields in the cache structures are used to
>>   * extend and correct the information probed from the hardware. Lets only
>> @@ -364,23 +382,20 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>>   * Return: nothing. Side effect of updating the global cacheinfo
>>   */
>>  static void update_cache_properties(struct cacheinfo *this_leaf,
>> -				    struct acpi_pptt_cache *found_cache,
>> -				    struct acpi_pptt_processor *cpu_node,
>> -				    u8 revision)
>> +				    struct acpi_pptt_cache_v1_full *found_cache,
>> +				    struct acpi_pptt_processor *cpu_node)
>>  {
>> -	struct acpi_pptt_cache_v1* found_cache_v1;
>> -
>>  	this_leaf->fw_token = cpu_node;
>> -	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> -		this_leaf->size = found_cache->size;
>> -	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> -		this_leaf->coherency_line_size = found_cache->line_size;
>> -	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> -		this_leaf->number_of_sets = found_cache->number_of_sets;
>> -	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> -		this_leaf->ways_of_associativity = found_cache->associativity;
>> -	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
>> -		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +	if (found_cache->f.flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +		this_leaf->size = found_cache->f.size;
>> +	if (found_cache->f.flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +		this_leaf->coherency_line_size = found_cache->f.line_size;
>> +	if (found_cache->f.flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +		this_leaf->number_of_sets = found_cache->f.number_of_sets;
>> +	if (found_cache->f.flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +		this_leaf->ways_of_associativity = found_cache->f.associativity;
>> +	if (found_cache->f.flags & ACPI_PPTT_WRITE_POLICY_VALID) {
>> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>>  		case ACPI_PPTT_CACHE_POLICY_WT:
>>  			this_leaf->attributes = CACHE_WRITE_THROUGH;
>>  			break;
>> @@ -389,8 +404,8 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>>  			break;
>>  		}
>>  	}
>> -	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
>> -		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +	if (found_cache->f.flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
>> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>>  		case ACPI_PPTT_CACHE_READ_ALLOCATE:
>>  			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>>  			break;
>> @@ -415,13 +430,11 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>>  	 * specified in PPTT.
>>  	 */
>>  	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
>> -	    found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)
>> +	    found_cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)
>>  		this_leaf->type = CACHE_TYPE_UNIFIED;
>>  
>> -	if (revision >= 3 && (found_cache->flags & ACPI_PPTT_CACHE_ID_VALID)) {
>> -		found_cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> -	                                      found_cache, sizeof(struct acpi_pptt_cache));
>> -		this_leaf->id = found_cache_v1->cache_id;
>> +	if (acpi_pptt_cache_id_is_valid(found_cache)) {
> 
> Only here do we know that found_cache is the _full type. 
> 
>> +		this_leaf->id = found_cache->extra.cache_id;
>>  		this_leaf->attributes |= CACHE_ID;
>>  	}
>>  }
>> @@ -429,7 +442,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>>  static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>>  				 unsigned int cpu)
>>  {
>> -	struct acpi_pptt_cache *found_cache;
>> +	struct acpi_pptt_cache_v1_full *found_cache;
> 
> This isn't necessarily valid. Until deep in update_cache_properties() we don't care about the ID
> so this structure may be smaller than this implies.
> 
>>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>>  	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>>  	struct cacheinfo *this_leaf;
>> @@ -445,8 +458,7 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>>  		pr_debug("found = %p %p\n", found_cache, cpu_node);
>>  		if (found_cache)
>>  			update_cache_properties(this_leaf, found_cache,
>> -						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)),
>> -						table->revision);
>> +						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)));
>>  
>>  		index++;
>>  	}
> 

Thanks,

Ben



* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
  2025-11-08  4:54   ` Gavin Shan
  2025-11-10 15:46   ` Jonathan Cameron
@ 2025-11-10 17:00   ` Jeremy Linton
  2025-11-11 16:48     ` Ben Horgan
  2025-11-12 20:22   ` Fenghua Yu
  3 siblings, 1 reply; 147+ messages in thread
From: Jeremy Linton @ 2025-11-10 17:00 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi,

On 11/7/25 6:34 AM, Ben Horgan wrote:
> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
> cache type structure. In PPTT table version 3 a new field was added at the
> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
> the existing code to use this new struct. This simplifies the code, removes
> a non-standard use of ACPI_ADD_PTR and allows using the length in the
> header to check if the cache_id is valid.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> New patch
> ---
>   drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>   1 file changed, 58 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 1027ca3566b1..1ed2099c0d1a 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -21,6 +21,11 @@
>   #include <linux/cacheinfo.h>
>   #include <acpi/processor.h>
>   
> +struct acpi_pptt_cache_v1_full {
> +	struct acpi_pptt_cache		f;
> +	struct acpi_pptt_cache_v1	extra;
> +} __packed;

This presumably won't match an acpica change, right? Those structures
appear to repeat the fields in the newer structure definitions.

Maybe it's best to keep this as close as possible to an acpica change and
do a quick patch posting for acpica to ensure they are on board with the
eventual structure (IIRC it was fast a few years ago when I had a similar
problem).

That would avoid a bunch of the churn here of adding the 'f'/'extra'
dereference, which would then potentially have to be reverted at some
point when acpica corrects the original structure.
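
To illustrate the difference, here is a userspace sketch comparing the nested
form from the patch with a flat form that repeats the original fields and
appends cache_id. Every mock_* name and the field layout are assumptions made
up for this example; this is not what acpica defines or has agreed to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ACPICA types; the layout is assumed. */
struct mock_subtable_header {
	uint8_t type;
	uint8_t length;
};

struct mock_pptt_cache {
	struct mock_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
	uint32_t size;
	uint32_t number_of_sets;
	uint8_t associativity;
	uint8_t attributes;
	uint16_t line_size;
} __attribute__((packed));

/* Nested form, as in the patch: accesses need ->f. or ->extra. */
struct mock_pptt_cache_v1_full {
	struct mock_pptt_cache f;
	struct {
		uint32_t cache_id;
	} __attribute__((packed)) extra;
} __attribute__((packed));

/* Flat form: fields repeated, so existing cache->flags style accesses stay. */
struct mock_pptt_cache_flat {
	struct mock_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
	uint32_t size;
	uint32_t number_of_sets;
	uint8_t associativity;
	uint8_t attributes;
	uint16_t line_size;
	uint32_t cache_id;
} __attribute__((packed));

/* Both forms describe the same bytes. */
static_assert(sizeof(struct mock_pptt_cache_flat) ==
	      sizeof(struct mock_pptt_cache_v1_full), "size mismatch");
static_assert(offsetof(struct mock_pptt_cache_flat, flags) ==
	      offsetof(struct mock_pptt_cache_v1_full, f.flags),
	      "flags offset mismatch");
static_assert(offsetof(struct mock_pptt_cache_flat, cache_id) ==
	      offsetof(struct mock_pptt_cache_v1_full, extra.cache_id),
	      "cache_id offset mismatch");

/* With the flat form the ID is read without any extra dereference. */
static inline uint32_t
mock_flat_cache_id(const struct mock_pptt_cache_flat *cache)
{
	return cache->cache_id;
}
```

Since both layouts cover the same bytes, the choice between them mostly
affects how much of pptt.c has to change now and again later.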



> +
>   static struct acpi_subtable_header *fetch_pptt_subtable(struct acpi_table_header *table_hdr,
>   							u32 pptt_ref)
>   {
> @@ -50,10 +55,24 @@ static struct acpi_pptt_processor *fetch_pptt_node(struct acpi_table_header *tab
>   	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>   }
>   
> -static struct acpi_pptt_cache *fetch_pptt_cache(struct acpi_table_header *table_hdr,
> -						u32 pptt_ref)
> +static struct acpi_pptt_cache_v1_full *fetch_pptt_cache(struct acpi_table_header *table_hdr,
> +							u32 pptt_ref)
>   {
> -	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +	return (struct acpi_pptt_cache_v1_full *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +#define ACPI_PPTT_CACHE_V1_LEN sizeof(struct acpi_pptt_cache_v1_full)
> +
> +/*
> + * From PPTT table version 3, a new field cache_id was added at the end of
> + * the cache type structure.  We now use struct acpi_pptt_cache_v1_full,
> + * containing the cache_id, everywhere but must check validity before accessing
> + * the cache_id.
> + */
> +static bool acpi_pptt_cache_id_is_valid(struct acpi_pptt_cache_v1_full *cache)
> +{
> +	return (cache->f.header.length >= ACPI_PPTT_CACHE_V1_LEN &&
> +		cache->f.flags & ACPI_PPTT_CACHE_ID_VALID);
>   }
>   
>   static struct acpi_subtable_header *acpi_get_pptt_resource(struct acpi_table_header *table_hdr,
> @@ -103,30 +122,30 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>   					 unsigned int local_level,
>   					 unsigned int *split_levels,
>   					 struct acpi_subtable_header *res,
> -					 struct acpi_pptt_cache **found,
> +					 struct acpi_pptt_cache_v1_full **found,
>   					 unsigned int level, int type)
>   {
> -	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1_full *cache;
>   
>   	if (res->type != ACPI_PPTT_TYPE_CACHE)
>   		return 0;
>   
> -	cache = (struct acpi_pptt_cache *) res;
> +	cache = (struct acpi_pptt_cache_v1_full *)res;
>   	while (cache) {
>   		local_level++;
>   
> -		if (!(cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
> -			cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +		if (!(cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)) {
> +			cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
>   			continue;
>   		}
>   
>   		if (split_levels &&
> -		    (acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
> -		     acpi_pptt_match_type(cache->attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
> +		    (acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_DATA) ||
> +		     acpi_pptt_match_type(cache->f.attributes, ACPI_PPTT_CACHE_TYPE_INSTR)))
>   			*split_levels = local_level;
>   
>   		if (local_level == level &&
> -		    acpi_pptt_match_type(cache->attributes, type)) {
> +		    acpi_pptt_match_type(cache->f.attributes, type)) {
>   			if (*found != NULL && cache != *found)
>   				pr_warn("Found duplicate cache level/type unable to determine uniqueness\n");
>   
> @@ -138,12 +157,12 @@ static unsigned int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>   			 * cache node.
>   			 */
>   		}
> -		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +		cache = fetch_pptt_cache(table_hdr, cache->f.next_level_of_cache);
>   	}
>   	return local_level;
>   }
>   
> -static struct acpi_pptt_cache *
> +static struct acpi_pptt_cache_v1_full *
>   acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   		      struct acpi_pptt_processor *cpu_node,
>   		      unsigned int *starting_level, unsigned int *split_levels,
> @@ -152,7 +171,7 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   	struct acpi_subtable_header *res;
>   	unsigned int number_of_levels = *starting_level;
>   	int resource = 0;
> -	struct acpi_pptt_cache *ret = NULL;
> +	struct acpi_pptt_cache_v1_full *ret = NULL;
>   	unsigned int local_level;
>   
>   	/* walk down from processor node */
> @@ -324,14 +343,14 @@ static u8 acpi_cache_type(enum cache_type type)
>   	}
>   }
>   
> -static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *table_hdr,
> -						    u32 acpi_cpu_id,
> -						    enum cache_type type,
> -						    unsigned int level,
> -						    struct acpi_pptt_processor **node)
> +static struct acpi_pptt_cache_v1_full *acpi_find_cache_node(struct acpi_table_header *table_hdr,
> +							    u32 acpi_cpu_id,
> +							    enum cache_type type,
> +							    unsigned int level,
> +							    struct acpi_pptt_processor **node)
>   {
>   	unsigned int total_levels = 0;
> -	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_cache_v1_full *found = NULL;
>   	struct acpi_pptt_processor *cpu_node;
>   	u8 acpi_type = acpi_cache_type(type);
>   
> @@ -355,7 +374,6 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>    * @this_leaf: Kernel cache info structure being updated
>    * @found_cache: The PPTT node describing this cache instance
>    * @cpu_node: A unique reference to describe this cache instance
> - * @revision: The revision of the PPTT table
>    *
>    * The ACPI spec implies that the fields in the cache structures are used to
>    * extend and correct the information probed from the hardware. Lets only
> @@ -364,23 +382,20 @@ static struct acpi_pptt_cache *acpi_find_cache_node(struct acpi_table_header *ta
>    * Return: nothing. Side effect of updating the global cacheinfo
>    */
>   static void update_cache_properties(struct cacheinfo *this_leaf,
> -				    struct acpi_pptt_cache *found_cache,
> -				    struct acpi_pptt_processor *cpu_node,
> -				    u8 revision)
> +				    struct acpi_pptt_cache_v1_full *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
>   {
> -	struct acpi_pptt_cache_v1* found_cache_v1;
> -
>   	this_leaf->fw_token = cpu_node;
> -	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> -		this_leaf->size = found_cache->size;
> -	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> -		this_leaf->coherency_line_size = found_cache->line_size;
> -	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> -		this_leaf->number_of_sets = found_cache->number_of_sets;
> -	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> -		this_leaf->ways_of_associativity = found_cache->associativity;
> -	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +	if (found_cache->f.flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->f.size;
> +	if (found_cache->f.flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->f.line_size;
> +	if (found_cache->f.flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->f.number_of_sets;
> +	if (found_cache->f.flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->f.associativity;
> +	if (found_cache->f.flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>   		case ACPI_PPTT_CACHE_POLICY_WT:
>   			this_leaf->attributes = CACHE_WRITE_THROUGH;
>   			break;
> @@ -389,8 +404,8 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   			break;
>   		}
>   	}
> -	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> -		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +	if (found_cache->f.flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> +		switch (found_cache->f.attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>   		case ACPI_PPTT_CACHE_READ_ALLOCATE:
>   			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>   			break;
> @@ -415,13 +430,11 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   	 * specified in PPTT.
>   	 */
>   	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
> -	    found_cache->flags & ACPI_PPTT_CACHE_TYPE_VALID)
> +	    found_cache->f.flags & ACPI_PPTT_CACHE_TYPE_VALID)
>   		this_leaf->type = CACHE_TYPE_UNIFIED;
>   
> -	if (revision >= 3 && (found_cache->flags & ACPI_PPTT_CACHE_ID_VALID)) {
> -		found_cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> -	                                      found_cache, sizeof(struct acpi_pptt_cache));
> -		this_leaf->id = found_cache_v1->cache_id;
> +	if (acpi_pptt_cache_id_is_valid(found_cache)) {
> +		this_leaf->id = found_cache->extra.cache_id;
>   		this_leaf->attributes |= CACHE_ID;
>   	}
>   }
> @@ -429,7 +442,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>   static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>   				 unsigned int cpu)
>   {
> -	struct acpi_pptt_cache *found_cache;
> +	struct acpi_pptt_cache_v1_full *found_cache;
>   	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>   	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>   	struct cacheinfo *this_leaf;
> @@ -445,8 +458,7 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>   		pr_debug("found = %p %p\n", found_cache, cpu_node);
>   		if (found_cache)
>   			update_cache_properties(this_leaf, found_cache,
> -						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)),
> -						table->revision);
> +						ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)));
>   
>   		index++;
>   	}



* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-10 17:00   ` Jeremy Linton
@ 2025-11-11 16:48     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-11 16:48 UTC (permalink / raw)
  To: Jeremy Linton, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jonathan.cameron, kobak, lcherian, lenb,
	linux-acpi, linux-arm-kernel, linux-kernel, lpieralisi,
	peternewman, quic_jiles, rafael, robh, rohit.mathew, scott,
	sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jeremy,

On 11/10/25 17:00, Jeremy Linton wrote:
> Hi,
> 
> On 11/7/25 6:34 AM, Ben Horgan wrote:
>> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
>> cache type structure. In PPTT table version 3 a new field was added at the
>> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
>> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
>> the existing code to use this new struct. This simplifies the code, removes
>> a non-standard use of ACPI_ADD_PTR and allows using the length in the
>> header to check if the cache_id is valid.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> New patch
>> ---
>>   drivers/acpi/pptt.c | 104 ++++++++++++++++++++++++--------------------
>>   1 file changed, 58 insertions(+), 46 deletions(-)
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 1027ca3566b1..1ed2099c0d1a 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -21,6 +21,11 @@
>>   #include <linux/cacheinfo.h>
>>   #include <acpi/processor.h>
>>   +struct acpi_pptt_cache_v1_full {
>> +    struct acpi_pptt_cache        f;
>> +    struct acpi_pptt_cache_v1    extra;
>> +} __packed;
> 
> This presumably won't match an acpica change, right? Those structures
> appear to repeat the fields in the newer structure definitions.
> 
> Maybe it's best to keep this as close as possible to an acpica change and
> do a quick patch posting for acpica to make sure they are on board with the
> eventual structure (IIRC it was fast a few years ago when I had a similar
> problem).
> 
> That would avoid a bunch of the churn here of adding the 'f'/'extra'
> dereference which would then potentially have to be reverted at some
> point when acpica corrects the original structure.

I've created a pull request on their GitHub:
https://github.com/acpica/acpica/pull/1059. This extends 'struct
acpi_pptt_cache_v1' to include all the fields of the Cache Type
Structure. I think this could be acceptable as there are other commits
in the history which make breaking changes to structures in the headers.
Let's see what they say. I got an immediate reply in Chinese, but it was
just an out-of-office message.
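
For readers following the struct discussion, here is a minimal user-space
sketch of the length-based check that the patch description mentions. The
sketch_* types are simplified stand-ins for the actbl2.h definitions, the
flag bit value is an assumption, and sketch_cache_id_is_valid() is a
hypothetical helper; the real acpi_pptt_cache_id_is_valid() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the ACPI subtable header. */
struct sketch_subtable_header {
	uint8_t type;
	uint8_t length;		/* total length of this subtable in bytes */
};

/* Stand-in for the original Cache Type Structure. */
struct sketch_pptt_cache {
	struct sketch_subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t next_level_of_cache;
	uint32_t size;
	uint32_t number_of_sets;
	uint8_t associativity;
	uint8_t attributes;
	uint16_t line_size;
};

/* Stand-in for the field appended in PPTT revision 3. */
struct sketch_pptt_cache_v1 {
	uint32_t cache_id;
};

/* Both parts viewed as one structure, as in the patch under review. */
struct sketch_pptt_cache_v1_full {
	struct sketch_pptt_cache f;
	struct sketch_pptt_cache_v1 extra;
};

/* The new field must sit directly after the original 24 bytes. */
static_assert(sizeof(struct sketch_pptt_cache) == 24, "unexpected padding");
static_assert(offsetof(struct sketch_pptt_cache_v1_full, extra) == 24,
	      "cache_id must directly follow the original fields");

/* Assumed bit position of the "cache id valid" flag. */
#define SKETCH_CACHE_ID_VALID	(1u << 7)

/*
 * cache_id is only meaningful when the subtable is long enough to hold
 * it and firmware has set the valid flag. The flags field lies within
 * the original structure, so reading it is safe for short subtables.
 */
static bool sketch_cache_id_is_valid(const struct sketch_pptt_cache_v1_full *cache)
{
	return cache->f.header.length >= sizeof(*cache) &&
	       (cache->f.flags & SKETCH_CACHE_ID_VALID) != 0;
}
```

With a check of this shape a caller no longer needs the table revision,
because a subtable written to the older layout reports a shorter length.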

Thanks,

Ben



* Re: [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
                     ` (2 preceding siblings ...)
  2025-11-10 17:00   ` Jeremy Linton
@ 2025-11-12 20:22   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-12 20:22 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> In actbl2.h, struct acpi_pptt_cache describes the fields in the original
> cache type structure. In PPTT table version 3 a new field was added at the
> end, cache_id. This is described in struct acpi_pptt_cache_v1. Introduce
> the new, acpi_pptt_cache_v1_full to contain both these structures. Update
> the existing code to use this new struct. This simplifies the code, removes
> a non-standard use of ACPI_ADD_PTR and allows using the length in the
> header to check if the cache_id is valid.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (2 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  5:11   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
                   ` (35 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

From: James Morse <james.morse@arm.com>

The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.

Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.

Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Tags dropped due to rework
Fallout/simplification from adding acpi_pptt_cache_v1_full
Look for each cache type before incrementing level
---
 drivers/acpi/pptt.c  | 63 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  5 ++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 1ed2099c0d1a..71841c106020 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -918,3 +918,66 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
 				     entry->length);
 	}
 }
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the cache
+ *
+ * Determine the level relative to any CPU for the cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another CPU as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
+ * the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	int cpu;
+	struct acpi_table_header *table;
+
+	table = acpi_get_pptt();
+	if (!table)
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		bool not_empty = true;
+		u32 acpi_cpu_id;
+		struct acpi_pptt_cache_v1_full *cache;
+		struct acpi_pptt_processor *cpu_node;
+
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			continue;
+
+		for (int level = 1; not_empty; level++) {
+			int cache_type[] = {CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED};
+
+			not_empty = false;
+			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
+				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
+							     level, &cpu_node);
+				if (!cache)
+					continue;
+
+				not_empty = true;
+
+				if (acpi_pptt_cache_id_is_valid(cache) &&
+				    cache->extra.cache_id == cache_id)
+					return level;
+			}
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4752ebd48132..be074bdfd4d1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1542,6 +1542,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1566,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -ENOENT;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.43.0



* Re: [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id
  2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
@ 2025-11-08  5:11   ` Gavin Shan
  2025-11-10 16:02   ` Jonathan Cameron
  2025-11-12 20:23   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  5:11 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Tags dropped due to rework
> Fallout/simplification from adding acpi_pptt_cache_v1_full
> Look for each cache type before incrementing level
> ---
>   drivers/acpi/pptt.c  | 63 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  5 ++++
>   2 files changed, 68 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id
  2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
  2025-11-08  5:11   ` Gavin Shan
@ 2025-11-10 16:02   ` Jonathan Cameron
  2025-11-11 17:02     ` Ben Horgan
  2025-11-12 20:23   ` Fenghua Yu
  2 siblings, 1 reply; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:02 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:21 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

A few things inline.

> ---
> Changes since v3:
> Tags dropped due to rework
> Fallout/simplification from adding acpi_pptt_cache_v1_full
> Look for each cache type before incrementing level
> ---
>  drivers/acpi/pptt.c  | 63 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  5 ++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 1ed2099c0d1a..71841c106020 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -918,3 +918,66 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>  				     entry->length);
>  	}
>  }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the cache
> + *
> + * Determine the level relative to any CPU for the cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * If one CPU's L2 is shared with another CPU as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
> + * the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	int cpu;
> +	struct acpi_table_header *table;
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		bool not_empty = true;
> +		u32 acpi_cpu_id;
> +		struct acpi_pptt_cache_v1_full *cache;
> +		struct acpi_pptt_processor *cpu_node;
> +
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);

Might as well combine this one with declaration.

> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			continue;
> +
> +		for (int level = 1; not_empty; level++) {

This smells very much like a while loop rather than a for loop. Make
it a do/while and you can avoid the somewhat nasty setting of
not_empty = true just to get in for the first iteration.

		int level = 1;
		do {
			int cache_type[] = { CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED };

			not_empty = false;
			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
							     level, &cpu_node);
				if (!cache)
					continue;

				not_empty = true;

				if (acpi_pptt_cache_id_is_valid(cache) &&
				    cache->extra.cache_id == cache_id)
					return level;
			}
		} while (not_empty);

Maybe flip sense of that bool to be empty and !empty for the test.


> +			int cache_type[] = {CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED};
> +
> +			not_empty = false;
> +			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
> +				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
> +							     level, &cpu_node);
> +				if (!cache)
> +					continue;
> +
> +				not_empty = true;
> +
> +				if (acpi_pptt_cache_id_is_valid(cache) &&
> +				    cache->extra.cache_id == cache_id)
> +					return level;
> +			}
> +		}
> +	}
> +
> +	return -ENOENT;
> +}




* Re: [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id
  2025-11-10 16:02   ` Jonathan Cameron
@ 2025-11-11 17:02     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-11 17:02 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jonathan,

On 11/10/25 16:02, Jonathan Cameron wrote:
> On Fri, 7 Nov 2025 12:34:21 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> From: James Morse <james.morse@arm.com>
>>
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> 
> A few things inline.
> 
>> ---
>> Changes since v3:
>> Tags dropped due to rework
>> Fallout/simplification from adding acpi_pptt_cache_v1_full
>> Look for each cache type before incrementing level
>> ---
>>  drivers/acpi/pptt.c  | 63 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/acpi.h |  5 ++++
>>  2 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 1ed2099c0d1a..71841c106020 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -918,3 +918,66 @@ void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>>  				     entry->length);
>>  	}
>>  }
>> +
>> +/**
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the cache
>> + *
>> + * Determine the level relative to any CPU for the cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later.
>> + *
>> + * If one CPU's L2 is shared with another CPU as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, the revision isn't supported or
>> + * the cache cannot be found.
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	int cpu;
>> +	struct acpi_table_header *table;
>> +
>> +	table = acpi_get_pptt();
>> +	if (!table)
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		bool not_empty = true;
>> +		u32 acpi_cpu_id;
>> +		struct acpi_pptt_cache_v1_full *cache;
>> +		struct acpi_pptt_processor *cpu_node;
>> +
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> 
> Might as well combine this one with declaration.

Will do.

> 
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			continue;
>> +
>> +		for (int level = 1; not_empty; level++) {
> 
> This smells very much like a while loop rather than a for loop. Make
> it a do/while and you can avoid the somewhat nasty setting not_empty = true
> just to get in for first iteration.
> 
> 		int level = 1;
> 		do {
> 			int cache_type[] = { CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED };
> 
> 			not_empty = false;
> 			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
> 				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
> 							     level, &cpu_node);
> 				if (!cache)
> 					continue;
> 
> 				not_empty = true;
> 
> 				if (acpi_pptt_cache_id_is_valid(cache) &&
> 				    cache->extra.cache_id == cache_id)
> 					return level;
> 			}
> 		} while (not_empty);
> 
> Maybe flip sense of that bool to be empty and !empty for the test.

Yes, this is better. I'll stop abusing the for loop in this patch and
the next. Changing not_empty to !empty makes sense too.
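
As an illustration of the agreed shape, here is a self-contained toy
model of the level walk with a do/while and the flipped 'empty' flag.
Everything prefixed toy_/TOY_ is hypothetical and only stands in for the
PPTT lookups; note that the level increment has to move into the loop
body once the for loop is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS	2
#define TOY_MAX_LEVEL	3
#define TOY_ENOENT	2

enum toy_cache_type { TOY_INST, TOY_DATA, TOY_UNIFIED, TOY_NR_TYPES };

struct toy_cache {
	bool present;
	uint32_t cache_id;
};

/* Indexed as toy_topology[cpu][level - 1][type]; L3 (id 30) is shared. */
static const struct toy_cache toy_topology[TOY_NR_CPUS][TOY_MAX_LEVEL][TOY_NR_TYPES] = {
	[0] = {
		[0] = { [TOY_INST] = { true, 10 }, [TOY_DATA] = { true, 11 } },
		[1] = { [TOY_UNIFIED] = { true, 20 } },
		[2] = { [TOY_UNIFIED] = { true, 30 } },
	},
	[1] = {
		[0] = { [TOY_INST] = { true, 12 }, [TOY_DATA] = { true, 13 } },
		[1] = { [TOY_UNIFIED] = { true, 21 } },
		[2] = { [TOY_UNIFIED] = { true, 30 } },
	},
};

/* Stand-in for the PPTT cache lookup: NULL when no such cache exists. */
static const struct toy_cache *toy_find_cache(int cpu, int type, int level)
{
	const struct toy_cache *cache;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (level < 1 || level > TOY_MAX_LEVEL)
		return NULL;

	cache = &toy_topology[cpu][level - 1][type];
	return cache->present ? cache : NULL;
}

static int toy_cache_level_from_id(uint32_t cache_id)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int level = 1;
		bool empty;

		do {
			/* A level is empty until one cache type is found. */
			empty = true;
			for (int type = 0; type < TOY_NR_TYPES; type++) {
				const struct toy_cache *cache;

				cache = toy_find_cache(cpu, type, level);
				if (!cache)
					continue;

				empty = false;
				if (cache->cache_id == cache_id)
					return level;
			}
			level++;
		} while (!empty);
	}

	return -TOY_ENOENT;
}
```

Here 'empty' never needs a priming value, since the body always runs
once and sets it before the loop condition is evaluated.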

> 
> 
>> +			int cache_type[] = {CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED};
>> +
>> +			not_empty = false;
>> +			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
>> +				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
>> +							     level, &cpu_node);
>> +				if (!cache)
>> +					continue;
>> +
>> +				not_empty = true;
>> +
>> +				if (acpi_pptt_cache_id_is_valid(cache) &&
>> +				    cache->extra.cache_id == cache_id)
>> +					return level;
>> +			}
>> +		}
>> +	}
>> +
>> +	return -ENOENT;
>> +}
> 
> 

Thanks,

Ben



* Re: [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id
  2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
  2025-11-08  5:11   ` Gavin Shan
  2025-11-10 16:02   ` Jonathan Cameron
@ 2025-11-12 20:23   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-12 20:23 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (3 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  5:10   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 06/33] arm64: kconfig: Add Kconfig entry for MPAM Ben Horgan
                   ` (34 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Rohit Mathew, Ben Horgan

From: James Morse <james.morse@arm.com>

MPAM identifies CPUs by the cache_id in the PPTT cache structure.

The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.

Add a helper to pull this information out of the PPTT.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Equivalent changes to the previous patch:
 Tags dropped due to rework
 Fallout/simplification from adding acpi_pptt_cache_v1_full
 Look for each cache type before incrementing level
---
 drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  6 +++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 71841c106020..7b4cb17c12c0 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -981,3 +981,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 
 	return -ENOENT;
 }
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ *					   specified cache
+ * @cache_id: The id field of the cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+	int cpu;
+	struct acpi_table_header *table;
+
+	cpumask_clear(cpus);
+
+	table = acpi_get_pptt();
+	if (!table)
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	for_each_possible_cpu(cpu) {
+		bool not_empty = true;
+		u32 acpi_cpu_id;
+		struct acpi_pptt_cache_v1_full *cache;
+		struct acpi_pptt_processor *cpu_node;
+
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			continue;
+
+		for (int level = 1; not_empty; level++) {
+			int cache_type[] = {CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED};
+
+			not_empty = false;
+			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
+				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
+							     level, &cpu_node);
+
+				if (!cache)
+					continue;
+
+				not_empty = true;
+
+				if (acpi_pptt_cache_id_is_valid(cache) &&
+				    cache->extra.cache_id == cache_id)
+					cpumask_set_cpu(cpu, cpus);
+			}
+		}
+	}
+
+	return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index be074bdfd4d1..a9dbacabdf89 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1543,6 +1543,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1570,6 +1571,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
 {
 	return -ENOENT;
 }
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+						      cpumask_t *cpus)
+{
+	return -ENOENT;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.43.0



* Re: [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
@ 2025-11-08  5:10   ` Gavin Shan
  2025-11-10 16:04   ` Jonathan Cameron
  2025-11-12 20:26   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  5:10 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Equivalent changes to the previous patch:
>   Tags dropped due to rework
>   Fallout/simplification from adding acpi_pptt_cache_v1_full
>   Look for each cache type before incrementing level
> ---
>   drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/acpi.h |  6 +++++
>   2 files changed, 68 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
  2025-11-08  5:10   ` Gavin Shan
@ 2025-11-10 16:04   ` Jonathan Cameron
  2025-11-12 20:26   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:04 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:22 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Very similar comments to those on the previous patch.

> ---
> Changes since v3:
> Equivalent changes to the previous patch:
>  Tags dropped due to rework
>  Fallout/simplification from adding acpi_pptt_cache_v1_full
>  Look for each cache type before incrementing level
> ---
>  drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  6 +++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 71841c106020..7b4cb17c12c0 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -981,3 +981,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>  
>  	return -ENOENT;
>  }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	int cpu;
> +	struct acpi_table_header *table;
> +
> +	cpumask_clear(cpus);
> +
> +	table = acpi_get_pptt();
> +	if (!table)
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	for_each_possible_cpu(cpu) {
> +		bool not_empty = true;

Basically the same comments.  This dance of setting not_empty to true
just to get into the loop is a bit nasty, as it means the variable
briefly has an alternative meaning that the name doesn't convey.

A do/while() doesn't have that problem as we always do one iteration.
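
As a rough illustration of that shape, here is a userspace model rather
than the kernel code. model_find_cache() is a hypothetical stand-in for
acpi_find_cache_node(), and the cache layout it reports is made up. The
flag only ever means "this level had at least one cache", and the
do/while guarantees the first level is examined without priming it:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TYPES 3	/* instruction, data, unified */

/*
 * Hypothetical stand-in for acpi_find_cache_node(): reports whether a
 * cache of the given type exists at the given level (levels start at 1).
 */
static bool model_find_cache(int level, int type)
{
	/* Made-up topology: split L1, unified L2 and L3, nothing beyond. */
	static const bool present[4][NUM_TYPES] = {
		{ false, false, false },	/* level 0 is unused */
		{ true,  true,  false },	/* L1: inst + data */
		{ false, false, true  },	/* L2: unified */
		{ false, false, true  },	/* L3: unified */
	};

	assert(type >= 0 && type < NUM_TYPES);
	if (level < 1 || level > 3)
		return false;
	return present[level][type];
}

/* Count the levels that hold at least one cache of any type. */
static int count_cache_levels(void)
{
	int level = 1, populated = 0;
	bool found;

	do {
		found = false;
		for (int type = 0; type < NUM_TYPES; type++) {
			if (model_find_cache(level, type))
				found = true;
		}
		if (found)
			populated++;
		level++;
	} while (found);	/* stop at the first empty level */

	return populated;
}
```

With the made-up topology above, count_cache_levels() visits levels 1
to 4 and stops at level 4, the first level with no cache of any type.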


> +		u32 acpi_cpu_id;
> +		struct acpi_pptt_cache_v1_full *cache;
> +		struct acpi_pptt_processor *cpu_node;
> +
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			continue;
> +
> +		for (int level = 1; not_empty; level++) {
> +			int cache_type[] = {CACHE_TYPE_INST, CACHE_TYPE_DATA, CACHE_TYPE_UNIFIED};
> +
> +			not_empty = false;
> +			for (int i = 0; i < ARRAY_SIZE(cache_type); i++) {
> +				cache = acpi_find_cache_node(table, acpi_cpu_id, cache_type[i],
> +							     level, &cpu_node);
> +
> +				if (!cache)
> +					continue;
> +
> +				not_empty = true;
> +
> +				if (acpi_pptt_cache_id_is_valid(cache) &&
> +				    cache->extra.cache_id == cache_id)
> +					cpumask_set_cpu(cpu, cpus);
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}




* Re: [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
  2025-11-08  5:10   ` Gavin Shan
  2025-11-10 16:04   ` Jonathan Cameron
@ 2025-11-12 20:26   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-12 20:26 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache.
> The CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 06/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (4 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.

This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
CC: Dave Martin <dave.martin@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 arch/arm64/Kconfig | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6663ffd23f25..67015d51f7b5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2023,6 +2023,29 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 
+config ARM64_MPAM
+	bool "Enable support for MPAM"
+	help
+	  Memory System Resource Partitioning and Monitoring (MPAM) is an
+	  optional extension to the Arm architecture that allows each
+	  transaction issued to the memory system to be labelled with a
+	  Partition identifier (PARTID) and Performance Monitoring Group
+	  identifier (PMG).
+
+	  Memory system components, such as the caches, can be configured with
+	  policies to control how much of various physical resources (such as
+	  memory bandwidth or cache memory) the transactions labelled with each
+	  PARTID can consume.  Depending on the capabilities of the hardware,
+	  the PARTID and PMG can also be used as filtering criteria to measure
+	  the memory system resource consumption of different parts of a
+	  workload.
+
+	  Use of this extension requires CPU support, support in the
+	  Memory System Components (MSC), and a description from firmware
+	  of where the MSCs are in the address space.
+
+	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
-- 
2.43.0



* [PATCH 07/33] platform: Define platform_device_put cleanup handler
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (5 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 06/33] arm64: kconfig: Add Kconfig entry for MPAM Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
                   ` (32 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

Define a cleanup helper for use with __free to destroy platform devices
automatically when the pointer goes out of scope. This is only intended to
be used in error cases and so should be used with return_ptr() or
no_free_ptr() directly to avoid the automatic destruction on success.

A first use of this is introduced in a subsequent commit.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 include/linux/platform_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 074754c23d33..23a30ada2d4c 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
 extern int platform_device_add(struct platform_device *pdev);
 extern void platform_device_del(struct platform_device *pdev);
 extern void platform_device_put(struct platform_device *pdev);
+DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))
 
 struct platform_driver {
 	int (*probe)(struct platform_device *);
-- 
2.43.0
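
For readers unfamiliar with the pattern, the following is a minimal
userspace sketch of the ownership hand-off described in the commit
message. It is not the kernel implementation: struct model_dev,
MODEL_FREE and model_take() are hypothetical names standing in for
struct platform_device, __free(platform_device_put) and no_free_ptr().
It relies on the GCC/Clang cleanup attribute, which is also what the
kernel's __free() builds on.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct platform_device. */
struct model_dev {
	int refcount;
	int id;
};

static int model_live_devs;	/* devices allocated and not yet released */

static struct model_dev *model_dev_alloc(int id)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;
	dev->id = id;
	model_live_devs++;
	return dev;
}

static void model_dev_put(struct model_dev *dev)
{
	if (--dev->refcount == 0) {
		model_live_devs--;
		free(dev);
	}
}

/* Runs when the annotated variable goes out of scope; NULL means disarmed. */
static void model_dev_put_cleanup(struct model_dev **p)
{
	if (*p)
		model_dev_put(*p);
}
#define MODEL_FREE __attribute__((cleanup(model_dev_put_cleanup)))

/* Hand ownership to the caller and disarm the automatic cleanup. */
static struct model_dev *model_take(struct model_dev **p)
{
	struct model_dev *tmp = *p;

	*p = NULL;
	return tmp;
}

static struct model_dev *model_dev_create(int id, int fail)
{
	struct model_dev *dev MODEL_FREE = model_dev_alloc(id);

	if (!dev)
		return NULL;
	if (fail)
		return NULL;		/* cleanup drops the reference here */

	return model_take(&dev);	/* success: caller owns the reference */
}
```

On the failing path the reference is dropped automatically; on the
success path the caller must eventually call model_dev_put() itself.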



* Re: [PATCH 07/33] platform: Define platform_device_put cleanup handler
  2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
@ 2025-11-10  1:03   ` Gavin Shan
  2025-11-10 16:07   ` Jonathan Cameron
  2025-11-12 20:32   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-10  1:03 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> Define a cleanup helper for use with __free to destroy platform devices
> automatically when the pointer goes out of scope. This is only intended to
> be used in error cases and so should be used with return_ptr() or
> no_free_ptr() directly to avoid the automatic destruction on success.
> 
> A first use of this is introduced in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   include/linux/platform_device.h | 1 +
>   1 file changed, 1 insertion(+)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 07/33] platform: Define platform_device_put cleanup handler
  2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
@ 2025-11-10 16:07   ` Jonathan Cameron
  2025-11-12 20:32   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:07 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:24 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> Define a cleanup helper for use with __free to destroy platform devices
> automatically when the pointer goes out of scope. This is only intended to
> be used in error cases and so should be used with return_ptr() or
> no_free_ptr() directly to avoid the automatic destruction on success.
> 
> A first use of this is introduced in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

I'm fine with this, but it probably needs a tag from Greg KH.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  include/linux/platform_device.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index 074754c23d33..23a30ada2d4c 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
>  extern int platform_device_add(struct platform_device *pdev);
>  extern void platform_device_del(struct platform_device *pdev);
>  extern void platform_device_put(struct platform_device *pdev);
> +DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))
>  
>  struct platform_driver {
>  	int (*probe)(struct platform_device *);



* Re: [PATCH 07/33] platform: Define platform_device_put cleanup handler
  2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
  2025-11-10 16:07   ` Jonathan Cameron
@ 2025-11-12 20:32   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-12 20:32 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> Define a cleanup helper for use with __free to destroy platform devices
> automatically when the pointer goes out of scope. This is only intended to
> be used in error cases and so should be used with return_ptr() or
> no_free_ptr() directly to avoid the automatic destruction on success.
> 
> A first use of this is introduced in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (6 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
                     ` (3 more replies)
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
                   ` (31 subsequent siblings)
  39 siblings, 4 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

Define a cleanup helper for use with __free to release the acpi table when
the pointer goes out of scope. Also, introduce the helper
acpi_get_table_ret() to simplify a commonly used pattern involving
acpi_get_table().

These are first used in a subsequent commit.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 include/linux/acpi.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index a9dbacabdf89..1124b7dc79fd 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	int status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_put_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
-- 
2.43.0
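
The conversion that acpi_get_table_ret() performs, from a status code
plus out-parameter to a single pointer that may encode an error, can be
modelled in userspace as below. This is only a sketch: every model_*
name is hypothetical, and the error-pointer helpers imitate the kernel's
ERR_PTR()/IS_ERR()/PTR_ERR() rather than reuse them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical table store containing a single table. */
struct model_table {
	char signature[5];
};

static struct model_table model_pptt = { "PPTT" };

/* Status plus out-parameter, in the style of acpi_get_table(). */
static int model_get_table(const char *sig, struct model_table **out)
{
	if (strcmp(sig, model_pptt.signature))
		return 1;	/* non-zero status means failure */
	*out = &model_pptt;
	return 0;
}

/* Pointer-returning wrapper, in the style of acpi_get_table_ret(). */
static struct model_table *model_get_table_ret(const char *sig)
{
	struct model_table *table;

	if (model_get_table(sig, &table))
		return model_err_ptr(-ENOENT);
	return table;
}
```

Because the wrapper can return an error pointer, a cleanup handler
paired with it has to skip error values as well as NULL, which is why
the DEFINE_FREE() above tests IS_ERR_OR_NULL() before calling
acpi_put_table().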



* Re: [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
@ 2025-11-10  1:03   ` Gavin Shan
  2025-11-10 16:11   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-10  1:03 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> Define a cleanup helper for use with __free to release the acpi table when
> the pointer goes out of scope. Also, introduce the helper
> acpi_get_table_ret() to simplify a commonly used pattern involving
> acpi_get_table().
> 
> These are first used in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   include/linux/acpi.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
@ 2025-11-10 16:11   ` Jonathan Cameron
  2025-11-12  7:02   ` Shaopeng Tan (Fujitsu)
  2025-11-12 20:39   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:11 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:25 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> Define a cleanup helper for use with __free to release the acpi table when
> the pointer goes out of scope. Also, introduce the helper
> acpi_get_table_ret() to simplify a commonly used pattern involving
> acpi_get_table().
> 
> These are first used in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Seems useful enough to be worth having. Needs input from
Rafael.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  include/linux/acpi.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index a9dbacabdf89..1124b7dc79fd 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_put_table, struct acpi_table_header *, if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))
> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,



* RE: [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
  2025-11-10  1:03   ` Gavin Shan
  2025-11-10 16:11   ` Jonathan Cameron
@ 2025-11-12  7:02   ` Shaopeng Tan (Fujitsu)
  2025-11-12 20:39   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  7:02 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> Define a cleanup helper for use with __free to release the acpi table when the
> pointer goes out of scope. Also, introduce the helper
> acpi_get_table_ret() to simplify a commonly used pattern involving
> acpi_get_table().
> 
> These are first used in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* Re: [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
                     ` (2 preceding siblings ...)
  2025-11-12  7:02   ` Shaopeng Tan (Fujitsu)
@ 2025-11-12 20:39   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-12 20:39 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> Define a cleanup helper for use with __free to release the acpi table when
> the pointer goes out of scope. Also, introduce the helper
> acpi_get_table_ret() to simplify a commonly used pattern involving
> acpi_get_table().
> 
> These are first used in a subsequent commit.
> 
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (7 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  8:54   ` Gavin Shan
                     ` (4 more replies)
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
                   ` (30 subsequent siblings)
  39 siblings, 5 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.

This happens in two stages. Platform devices are created first for the
MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
to discover the RIS entries the MSC contains.

For now the MPAM hook mpam_ris_create() is stubbed out, but will update
the MPAM driver with optional discovered data about the RIS entries.
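
Discovering the RIS entries means walking variable-length records: each
resource node is followed by a count-dependent number of functional
dependency entries. The general technique, checking each record against
the remaining bytes before stepping over it, can be sketched in
userspace as follows. The record layout here (struct model_res,
struct model_dep) is a simplified assumption and does not match the
real MPAM table structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical record layout; not the real ACPI structures. */
struct model_dep {
	uint32_t producer;
};

struct model_res {
	uint8_t ris_index;
	uint8_t num_deps;	/* number of model_dep entries that follow */
};

/*
 * Walk num_records variable-length records in buf[0..len).
 * Returns the number of records visited, or -1 if a record header or
 * its trailing dependency entries would run past the end of the buffer.
 */
static int model_walk(const uint8_t *buf, size_t len, unsigned int num_records)
{
	const uint8_t *ptr = buf;
	const uint8_t *end = buf + len;

	for (unsigned int i = 0; i < num_records; i++) {
		struct model_res res;
		size_t remaining = (size_t)(end - ptr);

		if (remaining < sizeof(res))
			return -1;

		/* Copy out the header to avoid unaligned access. */
		memcpy(&res, ptr, sizeof(res));
		remaining -= sizeof(res);

		if (res.num_deps > remaining / sizeof(struct model_dep))
			return -1;

		ptr += sizeof(res);
		ptr += res.num_deps * sizeof(struct model_dep);
	}

	return (int)num_records;
}
```

Dividing the remaining byte count by the entry size, rather than
multiplying the untrusted count, keeps the bounds check free of
overflow.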

CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
return irq from acpi_mpam_register_irq (Jonathan)
err -> len rename (Jonathan)
Move table initialisation after checking (Jonathan)
Add sanity checking in acpi_mpam_count_msc() (Jonathan)
---
 arch/arm64/Kconfig          |   1 +
 drivers/acpi/arm64/Kconfig  |   3 +
 drivers/acpi/arm64/Makefile |   1 +
 drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c       |   2 +-
 include/linux/arm_mpam.h    |  47 +++++
 6 files changed, 456 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 include/linux/arm_mpam.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 67015d51f7b5..c5e66d5d72cd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
 	  optional extension to the Arm architecture that allows each
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
 
 config ACPI_APMT
 	bool
+
+config ACPI_MPAM
+	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
 obj-$(CONFIG_ACPI_FFH)		+= ffh.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
 obj-y				+= dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..c199944862ed
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
+
+/*
+ * Encodings for the MSC node body interface type field.
+ * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
+#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
+
+static bool _is_ppi_partition(u32 flags)
+{
+	u32 aff_type, is_ppi;
+	bool ret;
+
+	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
+	if (!is_ppi)
+		return false;
+
+	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
+	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
+	if (ret)
+		pr_err_once("Partitioned interrupts not supported\n");
+
+	return ret;
+}
+
+static int acpi_mpam_register_irq(struct platform_device *pdev,
+				  int intid, u32 flags)
+{
+	int irq;
+	u32 int_type;
+	int trigger;
+
+	if (!intid)
+		return -EINVAL;
+
+	if (_is_ppi_partition(flags))
+		return -EINVAL;
+
+	trigger = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
+	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
+	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+		return -EINVAL;
+
+	irq = acpi_register_gsi(&pdev->dev, intid, trigger, ACPI_ACTIVE_HIGH);
+	if (irq <= 0)
+		pr_err_once("Failed to register interrupt 0x%x with ACPI\n", intid);
+
+	return irq;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+				 struct acpi_mpam_msc_node *tbl_msc,
+				 struct resource *res, int *res_idx)
+{
+	u32 flags, intid;
+	int irq;
+
+	intid = tbl_msc->overflow_interrupt;
+	flags = tbl_msc->overflow_interrupt_flags;
+	irq = acpi_mpam_register_irq(pdev, intid, flags);
+	if (irq > 0)
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+	intid = tbl_msc->error_interrupt;
+	flags = tbl_msc->error_interrupt_flags;
+	irq = acpi_mpam_register_irq(pdev, intid, flags);
+	if (irq > 0)
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+				    struct acpi_mpam_resource_node *res)
+{
+	int level, nid;
+	u32 cache_id;
+
+	switch (res->locator_type) {
+	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+		cache_id = res->locator.cache_locator.cache_reference;
+		level = find_acpi_cache_level_from_id(cache_id);
+		if (level <= 0) {
+			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
+			return -EINVAL;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+				       level, cache_id);
+	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+		if (nid == NUMA_NO_NODE) {
+			pr_debug("Bad proximity domain %lld, using node 0 instead\n",
+				 res->locator.memory_locator.proximity_domain);
+			nid = 0;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+				       255, nid);
+	default:
+		/* These get discovered later and are treated as unknown */
+		return 0;
+	}
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc)
+{
+	int i, err;
+	char *ptr, *table_end;
+	struct acpi_mpam_resource_node *resource;
+
+	ptr = (char *)(tbl_msc + 1);
+	table_end = ptr + tbl_msc->length;
+	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+		u64 max_deps, remaining_table;
+
+		if (ptr + sizeof(*resource) > table_end)
+			return -EINVAL;
+
+		resource = (struct acpi_mpam_resource_node *)ptr;
+
+		remaining_table = table_end - ptr;
+		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
+		if (resource->num_functional_deps > max_deps) {
+			pr_debug("MSC has impossible number of functional dependencies\n");
+			return -EINVAL;
+		}
+
+		err = acpi_mpam_parse_resource(msc, resource);
+		if (err)
+			return err;
+
+		ptr += sizeof(*resource);
+		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
+	}
+
+	return 0;
+}
+
+/*
+ * Creates the device power management link and returns true if the
+ * acpi id is valid and usable for cpu affinity.  This is the case
+ * when the linked device is a processor or a processor container.
+ */
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+				     struct platform_device *pdev,
+				     u32 *acpi_id)
+{
+	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
+	bool acpi_id_valid = false;
+	struct acpi_device *buddy;
+	char uid[11];
+	int len;
+
+	memcpy(hid, &tbl_msc->hardware_id_linked_device,
+	       sizeof(tbl_msc->hardware_id_linked_device));
+
+	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+		*acpi_id = tbl_msc->instance_id_linked_device;
+		acpi_id_valid = true;
+	}
+
+	len = snprintf(uid, sizeof(uid), "%u",
+		       tbl_msc->instance_id_linked_device);
+	if (len >= sizeof(uid)) {
+		pr_debug("Failed to convert uid of device for power management.");
+		return acpi_id_valid;
+	}
+
+	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+	if (buddy)
+		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+	return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+				 enum mpam_msc_iface *iface)
+{
+	switch (tbl_msc->interface_type) {
+	case ACPI_MPAM_MSC_IFACE_MMIO:
+		*iface = MPAM_IFACE_MMIO;
+		return 0;
+	case ACPI_MPAM_MSC_IFACE_PCC:
+		*iface = MPAM_IFACE_PCC;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
+{
+	struct platform_device *pdev __free(platform_device_put) =
+		platform_device_alloc("mpam_msc", tbl_msc->identifier);
+	int next_res = 0, next_prop = 0, err;
+	/* pcc, nrdy, affinity and a sentinel */
+	struct property_entry props[4] = { 0 };
+	/* mmio, 2xirq, no sentinel. */
+	struct resource res[3] = { 0 };
+	struct acpi_device *companion;
+	enum mpam_msc_iface iface;
+	char uid[16];
+	u32 acpi_id;
+
+	if (!pdev)
+		return ERR_PTR(-ENOMEM);
+
+	/* Some power management is described in the namespace: */
+	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+	if (err > 0 && err < sizeof(uid)) {
+		companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+		if (companion)
+			ACPI_COMPANION_SET(&pdev->dev, companion);
+		else
+			pr_debug("MSC.%u: missing namespace entry\n",
+				 tbl_msc->identifier);
+	}
+
+	if (decode_interface_type(tbl_msc, &iface)) {
+		pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (iface == MPAM_IFACE_MMIO)
+		res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+						       tbl_msc->mmio_size,
+						       "MPAM:MSC");
+	else if (iface == MPAM_IFACE_PCC)
+		props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+							tbl_msc->base_address);
+
+	acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+
+	WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
+	err = platform_device_add_resources(pdev, res, next_res);
+	if (err)
+		return ERR_PTR(err);
+
+	props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+						tbl_msc->max_nrdy_usec);
+
+	/*
+	 * The MSC's CPU affinity is described via its linked power
+	 * management device, but only if it points at a Processor or
+	 * Processor Container.
+	 */
+	if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
+		props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
+
+	WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
+	err = device_create_managed_software_node(&pdev->dev, props, NULL);
+	if (err)
+		return ERR_PTR(err);
+
+	/*
+	 * Stash the table entry for acpi_mpam_parse_resources() to discover
+	 * what this MSC controls.
+	 */
+	err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+	if (err)
+		return ERR_PTR(err);
+
+	err = platform_device_add(pdev);
+	if (err)
+		return ERR_PTR(err);
+
+	return_ptr(pdev);
+}
+
+static int __init acpi_mpam_parse(void)
+{
+	char *table_end, *table_offset;
+	struct acpi_mpam_msc_node *tbl_msc;
+	struct platform_device *pdev;
+
+	if (acpi_disabled || !system_supports_mpam())
+		return 0;
+
+	struct acpi_table_header *table __free(acpi_put_table) =
+		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_offset = (char *)(table + 1);
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		table_offset += tbl_msc->length;
+
+		if (table_offset > table_end) {
+			pr_err("MSC entry overlaps end of ACPI table\n");
+			return -EINVAL;
+		}
+
+		/*
+		 * If any of the reserved fields are set, make no attempt to
+		 * parse the MSC structure. This MSC will still be counted by
+		 * acpi_mpam_count_msc(), meaning the MPAM driver can't probe
+		 * against all MSC, and will never be enabled. There is no way
+		 * to enable it safely, because we cannot determine safe
+		 * system-wide partid and pmg ranges in this situation.
+		 */
+		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
+			pr_err_once("Unrecognised MSC, MPAM not usable\n");
+			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
+			continue;
+		}
+
+		if (!tbl_msc->mmio_size) {
+			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
+			continue;
+		}
+
+		pdev = acpi_mpam_parse_msc(tbl_msc);
+		if (IS_ERR(pdev))
+			return PTR_ERR(pdev);
+	}
+
+	return 0;
+}
+
+/**
+ * acpi_mpam_count_msc() - Count the number of MSC described by firmware.
+ *
+ * Returns the number of MSC, or zero for an error.
+ *
+ * This can be called before or in parallel with acpi_mpam_parse().
+ */
+int acpi_mpam_count_msc(void)
+{
+	char *table_end, *table_offset;
+	struct acpi_mpam_msc_node *tbl_msc;
+	int count = 0;
+
+	if (acpi_disabled || !system_supports_mpam())
+		return 0;
+
+	struct acpi_table_header *table __free(acpi_put_table) =
+		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_offset = (char *)(table + 1);
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+
+		if (tbl_msc->length < sizeof(*tbl_msc))
+			return -EINVAL;
+		if (tbl_msc->length > table_end - table_offset)
+			return -EINVAL;
+		table_offset += tbl_msc->length;
+
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		count++;
+	}
+
+	return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 57fc8bc56166..4286e4af1092 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
 	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
-	ACPI_SIG_NBFT, ACPI_SIG_SWFT};
+	ACPI_SIG_NBFT, ACPI_SIG_SWFT, ACPI_SIG_MPAM};
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..a3828ef91aee
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+#define GLOBAL_AFFINITY		~0
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
+	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+	MPAM_CLASS_CACHE,       /* Caches, e.g. L2, L3 */
+	MPAM_CLASS_MEMORY,      /* Main memory */
+	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+					    struct acpi_mpam_msc_node *tbl_msc)
+{
+	return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
@ 2025-11-08  8:54   ` Gavin Shan
  2025-11-10 16:27     ` Jonathan Cameron
  2025-11-12 14:45     ` Ben Horgan
  2025-11-10 16:23   ` Jonathan Cameron
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  8:54 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the
> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> to discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> return irq from acpi_mpam_register_irq (Jonathan)
> err -> len rename (Jonathan)
> Move table initialisation after checking (Jonathan)
> Add sanity checking in acpi_mpam_count_msc() (Jonathan)
> ---
>   arch/arm64/Kconfig          |   1 +
>   drivers/acpi/arm64/Kconfig  |   3 +
>   drivers/acpi/arm64/Makefile |   1 +
>   drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
>   drivers/acpi/tables.c       |   2 +-
>   include/linux/arm_mpam.h    |  47 +++++
>   6 files changed, 456 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/acpi/arm64/mpam.c
>   create mode 100644 include/linux/arm_mpam.h
> 

With the following minor comments addressed:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 67015d51f7b5..c5e66d5d72cd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>   	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>   
>   config ACPI_APMT
>   	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>   obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>   obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>   obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>   obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>   obj-$(CONFIG_ARM_AMBA)		+= amba.o
>   obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..c199944862ed
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,403 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
> +
> +/*
> + * Encodings for the MSC node body interface type field.
> + * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
> +#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
> +
> +static bool _is_ppi_partition(u32 flags)
> +{
> +	u32 aff_type, is_ppi;
> +	bool ret;
> +
> +	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
> +	if (!is_ppi)
> +		return false;
> +

An error message may be needed here, since the driver won't fully function
without the interrupt enabled. It would give the system administrator a clear
indication of what has happened.

> +	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
> +	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
> +	if (ret)
> +		pr_err_once("Partitioned interrupts not supported\n");
> +
> +	return ret;
> +}
> +
> +static int acpi_mpam_register_irq(struct platform_device *pdev,
> +				  int intid, u32 flags)
> +{

s/int intid/u32 intid

All the callers pass a 'u32' parameter instead of 'int'.

> +	int irq;
> +	u32 int_type;
> +	int trigger;
> +
> +	if (!intid)
> +		return -EINVAL;
> +
> +	if (_is_ppi_partition(flags))
> +		return -EINVAL;
> +
> +	trigger = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
> +	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
> +	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return -EINVAL;
> +

Same as above, an error message may be needed here.

> +	irq = acpi_register_gsi(&pdev->dev, intid, trigger, ACPI_ACTIVE_HIGH);
> +	if (irq <= 0)
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n", intid);
> +

s/if (irq <= 0)/if (irq < 0)

acpi_register_gsi() cannot return 0: that case has already been translated
to -EINVAL inside the function.

> +	return irq;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, intid;
> +	int irq;
> +
> +	intid = tbl_msc->overflow_interrupt;
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	intid = tbl_msc->error_interrupt;
> +	flags = tbl_msc->error_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE) {
> +			pr_debug("Bad proximity domain %lld, using node 0 instead\n",
> +				 res->locator.memory_locator.proximity_domain);
> +			nid = 0;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and are treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Creates the device power management link and returns true if the
> + * acpi id is valid and usable for cpu affinity.  This is the case
> + * when the linked device is a processor or a processor container.
> + */
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int len;
> +
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	len = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (len >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case ACPI_MPAM_MSC_IFACE_MMIO:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case ACPI_MPAM_MSC_IFACE_PCC:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	struct platform_device *pdev __free(platform_device_put) =
> +		platform_device_alloc("mpam_msc", tbl_msc->identifier);
> +	int next_res = 0, next_prop = 0, err;
> +	/* pcc, nrdy, affinity and a sentinel */
> +	struct property_entry props[4] = { 0 };
> +	/* mmio, 2xirq, no sentinel. */
> +	struct resource res[3] = { 0 };
> +	struct acpi_device *companion;
> +	enum mpam_msc_iface iface;
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (!pdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/* Some power management is described in the namespace: */
> +	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +	if (err > 0 && err < sizeof(uid)) {
> +		companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> +		if (companion)
> +			ACPI_COMPANION_SET(&pdev->dev, companion);
> +		else
> +			pr_debug("MSC.%u: missing namespace entry\n",
> +				 tbl_msc->identifier);
> +	}
> +

{ } is needed for the block of code spanning multiple lines.

> +	if (decode_interface_type(tbl_msc, &iface)) {
> +		pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (iface == MPAM_IFACE_MMIO)
> +		res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +						       tbl_msc->mmio_size,
> +						       "MPAM:MSC");
> +	else if (iface == MPAM_IFACE_PCC)
> +		props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +							tbl_msc->base_address);
> +

As above, {} is needed here.

> +	acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +
> +	WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
> +	err = platform_device_add_resources(pdev, res, next_res);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +						tbl_msc->max_nrdy_usec);
> +
> +	/*
> +	 * The MSC's CPU affinity is described via its linked power
> +	 * management device, but only if it points at a Processor or
> +	 * Processor Container.
> +	 */
> +	if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
> +		props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
> +
> +	WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
> +	err = device_create_managed_software_node(&pdev->dev, props, NULL);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	/*
> +	 * Stash the table entry for acpi_mpam_parse_resources() to discover
> +	 * what this MSC controls.
> +	 */
> +	err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	err = platform_device_add(pdev);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	return_ptr(pdev);
> +}
> +
> +static int __init acpi_mpam_parse(void)
> +{
> +	char *table_end, *table_offset;
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	struct platform_device *pdev;
> +
> +	if (acpi_disabled || !system_supports_mpam())
> +		return 0;
> +
> +	struct acpi_table_header *table __free(acpi_put_table) =
> +		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +

It's correct to return zero on IS_ERR(table) with an error message, but
a message printed by pr_debug() may be worthwhile on "if (table->revision < 1)".

> +	table_offset = (char *)(table + 1);
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		if (table_offset > table_end) {
> +			pr_err("MSC entry overlaps end of ACPI table\n");
> +			return -EINVAL;
> +		}
> +

Would be:

		if (table_offset + sizeof(*tbl_msc) > table_end)

> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the MSC structure. This MSC will still be counted by
> +		 * acpi_mpam_count_msc(), meaning the MPAM driver can't probe
> +		 * against all MSC, and will never be enabled. There is no way
> +		 * to enable it safely, because we cannot determine safe
> +		 * system-wide partid and pmg ranges in this situation.
> +		 */
> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
> +			pr_err_once("Unrecognised MSC, MPAM not usable\n");
> +			pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (!tbl_msc->mmio_size) {
> +			pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		pdev = acpi_mpam_parse_msc(tbl_msc);
> +		if (IS_ERR(pdev))
> +			return PTR_ERR(pdev);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * acpi_mpam_count_msc() - Count the number of MSC described by firmware.
> + *
> + * Returns the number of MSC, or zero for an error.

s/MSC/MSCs

> + *
> + * This can be called before or in parallel with acpi_mpam_parse().
> + */
> +int acpi_mpam_count_msc(void)
> +{
> +	char *table_end, *table_offset;
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (acpi_disabled || !system_supports_mpam())
> +		return 0;
> +
> +	struct acpi_table_header *table __free(acpi_put_table) =
> +		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_offset = (char *)(table + 1);
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +

It would be worth checking:

		if (table_offset + sizeof(*tbl_msc) > table_end)
			return -EINVAL;

> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
> +
> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index 57fc8bc56166..4286e4af1092 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
>   	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>   	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>   	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> -	ACPI_SIG_NBFT, ACPI_SIG_SWFT};
> +	ACPI_SIG_NBFT, ACPI_SIG_SWFT, ACPI_SIG_MPAM};
>   
>   #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>   
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..a3828ef91aee
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#define GLOBAL_AFFINITY		~0
> +
> +struct mpam_msc;
> +
> +enum mpam_msc_iface {
> +	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
> +	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
> +};
> +
> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Caches, e.g. L2, L3 */
> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else
> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +					    struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> +#endif
> +
> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* __LINUX_ARM_MPAM_H */

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-08  8:54   ` Gavin Shan
@ 2025-11-10 16:27     ` Jonathan Cameron
  2025-11-12 14:45     ` Ben Horgan
  1 sibling, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:27 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Ben Horgan, james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, guohanjun, jeremy.linton, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On Sat, 8 Nov 2025 18:54:05 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > Add code to parse the arm64 specific MPAM table, looking up the cache
> > level from the PPTT and feeding the end result into the MPAM driver.
> > 
> > This happens in two stages. Platform devices are created first for the
> > MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> > to discover the RIS entries the MSC contains.
> > 
> > For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> > the MPAM driver with optional discovered data about the RIS entries.
> > 
> > CC: Carl Worth <carl@os.amperecomputing.com>
> > Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> > Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> > Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> > Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> > Tested-by: Peter Newman <peternewman@google.com>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> > ---
> > Changes since v3:
> > return irq from acpi_mpam_register_irq (Jonathan)
> > err -> len rename (Jonathan)
> > Move table initialisation after checking (Jonathan)
> > Add sanity checking in acpi_mpam_count_msc() (Jonathan)
> > ---
> >   arch/arm64/Kconfig          |   1 +
> >   drivers/acpi/arm64/Kconfig  |   3 +
> >   drivers/acpi/arm64/Makefile |   1 +
> >   drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
> >   drivers/acpi/tables.c       |   2 +-
> >   include/linux/arm_mpam.h    |  47 +++++
> >   6 files changed, 456 insertions(+), 1 deletion(-)
> >   create mode 100644 drivers/acpi/arm64/mpam.c
> >   create mode 100644 include/linux/arm_mpam.h
> >   
> 
> With the following minor comments addressed:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
Just picking out one comment where I think your suggestion
isn't quite the right one.

Jonathan

> > diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> > new file mode 100644
> > index 000000000000..c199944862ed
> > --- /dev/null
> > +++ b/drivers/acpi/arm64/mpam.c

> > +static int __init acpi_mpam_parse(void)
> > +{
> > +	char *table_end, *table_offset;
> > +	struct acpi_mpam_msc_node *tbl_msc;
> > +	struct platform_device *pdev;
> > +
> > +	if (acpi_disabled || !system_supports_mpam())
> > +		return 0;
> > +
> > +	struct acpi_table_header *table __free(acpi_put_table) =
> > +		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> > +
> > +	if (IS_ERR(table))
> > +		return 0;
> > +
> > +	if (table->revision < 1)
> > +		return 0;
> > +  
> 
> It's correct to return zero on IS_ERR(table) with an error message, but
> a message printed by pr_debug() may be worthwhile on "if (table->revision < 1)".
> 
> > +	table_offset = (char *)(table + 1);
> > +	table_end = (char *)table + table->length;
> > +
> > +	while (table_offset < table_end) {
> > +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> > +		table_offset += tbl_msc->length;
> > +
> > +		if (table_offset > table_end) {
> > +			pr_err("MSC entry overlaps end of ACPI table\n");
> > +			return -EINVAL;
> > +		}
> > +  
> 
> Would be:
> 
> 		if (table_offset + sizeof(*tbl_msc) > table_end)

I'm not seeing this one.  table_offset has already been moved on by
tbl_msc->length which should be bigger than sizeof(*tbl_msc).

Could add a check before reading tbl_msc->length that there is enough
there to do so. That to me would make sense (like the other case you point
at later).
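
To make the suggestion concrete, here is a minimal userspace sketch of a
walk that checks a node header fits before ->length is read, and then
checks the length itself. It is not the patch's code: struct msc_node and
count_nodes() are hypothetical stand-ins for struct acpi_mpam_msc_node and
the table walk, and a plain byte buffer stands in for the ACPI table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct acpi_mpam_msc_node: only what the walk needs. */
struct msc_node {
	uint16_t length;    /* total size of this node, header included */
	uint16_t reserved;
	uint32_t mmio_size; /* zero marks the MSC as disabled */
};

/* Count enabled nodes in buf[0..len). Returns -1 for a malformed table. */
static int count_nodes(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct msc_node node;
		size_t remaining = len - off;

		/* Are there enough bytes left to read ->length at all? */
		if (remaining < sizeof(node))
			return -1;
		memcpy(&node, buf + off, sizeof(node));

		/* A node shorter than its own header would stall or misparse the walk. */
		if (node.length < sizeof(node))
			return -1;
		/* The node must end inside the table. */
		if (node.length > remaining)
			return -1;

		off += node.length;
		if (node.mmio_size)
			count++;
	}

	return count;
}
```

Comparing the length against the remaining byte count, rather than adding
it to the pointer first, keeps all the arithmetic inside the table.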

Jonathan





^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-08  8:54   ` Gavin Shan
  2025-11-10 16:27     ` Jonathan Cameron
@ 2025-11-12 14:45     ` Ben Horgan
  1 sibling, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 14:45 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/8/25 08:54, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> This happens in two stages. Platform devices are created first for the
>> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
>> to discover the RIS entries the MSC contains.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data about the RIS entries.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
>> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> return irq from acpi_mpam_register_irq (Jonathan)
>> err -> len rename (Jonathan)
>> Move table initialisation after checking (Jonathan)
>> Add sanity checking in acpi_mpam_count_msc() (Jonathan)
>> ---
>>   arch/arm64/Kconfig          |   1 +
>>   drivers/acpi/arm64/Kconfig  |   3 +
>>   drivers/acpi/arm64/Makefile |   1 +
>>   drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
>>   drivers/acpi/tables.c       |   2 +-
>>   include/linux/arm_mpam.h    |  47 +++++
>>   6 files changed, 456 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/acpi/arm64/mpam.c
>>   create mode 100644 include/linux/arm_mpam.h
>>
> 
> With the following minor comments addressed:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 67015d51f7b5..c5e66d5d72cd 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>>     config ARM64_MPAM
>>       bool "Enable support for MPAM"
>> +    select ACPI_MPAM if ACPI
>>       help
>>         Memory System Resource Partitioning and Monitoring (MPAM) is an
>>         optional extension to the Arm architecture that allows each
>> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
>> index b3ed6212244c..f2fd79f22e7d 100644
>> --- a/drivers/acpi/arm64/Kconfig
>> +++ b/drivers/acpi/arm64/Kconfig
>> @@ -21,3 +21,6 @@ config ACPI_AGDI
>>     config ACPI_APMT
>>       bool
>> +
>> +config ACPI_MPAM
>> +    bool
>> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
>> index 05ecde9eaabe..9390b57cb564 100644
>> --- a/drivers/acpi/arm64/Makefile
>> +++ b/drivers/acpi/arm64/Makefile
>> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT)     += apmt.o
>>   obj-$(CONFIG_ACPI_FFH)        += ffh.o
>>   obj-$(CONFIG_ACPI_GTDT)     += gtdt.o
>>   obj-$(CONFIG_ACPI_IORT)     += iort.o
>> +obj-$(CONFIG_ACPI_MPAM)     += mpam.o
>>   obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>>   obj-$(CONFIG_ARM_AMBA)        += amba.o
>>   obj-y                += dma.o init.o
>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..c199944862ed
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c
>> @@ -0,0 +1,403 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
>> +
>> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/arm_mpam.h>
>> +#include <linux/bits.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/platform_device.h>
>> +
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
>> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
>> + */
>> +#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
>> +
>> +/*
>> + * Encodings for the MSC node body interface type field.
>> + * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
>> + */
>> +#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
>> +#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
>> +
>> +static bool _is_ppi_partition(u32 flags)
>> +{
>> +    u32 aff_type, is_ppi;
>> +    bool ret;
>> +
>> +    is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
>> +    if (!is_ppi)
>> +        return false;
>> +
> 
> An error message may be needed since the driver won't fully function without
> interrupts enabled. The error message gives the system administrator a clear
> indication of what has happened.

I don't think extra error messages are needed as the error interrupts
can only be caused by driver programming errors and overflow interrupts
are currently not considered.
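
For anyone following along, the silent -EINVAL paths under discussion come
from decoding the interrupt flags word. A rough userspace sketch of that
decode follows. The bit positions are taken from the ACPI_MPAM_MSC_IRQ_*
defines in the patch, but the IRQ_* names and irq_flags_supported() are
made up for illustration, and hand-written masks replace FIELD_GET().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout mirrors the ACPI_MPAM_MSC_IRQ_* defines; the names are illustrative. */
#define IRQ_MODE_BIT      (1u << 0)  /* trigger mode, passed through to the GSI */
#define IRQ_TYPE_SHIFT    1
#define IRQ_TYPE_MASK     (0x3u << IRQ_TYPE_SHIFT)
#define IRQ_TYPE_WIRED    0u
#define IRQ_AFF_TYPE_BIT  (1u << 3)  /* 0: processor, 1: processor container */
#define IRQ_AFF_VALID_BIT (1u << 4)

/* True when the flags describe an interrupt the patch would try to register. */
static bool irq_flags_supported(uint32_t flags)
{
	/* Valid affinity that names a processor container: partitioned PPI, rejected. */
	if ((flags & IRQ_AFF_VALID_BIT) && (flags & IRQ_AFF_TYPE_BIT))
		return false;

	/* Only wired interrupts are handled. */
	if (((flags & IRQ_TYPE_MASK) >> IRQ_TYPE_SHIFT) != IRQ_TYPE_WIRED)
		return false;

	return true;
}
```

The affinity type bit only matters when the affinity valid bit is also
set, which matches the early return in _is_ppi_partition().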

> 
>> +    aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
>> +    ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
>> +    if (ret)
>> +        pr_err_once("Partitioned interrupts not supported\n");
>> +
>> +    return ret;
>> +}
>> +
>> +static int acpi_mpam_register_irq(struct platform_device *pdev,
>> +                  int intid, u32 flags)
>> +{
> 
> s/int intid/u32 intid
> 
> All the callers pass a 'u32' parameter instead of 'int'.
> 
>> +    int irq;
>> +    u32 int_type;
>> +    int trigger;
>> +
>> +    if (!intid)
>> +        return -EINVAL;
>> +
>> +    if (_is_ppi_partition(flags))
>> +        return -EINVAL;
>> +
>> +    trigger = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
>> +    int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
>> +    if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
>> +        return -EINVAL;
>> +
> 
> Same as above, an error message may be needed here.

I'd rather leave this as is, see above.

> 
>> +    irq = acpi_register_gsi(&pdev->dev, intid, trigger, ACPI_ACTIVE_HIGH);
>> +    if (irq <= 0)
>> +        pr_err_once("Failed to register interrupt 0x%x with ACPI\n", intid);
>> +
> 
> s/if (irq <= 0)/if (irq < 0)

Done

> 
> It's impossible for acpi_register_gsi() to return 0, which has been
> translated to -EINVAL in the function.
> 
>> +    return irq;
>> +}
>> +
>> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
>> +                 struct acpi_mpam_msc_node *tbl_msc,
>> +                 struct resource *res, int *res_idx)
>> +{
>> +    u32 flags, intid;
>> +    int irq;
>> +
>> +    intid = tbl_msc->overflow_interrupt;
>> +    flags = tbl_msc->overflow_interrupt_flags;
>> +    irq = acpi_mpam_register_irq(pdev, intid, flags);
>> +    if (irq > 0)
>> +        res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
>> +
>> +    intid = tbl_msc->error_interrupt;
>> +    flags = tbl_msc->error_interrupt_flags;
>> +    irq = acpi_mpam_register_irq(pdev, intid, flags);
>> +    if (irq > 0)
>> +        res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
>> +}
>> +
>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +                    struct acpi_mpam_resource_node *res)
>> +{
>> +    int level, nid;
>> +    u32 cache_id;
>> +
>> +    switch (res->locator_type) {
>> +    case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +        cache_id = res->locator.cache_locator.cache_reference;
>> +        level = find_acpi_cache_level_from_id(cache_id);
>> +        if (level <= 0) {
>> +            pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
>> +            return -EINVAL;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +                       level, cache_id);
>> +    case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +        nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +        if (nid == NUMA_NO_NODE) {
>> +            pr_debug("Bad proximity domain %lld, using node 0 instead\n",
>> +                 res->locator.memory_locator.proximity_domain);
>> +            nid = 0;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +                       255, nid);
>> +    default:
>> +        /* These get discovered later and are treated as unknown */
>> +        return 0;
>> +    }
>> +}
>> +
>> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> +                  struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +    int i, err;
>> +    char *ptr, *table_end;
>> +    struct acpi_mpam_resource_node *resource;
>> +
>> +    ptr = (char *)(tbl_msc + 1);
>> +    table_end = ptr + tbl_msc->length;
>> +    for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
>> +        u64 max_deps, remaining_table;
>> +
>> +        if (ptr + sizeof(*resource) > table_end)
>> +            return -EINVAL;
>> +
>> +        resource = (struct acpi_mpam_resource_node *)ptr;
>> +
>> +        remaining_table = table_end - ptr;
>> +        max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
>> +        if (resource->num_functional_deps > max_deps) {
>> +            pr_debug("MSC has impossible number of functional dependencies\n");
>> +            return -EINVAL;
>> +        }
>> +
>> +        err = acpi_mpam_parse_resource(msc, resource);
>> +        if (err)
>> +            return err;
>> +
>> +        ptr += sizeof(*resource);
>> +        ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Creates the device power management link and returns true if the
>> + * acpi id is valid and usable for cpu affinity.  This is the case
>> + * when the linked device is a processor or a processor container.
>> + */
>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +                     struct platform_device *pdev,
>> +                     u32 *acpi_id)
>> +{
>> +    char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
>> +    bool acpi_id_valid = false;
>> +    struct acpi_device *buddy;
>> +    char uid[11];
>> +    int len;
>> +
>> +    memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +           sizeof(tbl_msc->hardware_id_linked_device));
>> +
>> +    if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +        *acpi_id = tbl_msc->instance_id_linked_device;
>> +        acpi_id_valid = true;
>> +    }
>> +
>> +    len = snprintf(uid, sizeof(uid), "%u",
>> +               tbl_msc->instance_id_linked_device);
>> +    if (len >= sizeof(uid)) {
>> +        pr_debug("Failed to convert uid of device for power management.");
>> +        return acpi_id_valid;
>> +    }
>> +
>> +    buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
>> +    if (buddy)
>> +        device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
>> +
>> +    return acpi_id_valid;
>> +}
>> +
>> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
>> +                 enum mpam_msc_iface *iface)
>> +{
>> +    switch (tbl_msc->interface_type) {
>> +    case ACPI_MPAM_MSC_IFACE_MMIO:
>> +        *iface = MPAM_IFACE_MMIO;
>> +        return 0;
>> +    case ACPI_MPAM_MSC_IFACE_PCC:
>> +        *iface = MPAM_IFACE_PCC;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +    struct platform_device *pdev __free(platform_device_put) =
>> +        platform_device_alloc("mpam_msc", tbl_msc->identifier);
>> +    int next_res = 0, next_prop = 0, err;
>> +    /* pcc, nrdy, affinity and a sentinel */
>> +    struct property_entry props[4] = { 0 };
>> +    /* mmio, 2xirq, no sentinel. */
>> +    struct resource res[3] = { 0 };
>> +    struct acpi_device *companion;
>> +    enum mpam_msc_iface iface;
>> +    char uid[16];
>> +    u32 acpi_id;
>> +
>> +    if (!pdev)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    /* Some power management is described in the namespace: */
>> +    err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>> +    if (err > 0 && err < sizeof(uid)) {
>> +        companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
>> +        if (companion)
>> +            ACPI_COMPANION_SET(&pdev->dev, companion);
>> +        else
>> +            pr_debug("MSC.%u: missing namespace entry\n",
>> +                 tbl_msc->identifier);
>> +    }
>> +
> 
> { } is needed for the block of code spanning multiple lines.

Just made it one line.

> 
>> +    if (decode_interface_type(tbl_msc, &iface)) {
>> +        pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
>> +        return ERR_PTR(-EINVAL);
>> +    }
>> +
>> +    if (iface == MPAM_IFACE_MMIO)
>> +        res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
>> +                               tbl_msc->mmio_size,
>> +                               "MPAM:MSC");
>> +    else if (iface == MPAM_IFACE_PCC)
>> +        props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
>> +                            tbl_msc->base_address);
>> +
> 
> As above, {} is needed here.

Done

> 
>> +    acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
>> +
>> +    WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
>> +    err = platform_device_add_resources(pdev, res, next_res);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
>> +                        tbl_msc->max_nrdy_usec);
>> +
>> +    /*
>> +     * The MSC's CPU affinity is described via its linked power
>> +     * management device, but only if it points at a Processor or
>> +     * Processor Container.
>> +     */
>> +    if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
>> +        props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
>> +
>> +    WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
>> +    err = device_create_managed_software_node(&pdev->dev, props, NULL);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    /*
>> +     * Stash the table entry for acpi_mpam_parse_resources() to discover
>> +     * what this MSC controls.
>> +     */
>> +    err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    err = platform_device_add(pdev);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    return_ptr(pdev);
>> +}
>> +
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +    char *table_end, *table_offset;
>> +    struct acpi_mpam_msc_node *tbl_msc;
>> +    struct platform_device *pdev;
>> +
>> +    if (acpi_disabled || !system_supports_mpam())
>> +        return 0;
>> +
>> +    struct acpi_table_header *table __free(acpi_put_table) =
>> +        acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +
>> +    if (IS_ERR(table))
>> +        return 0;
>> +
>> +    if (table->revision < 1)
>> +        return 0;
>> +
> 
> It's correct to return zero on IS_ERR(table) with an error message, but
> a message printed by pr_debug() may be worthwhile on
> "if (table->revision < 1)".

Ok, adding:
pr_debug("MPAM ACPI table revision %d not supported\n",
          table->revision);

> 
>> +    table_offset = (char *)(table + 1);
>> +    table_end = (char *)table + table->length;
>> +
>> +    while (table_offset < table_end) {
>> +        tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +        table_offset += tbl_msc->length;
>> +
>> +        if (table_offset > table_end) {
>> +            pr_err("MSC entry overlaps end of ACPI table\n");
>> +            return -EINVAL;
>> +        }
>> +
> 
> Would be:
> 
>         if (table_offset + sizeof(*tbl_msc) > table_end)

As Jonathan said I don't think this is quite valid but we can add a
check that makes sure ->length is within the table bounds. I've changed
this to be:

tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
if (table_offset + sizeof(*tbl_msc) > table_end ||
    table_offset + tbl_msc->length > table_end) {
	pr_err("MSC entry overlaps end of ACPI table\n");
	return -EINVAL;
}
table_offset += tbl_msc->length;

> 
>> +        /*
>> +         * If any of the reserved fields are set, make no attempt to
>> +         * parse the MSC structure. This MSC will still be counted by
>> +         * acpi_mpam_count_msc(), meaning the MPAM driver can't probe
>> +         * against all MSC, and will never be enabled. There is no way
>> +         * to enable it safely, because we cannot determine safe
>> +         * system-wide partid and pmg ranges in this situation.
>> +         */
>> +        if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2) {
>> +            pr_err_once("Unrecognised MSC, MPAM not usable\n");
>> +            pr_debug("MSC.%u: reserved field set\n", tbl_msc->identifier);
>> +            continue;
>> +        }
>> +
>> +        if (!tbl_msc->mmio_size) {
>> +            pr_debug("MSC.%u: marked as disabled\n", tbl_msc->identifier);
>> +            continue;
>> +        }
>> +
>> +        pdev = acpi_mpam_parse_msc(tbl_msc);
>> +        if (IS_ERR(pdev))
>> +            return PTR_ERR(pdev);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * acpi_mpam_count_msc() - Count the number of MSC described by firmware.
>> + *
>> + * Returns the number of MSC, or zero for an error.
> 
> s/MSC/MSCs

Ack

> 
>> + *
>> + * This can be called before or in parallel with acpi_mpam_parse().
>> + */
>> +int acpi_mpam_count_msc(void)
>> +{
>> +    char *table_end, *table_offset;
>> +    struct acpi_mpam_msc_node *tbl_msc;
>> +    int count = 0;
>> +
>> +    if (acpi_disabled || !system_supports_mpam())
>> +        return 0;
>> +
>> +    struct acpi_table_header *table __free(acpi_put_table) =
>> +        acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +
>> +    if (IS_ERR(table))
>> +        return 0;
>> +
>> +    if (table->revision < 1)
>> +        return 0;
>> +
>> +    table_offset = (char *)(table + 1);
>> +    table_end = (char *)table + table->length;
>> +
>> +    while (table_offset < table_end) {
>> +        tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +
> 
> Would be worthy to check:
> 
>         if (table_offset + sizeof(*tbl_msc) > table_end)
>             return -EINVAL;

Ok, it will ensure tbl_msc->length exists.

> 
>> +        if (tbl_msc->length < sizeof(*tbl_msc))
>> +            return -EINVAL;
>> +        if (tbl_msc->length > table_end - table_offset)
>> +            return -EINVAL;
>> +        table_offset += tbl_msc->length;
>> +
>> +        if (!tbl_msc->mmio_size)
>> +            continue;
>> +
>> +        count++;
>> +    }
>> +
>> +    return count;
>> +}
>> +
>> +/*
>> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
>> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
>> + * initialised from postcore_initcall().
>> + */
>> +subsys_initcall_sync(acpi_mpam_parse);
>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>> index 57fc8bc56166..4286e4af1092 100644
>> --- a/drivers/acpi/tables.c
>> +++ b/drivers/acpi/tables.c
>> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
>>       ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>>       ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>>       ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
>> -    ACPI_SIG_NBFT, ACPI_SIG_SWFT};
>> +    ACPI_SIG_NBFT, ACPI_SIG_SWFT, ACPI_SIG_MPAM};
>>     #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
>> new file mode 100644
>> index 000000000000..a3828ef91aee
>> --- /dev/null
>> +++ b/include/linux/arm_mpam.h
>> @@ -0,0 +1,47 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright (C) 2025 Arm Ltd. */
>> +
>> +#ifndef __LINUX_ARM_MPAM_H
>> +#define __LINUX_ARM_MPAM_H
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/types.h>
>> +
>> +#define GLOBAL_AFFINITY        ~0
>> +
>> +struct mpam_msc;
>> +
>> +enum mpam_msc_iface {
>> +    MPAM_IFACE_MMIO,    /* a real MPAM MSC */
>> +    MPAM_IFACE_PCC,        /* a fake MPAM MSC */
>> +};
>> +
>> +enum mpam_class_types {
>> +    MPAM_CLASS_CACHE,       /* Caches, e.g. L2, L3 */
>> +    MPAM_CLASS_MEMORY,      /* Main memory */
>> +    MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
>> +};
>> +
>> +#ifdef CONFIG_ACPI_MPAM
>> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> +                  struct acpi_mpam_msc_node *tbl_msc);
>> +
>> +int acpi_mpam_count_msc(void);
>> +#else
>> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> +                        struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +    return -EINVAL;
>> +}
>> +
>> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>> +#endif
>> +
>> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>> +                  enum mpam_class_types type, u8 class_id,
>> +                  int component_id)
>> +{
>> +    return -EINVAL;
>> +}
>> +
>> +#endif /* __LINUX_ARM_MPAM_H */
> 
> Thanks,
> Gavin
> 

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
  2025-11-08  8:54   ` Gavin Shan
@ 2025-11-10 16:23   ` Jonathan Cameron
  2025-11-12  7:01   ` Shaopeng Tan (Fujitsu)
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:23 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On Fri, 7 Nov 2025 12:34:26 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the
> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> to discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Nothing to add to Gavin's comments.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>





^ permalink raw reply	[flat|nested] 147+ messages in thread

* RE: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
  2025-11-08  8:54   ` Gavin Shan
  2025-11-10 16:23   ` Jonathan Cameron
@ 2025-11-12  7:01   ` Shaopeng Tan (Fujitsu)
  2025-11-12 14:55     ` Ben Horgan
  2025-11-13  2:16   ` Fenghua Yu
  2025-11-13  2:33   ` Fenghua Yu
  4 siblings, 1 reply; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  7:01 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hello Ben,

> From: James Morse <james.morse@arm.com>
> 
> Add code to parse the arm64 specific MPAM table, looking up the cache level
> from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the MSC
> devices. Once the driver probes it calls acpi_mpam_parse_resources() to
> discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update the
> MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> return irq from acpi_mpam_register_irq (Jonathan)
> err -> len rename (Jonathan)
> Move table initialisation after checking (Jonathan)
> Add sanity checking in acpi_mpam_count_msc() (Jonathan)
> ---
>  arch/arm64/Kconfig          |   1 +
>  drivers/acpi/arm64/Kconfig  |   3 +
>  drivers/acpi/arm64/Makefile |   1 +
>  drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c       |   2 +-
>  include/linux/arm_mpam.h    |  47 +++++
>  6 files changed, 456 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/acpi/arm64/mpam.c
>  create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 67015d51f7b5..c5e66d5d72cd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
> 
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>  	help
>  	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>  	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
> 
>  config ACPI_APMT
>  	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>  obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>  obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>  obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>  obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>  obj-$(CONFIG_ARM_AMBA)		+= amba.o
>  obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..c199944862ed
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,403 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
> +
> +/*
> + * Encodings for the MSC node body interface type field.
> + * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
> +#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
> +
> +static bool _is_ppi_partition(u32 flags)
> +{
> +	u32 aff_type, is_ppi;
> +	bool ret;
> +
> +	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
> +	if (!is_ppi)
> +		return false;
> +
> +	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
> +	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
> +	if (ret)
> +		pr_err_once("Partitioned interrupts not supported\n");
> +
> +	return ret;
> +}
> +
> +static int acpi_mpam_register_irq(struct platform_device *pdev,
> +				  int intid, u32 flags)
> +{
> +	int irq;
> +	u32 int_type;
> +	int trigger;
> +
> +	if (!intid)
> +		return -EINVAL;
> +
> +	if (_is_ppi_partition(flags))
> +		return -EINVAL;
> +
> +	trigger = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
> +	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
> +	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return -EINVAL;
> +
> +	irq = acpi_register_gsi(&pdev->dev, intid, trigger, ACPI_ACTIVE_HIGH);
> +	if (irq <= 0)
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n", intid);
> +
> +	return irq;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, intid;
> +	int irq;
> +
> +	intid = tbl_msc->overflow_interrupt;
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	intid = tbl_msc->error_interrupt;
> +	flags = tbl_msc->error_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res) {
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%d) for cache with id %u\n",
> level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index,
> MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid =
> pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE) {
> +			pr_debug("Bad proxmity domain %lld, using node 0
> instead\n",
> +
> res->locator.memory_locator.proximity_domain);
> +			nid = 0;
> +		}
> +		return mpam_ris_create(msc, res->ris_index,
> MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and are treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc) {
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct
> acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional
> dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct
> acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Creates the device power management link and returns true if the
> + * acpi id is valid and usable for cpu affinity.  This is the case
> + * when the linked device is a processor or a processor container.
> + */
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node
> *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int len;
> +
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	len = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (len >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power
> management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev,
> DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case ACPI_MPAM_MSC_IFACE_MMIO:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case ACPI_MPAM_MSC_IFACE_PCC:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	struct platform_device *pdev __free(platform_device_put) =
> +		platform_device_alloc("mpam_msc", tbl_msc->identifier);
> +	int next_res = 0, next_prop = 0, err;
> +	/* pcc, nrdy, affinity and a sentinel */
> +	struct property_entry props[4] = { 0 };
> +	/* mmio, 2xirq, no sentinel. */
> +	struct resource res[3] = { 0 };
> +	struct acpi_device *companion;
> +	enum mpam_msc_iface iface;
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (!pdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/* Some power management is described in the namespace: */
> +	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);

It's a bit strange to store the uid length in the variable err.
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>

> +	if (err > 0 && err < sizeof(uid)) {
> +		companion = acpi_dev_get_first_match_dev("ARMHAA5C",
> uid, -1);
> +		if (companion)
> +			ACPI_COMPANION_SET(&pdev->dev, companion);
> +		else
> +			pr_debug("MSC.%u: missing namespace entry\n",
> +				 tbl_msc->identifier);
> +	}
> +
> +	if (decode_interface_type(tbl_msc, &iface)) {
> +		pr_debug("MSC.%u: unknown interface type\n",
> tbl_msc->identifier);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (iface == MPAM_IFACE_MMIO)
> +		res[next_res++] =
> DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +
> tbl_msc->mmio_size,
> +						       "MPAM:MSC");
> +	else if (iface == MPAM_IFACE_PCC)
> +		props[next_prop++] =
> PROPERTY_ENTRY_U32("pcc-channel",
> +
> 	tbl_msc->base_address);
> +
> +	acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +
> +	WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
> +	err = platform_device_add_resources(pdev, res, next_res);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +						tbl_msc->max_nrdy_usec);
> +
> +	/*
> +	 * The MSC's CPU affinity is described via its linked power
> +	 * management device, but only if it points at a Processor or
> +	 * Processor Container.
> +	 */
> +	if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
> +		props[next_prop++] =
> PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
> +
> +	WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
> +	err = device_create_managed_software_node(&pdev->dev, props,
> NULL);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	/*
> +	 * Stash the table entry for acpi_mpam_parse_resources() to discover
> +	 * what this MSC controls.
> +	 */
> +	err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	err = platform_device_add(pdev);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	return_ptr(pdev);
> +}
> +
> +static int __init acpi_mpam_parse(void) {
> +	char *table_end, *table_offset;
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	struct platform_device *pdev;
> +
> +	if (acpi_disabled || !system_supports_mpam())
> +		return 0;
> +
> +	struct acpi_table_header *table __free(acpi_put_table) =
> +		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_offset = (char *)(table + 1);
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		if (table_offset > table_end) {
> +			pr_err("MSC entry overlaps end of ACPI table\n");
> +			return -EINVAL;
> +		}
> +
> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the MSC structure. This MSC will still be counted by
> +		 * acpi_mpam_count_msc(), meaning the MPAM driver can't
> probe
> +		 * against all MSC, and will never be enabled. There is no way
> +		 * to enable it safely, because we cannot determine safe
> +		 * system-wide partid and pmg ranges in this situation.
> +		 */
> +		if (tbl_msc->reserved || tbl_msc->reserved1 ||
> tbl_msc->reserved2) {
> +			pr_err_once("Unrecognised MSC, MPAM not
> usable\n");
> +			pr_debug("MSC.%u: reserved field set\n",
> tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		if (!tbl_msc->mmio_size) {
> +			pr_debug("MSC.%u: marked as disabled\n",
> tbl_msc->identifier);
> +			continue;
> +		}
> +
> +		pdev = acpi_mpam_parse_msc(tbl_msc);
> +		if (IS_ERR(pdev))
> +			return PTR_ERR(pdev);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * acpi_mpam_count_msc() - Count the number of MSC described by
> firmware.
> + *
> + * Returns the number of MSC, or zero for an error.
> + *
> + * This can be called before or in parallel with acpi_mpam_parse().
> + */
> +int acpi_mpam_count_msc(void)
> +{
> +	char *table_end, *table_offset;
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (acpi_disabled || !system_supports_mpam())
> +		return 0;
> +
> +	struct acpi_table_header *table __free(acpi_put_table) =
> +		acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_offset = (char *)(table + 1);
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
> +		if (tbl_msc->length > table_end - table_offset)
> +			return -EINVAL;
> +		table_offset += tbl_msc->length;
> +
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		count++;
> +	}
> +
> +	return count;
> +}
> +
> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index 57fc8bc56166..4286e4af1092 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE]
> __nonstring_array __initconst
>  	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>  	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>  	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> -	ACPI_SIG_NBFT, ACPI_SIG_SWFT};
> +	ACPI_SIG_NBFT, ACPI_SIG_SWFT, ACPI_SIG_MPAM};
> 
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
> 
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..a3828ef91aee
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __LINUX_ARM_MPAM_H
> +#define __LINUX_ARM_MPAM_H
> +
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#define GLOBAL_AFFINITY		~0
> +
> +struct mpam_msc;
> +
> +enum mpam_msc_iface {
> +	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
> +	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
> +};
> +
> +enum mpam_class_types {
> +	MPAM_CLASS_CACHE,       /* Caches, e.g. L2, L3 */
> +	MPAM_CLASS_MEMORY,      /* Main memory */
> +	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
> +};
> +
> +#ifdef CONFIG_ACPI_MPAM
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc);
> +
> +int acpi_mpam_count_msc(void);
> +#else
> +static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +					    struct acpi_mpam_msc_node
> *tbl_msc) {
> +	return -EINVAL;
> +}
> +
> +static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
> +#endif
> +
> +static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8
> class_id,
> +				  int component_id)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif /* __LINUX_ARM_MPAM_H */
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-12  7:01   ` Shaopeng Tan (Fujitsu)
@ 2025-11-12 14:55     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 14:55 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hi Shaopeng,

On 11/12/25 07:01, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
> 
>> From: James Morse <james.morse@arm.com>
>>
>> Add code to parse the arm64 specific MPAM table, looking up the cache level
>> from the PPTT and feeding the end result into the MPAM driver.
>>
>> This happens in two stages. Platform devices are created first for the MSC
>> devices. Once the driver probes it calls acpi_mpam_parse_resources() to
>> discover the RIS entries the MSC contains.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update the
>> MPAM driver with optional discovered data about the RIS entries.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
>> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

>> +static struct platform_device * __init acpi_mpam_parse_msc(struct
>> +acpi_mpam_msc_node *tbl_msc) {
>> +	struct platform_device *pdev __free(platform_device_put) =
>> +		platform_device_alloc("mpam_msc", tbl_msc->identifier);
>> +	int next_res = 0, next_prop = 0, err;
>> +	/* pcc, nrdy, affinity and a sentinel */
>> +	struct property_entry props[4] = { 0 };
>> +	/* mmio, 2xirq, no sentinel. */
>> +	struct resource res[3] = { 0 };
>> +	struct acpi_device *companion;
>> +	enum mpam_msc_iface iface;
>> +	char uid[16];
>> +	u32 acpi_id;
>> +
>> +	if (!pdev)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	/* Some power management is described in the namespace: */
>> +	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> 
> It's a bit strange to store the uid length in the variable err.

A little, yes. The value is only used for error checking and it's not
that uncommon so I'll leave it as is.

linux$ git grep 'err = snprintf' | wc -l
17

> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> 

Thanks!
Thanks,

Ben


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
                     ` (2 preceding siblings ...)
  2025-11-12  7:01   ` Shaopeng Tan (Fujitsu)
@ 2025-11-13  2:16   ` Fenghua Yu
  2025-11-13 12:09     ` Ben Horgan
  2025-11-13  2:33   ` Fenghua Yu
  4 siblings, 1 reply; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  2:16 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi, Ben and James,

On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
> 
> This happens in two stages. Platform devices are created first for the
> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
> to discover the RIS entries the MSC contains.
> 
> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
> the MPAM driver with optional discovered data about the RIS entries.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> return irq from acpi_mpam_register_irq (Jonathan)
> err -> len rename (Jonathan)
> Move table initialisation after checking (Jonathan)
> Add sanity checking in acpi_mpam_count_msc() (Jonathan)
> ---
>   arch/arm64/Kconfig          |   1 +
>   drivers/acpi/arm64/Kconfig  |   3 +
>   drivers/acpi/arm64/Makefile |   1 +
>   drivers/acpi/arm64/mpam.c   | 403 ++++++++++++++++++++++++++++++++++++
>   drivers/acpi/tables.c       |   2 +-
>   include/linux/arm_mpam.h    |  47 +++++
>   6 files changed, 456 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/acpi/arm64/mpam.c
>   create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 67015d51f7b5..c5e66d5d72cd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
>   	  optional extension to the Arm architecture that allows each
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>   
>   config ACPI_APMT
>   	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>   obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>   obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>   obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>   obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>   obj-$(CONFIG_ARM_AMBA)		+= amba.o
>   obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..c199944862ed
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,403 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE                              BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                         GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                        0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK                BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR           0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER 1
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID                    BIT(4)
> +
> +/*
> + * Encodings for the MSC node body interface type field.
> + * See 2.1 MPAM MSC node, Table 4 of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IFACE_MMIO   0x00
> +#define ACPI_MPAM_MSC_IFACE_PCC    0x0a
> +
> +static bool _is_ppi_partition(u32 flags)
> +{
> +	u32 aff_type, is_ppi;
> +	bool ret;
> +
> +	is_ppi = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_VALID, flags);
> +	if (!is_ppi)
> +		return false;
> +
> +	aff_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_MASK, flags);
> +	ret = (aff_type == ACPI_MPAM_MSC_IRQ_AFFINITY_TYPE_PROCESSOR_CONTAINER);
> +	if (ret)
> +		pr_err_once("Partitioned interrupts not supported\n");
> +
> +	return ret;
> +}
> +
> +static int acpi_mpam_register_irq(struct platform_device *pdev,
> +				  int intid, u32 flags)
> +{
> +	int irq;
> +	u32 int_type;
> +	int trigger;
> +
> +	if (!intid)
> +		return -EINVAL;
> +
> +	if (_is_ppi_partition(flags))
> +		return -EINVAL;
> +
> +	trigger = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE, flags);
> +	int_type = FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags);
> +	if (int_type != ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return -EINVAL;
> +
> +	irq = acpi_register_gsi(&pdev->dev, intid, trigger, ACPI_ACTIVE_HIGH);
> +	if (irq <= 0)
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n", intid);
> +
> +	return irq;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, intid;
> +	int irq;
> +
> +	intid = tbl_msc->overflow_interrupt;
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	intid = tbl_msc->error_interrupt;
> +	flags = tbl_msc->error_interrupt_flags;
> +	irq = acpi_mpam_register_irq(pdev, intid, flags);
> +	if (irq > 0)
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE) {
> +			pr_debug("Bad proxmity domain %lld, using node 0 instead\n",

Typo.
s/proxmity/proximity/

> +				 res->locator.memory_locator.proximity_domain);
> +			nid = 0;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and are treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	char *ptr, *table_end;
> +	struct acpi_mpam_resource_node *resource;
> +
> +	ptr = (char *)(tbl_msc + 1);
> +	table_end = ptr + tbl_msc->length;

tbl_msc->length is the size of the ENTIRE MSC node, header included. ptr
already points just past the struct acpi_mpam_msc_node header, so
ptr + tbl_msc->length lands sizeof(*tbl_msc) bytes beyond the end of the
MSC node, and the bounds checks below would allow reads outside this node.

Better to change to:
+	table_end = (char *)tbl_msc + tbl_msc->length;
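
To make the difference concrete, here is a small userspace sketch. struct
node_hdr is a made-up stand-in for struct acpi_mpam_msc_node (an assumption
for illustration only); the one property it shares is a length field that
covers the header plus everything after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MSC node header. */
struct node_hdr {
	uint16_t length;		/* header + trailing resource nodes */
	uint16_t num_resource_nodes;
};

/* Bytes available for resource nodes when the end is measured from the node start. */
static size_t payload_bytes(const struct node_hdr *hdr)
{
	const char *start = (const char *)(hdr + 1);
	const char *end = (const char *)hdr + hdr->length;

	return end > start ? (size_t)(end - start) : 0;
}

/* What the posted code computes: the end is measured from the payload start. */
static size_t payload_bytes_buggy(const struct node_hdr *hdr)
{
	const char *start = (const char *)(hdr + 1);
	const char *end = start + hdr->length;

	return (size_t)(end - start);
}
```

The buggy variant always reports sizeof(struct node_hdr) more bytes than the
node actually has, which is exactly the window that could be read past the
end of the MSC node.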

> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		u64 max_deps, remaining_table;
> +
> +		if (ptr + sizeof(*resource) > table_end)
> +			return -EINVAL;
> +
> +		resource = (struct acpi_mpam_resource_node *)ptr;
> +
> +		remaining_table = table_end - ptr;
> +		max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
> +		if (resource->num_functional_deps > max_deps) {
> +			pr_debug("MSC has impossible number of functional dependencies\n");
> +			return -EINVAL;
> +		}
> +
> +		err = acpi_mpam_parse_resource(msc, resource);
> +		if (err)
> +			return err;
> +
> +		ptr += sizeof(*resource);
> +		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Creates the device power management link and returns true if the
> + * acpi id is valid and usable for cpu affinity.  This is the case
> + * when the linked device is a processor or a processor container.
> + */
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char uid[11];
> +	int len;
> +
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	len = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (len >= sizeof(uid)) {
> +		pr_debug("Failed to convert uid of device for power management.");
> +		return acpi_id_valid;
> +	}
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);

Refcount leak here?

acpi_dev_get_first_match_dev() returns the device with a reference held,
and the reference on buddy is never dropped, so it leaks.

Better to change to:
+	if (buddy) {
+		device_link_add(...);
+		acpi_dev_put(buddy);  <====== release refcount here
+	}

or drop the reference automatically (the variable must be initialised,
since there is an early return before the lookup):
+DEFINE_FREE(acpi_dev_put, struct acpi_device *, if (_T) acpi_dev_put(_T))
...
+	struct acpi_device *buddy __free(acpi_dev_put) = NULL;
...
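
A toy userspace model of the first option, in case it helps. All names here
(struct obj, obj_find(), link_add(), link_to()) are hypothetical, and the
sketch assumes the link takes its own reference on the target, so the lookup
reference can be dropped as soon as the link exists.

```c
#include <stddef.h>

/* Toy refcounted object; stands in for struct acpi_device. */
struct obj {
	int refs;
};

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (o)
		o->refs--;
}

/* Lookup that hands its result back with a reference held for the caller. */
static struct obj *obj_find(struct obj *candidate)
{
	return obj_get(candidate);
}

/* A link that keeps its own reference on the target while it exists. */
struct link {
	struct obj *target;
};

static void link_add(struct link *l, struct obj *target)
{
	l->target = obj_get(target);
}

/* Look up, link, then drop the lookup reference so only the link's remains. */
static void link_to(struct link *l, struct obj *candidate)
{
	struct obj *buddy = obj_find(candidate);

	if (buddy) {
		link_add(l, buddy);
		obj_put(buddy);
	}
}
```

After link_to() the count has gone up by exactly one, the reference owned by
the link. Without the obj_put() it would have gone up by two.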

> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case ACPI_MPAM_MSC_IFACE_MMIO:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case ACPI_MPAM_MSC_IFACE_PCC:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	struct platform_device *pdev __free(platform_device_put) =
> +		platform_device_alloc("mpam_msc", tbl_msc->identifier);
> +	int next_res = 0, next_prop = 0, err;
> +	/* pcc, nrdy, affinity and a sentinel */
> +	struct property_entry props[4] = { 0 };
> +	/* mmio, 2xirq, no sentinel. */
> +	struct resource res[3] = { 0 };
> +	struct acpi_device *companion;
> +	enum mpam_msc_iface iface;
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (!pdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/* Some power management is described in the namespace: */
> +	err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +	if (err > 0 && err < sizeof(uid)) {
> +		companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> +		if (companion)
> +			ACPI_COMPANION_SET(&pdev->dev, companion);

Ditto. companion's refcount leak here as well?

> +		else
> +			pr_debug("MSC.%u: missing namespace entry\n",
> +				 tbl_msc->identifier);
> +	}
> +
> +	if (decode_interface_type(tbl_msc, &iface)) {
> +		pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (iface == MPAM_IFACE_MMIO)
> +		res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +						       tbl_msc->mmio_size,
> +						       "MPAM:MSC");
> +	else if (iface == MPAM_IFACE_PCC)
> +		props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +							tbl_msc->base_address);
> +
> +	acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +
> +	WARN_ON_ONCE(next_res > ARRAY_SIZE(res));

Not sure if this WARN_ON_ONCE() is really helpful.

By the time this WARN fires, an earlier res[next_res++] has already
written outside res[], so a panic or data corruption may have happened
before the warning is printed.

Maybe it's better to add a helper that appends to res[] and reports an
error instead of writing out of bounds. The places that fill res[] can
then call the helper:

+static int add_resource(struct resource *res, int *idx, int max,
+			struct resource new_res)
+{
+	if (*idx >= max) {
+		pr_err("Too many resources (max %d)\n", max);
+		return -ENOSPC;
+	}
+	res[(*idx)++] = new_res;
+	return 0;
+}

Then call the helper in place of res[next_res++]:
+	if (iface == MPAM_IFACE_MMIO) {
+		err = add_resource(res, &next_res, ARRAY_SIZE(res),
+				   DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+							tbl_msc->mmio_size,
+							"MPAM:MSC"));
+		if (err)
+			return ERR_PTR(-ENOSPC);
+	}

> +	err = platform_device_add_resources(pdev, res, next_res);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +						tbl_msc->max_nrdy_usec);
> +
> +	/*
> +	 * The MSC's CPU affinity is described via its linked power
> +	 * management device, but only if it points at a Processor or
> +	 * Processor Container.
> +	 */
> +	if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
> +		props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
> +
> +	WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));

Ditto for this WARN here?

[SNIP]

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-13  2:16   ` Fenghua Yu
@ 2025-11-13 12:09     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-13 12:09 UTC (permalink / raw)
  To: Fenghua Yu, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Fenghua,

On 11/13/25 02:16, Fenghua Yu wrote:
> Hi, Ben and James,
> 
> On 11/7/25 04:34, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
>>
>> This happens in two stages. Platform devices are created first for the
>> MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
>> to discover the RIS entries the MSC contains.
>>
>> For now the MPAM hook mpam_ris_create() is stubbed out, but will update
>> the MPAM driver with optional discovered data about the RIS entries.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
>> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[...]

>> +        if (nid == NUMA_NO_NODE) {
>> +            pr_debug("Bad proxmity domain %lld, using node 0 instead\n",
> 
> Typo.
> s/proxmity/proximity/

Done.

> 
>> +                 res->locator.memory_locator.proximity_domain);
>> +            nid = 0;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +                       255, nid);
>> +    default:
>> +        /* These get discovered later and are treated as unknown */
>> +        return 0;
>> +    }
>> +}
>> +
>> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> +                  struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +    int i, err;
>> +    char *ptr, *table_end;
>> +    struct acpi_mpam_resource_node *resource;
>> +
>> +    ptr = (char *)(tbl_msc + 1);
>> +    table_end = ptr + tbl_msc->length;
> 
> tbl_msc->length equals size of the ENTIRE msc node. ptr points to the
> end of tbl_msc. ptr + tbl_msc->length is past the end of the msc node.
> This will access data outside of this MSC node.
> 
> Better to change to:
> +    table_end = (char *)tbl_msc + tbl_msc->length;

Yes, makes sense.
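
For anyone following along, the bounds issue can be reproduced outside
the kernel. The following is only a minimal userspace sketch, not the
driver code: the demo_* structures and demo_walk_resources() are
hypothetical stand-ins with an illustrative layout. The point it shows
is that the end pointer is derived from the start of the node, because
the length field covers the header as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ACPI MPAM structures; layouts are illustrative. */
struct demo_msc_node {
	uint16_t length;             /* size of the whole node, header included */
	uint16_t num_resource_nodes;
};

struct demo_resource_node {
	uint32_t ris_index;
	uint32_t num_functional_deps;
};

struct demo_func_dep {
	uint32_t producer;
	uint32_t reserved;
};

/*
 * Walk the resource nodes that follow the header.
 * Returns the number of nodes walked, or -1 if the node is malformed.
 */
static int demo_walk_resources(const void *node)
{
	struct demo_msc_node hdr;
	const char *ptr, *end;
	int i;

	assert(node != NULL);
	memcpy(&hdr, node, sizeof(hdr));
	if (hdr.length < sizeof(hdr))
		return -1;

	ptr = (const char *)node + sizeof(hdr);
	/* The end is measured from the start of the node, not from ptr. */
	end = (const char *)node + hdr.length;

	for (i = 0; i < hdr.num_resource_nodes; i++) {
		struct demo_resource_node res;
		size_t remaining = (size_t)(end - ptr);

		if (remaining < sizeof(res))
			return -1;
		memcpy(&res, ptr, sizeof(res));

		/* Only the bytes after this resource header can hold dependencies. */
		remaining -= sizeof(res);
		if (res.num_functional_deps > remaining / sizeof(struct demo_func_dep))
			return -1;

		ptr += sizeof(res);
		ptr += (size_t)res.num_functional_deps * sizeof(struct demo_func_dep);
	}

	return i;
}
```

With the end computed as ptr + length instead, a node that is truncated
by less than the header size would still pass the checks in this sketch.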

> 
>> +    for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
>> +        u64 max_deps, remaining_table;
>> +
>> +        if (ptr + sizeof(*resource) > table_end)
>> +            return -EINVAL;
>> +
>> +        resource = (struct acpi_mpam_resource_node *)ptr;
>> +
>> +        remaining_table = table_end - ptr;
>> +        max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
>> +        if (resource->num_functional_deps > max_deps) {
>> +            pr_debug("MSC has impossible number of functional dependencies\n");
>> +            return -EINVAL;
>> +        }
>> +
>> +        err = acpi_mpam_parse_resource(msc, resource);
>> +        if (err)
>> +            return err;
>> +
>> +        ptr += sizeof(*resource);
>> +        ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Creates the device power management link and returns true if the
>> + * acpi id is valid and usable for cpu affinity.  This is the case
>> + * when the linked device is a processor or a processor container.
>> + */
>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +                     struct platform_device *pdev,
>> +                     u32 *acpi_id)
>> +{
>> +    char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1] = { 0 };
>> +    bool acpi_id_valid = false;
>> +    struct acpi_device *buddy;
>> +    char uid[11];
>> +    int len;
>> +
>> +    memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +           sizeof(tbl_msc->hardware_id_linked_device));
>> +
>> +    if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +        *acpi_id = tbl_msc->instance_id_linked_device;
>> +        acpi_id_valid = true;
>> +    }
>> +
>> +    len = snprintf(uid, sizeof(uid), "%u",
>> +               tbl_msc->instance_id_linked_device);
>> +    if (len >= sizeof(uid)) {
>> +        pr_debug("Failed to convert uid of device for power management.");
>> +        return acpi_id_valid;
>> +    }
>> +
>> +    buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
>> +    if (buddy)
>> +        device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> 
> Refcount leak here?
> 
> Refcount of the device object pointed by buddy is not released and
> refcount leaks.
> 
> Better to change to:
> +    if (buddy) {
> +        device_link_add(...);
> +        acpi_dev_put(buddy);  <====== release refcount here
> +    }


Yes, device_link_add() calls get_device() to increment the refcount and
so the acpi_dev_put() is required.
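
The reference counting can be modelled with a toy example. This is
just a userspace sketch under the assumption stated above, namely that
the lookup hands back a reference and that link creation takes one of
its own. All demo_* names are hypothetical and none of this is the
ACPI or driver-core API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with a reference count; a stand-in for a device object. */
struct demo_dev {
	int refs;
};

static struct demo_dev *demo_get(struct demo_dev *d)
{
	if (d)
		d->refs++;
	return d;
}

static void demo_put(struct demo_dev *d)
{
	if (d) {
		assert(d->refs > 0);
		d->refs--;
	}
}

/* Models a lookup that returns with a reference held for the caller. */
static struct demo_dev *demo_lookup(struct demo_dev *d)
{
	return demo_get(d);
}

/* Models link creation that takes its own reference on the supplier. */
static void demo_link_add(struct demo_dev *supplier)
{
	demo_get(supplier);
}

/* The lookup reference is never dropped: one count is leaked. */
static int demo_link_leaky(struct demo_dev *d)
{
	struct demo_dev *buddy = demo_lookup(d);

	if (buddy)
		demo_link_add(buddy);
	return d->refs;
}

/* The lookup reference is dropped once the link holds its own. */
static int demo_link_balanced(struct demo_dev *d)
{
	struct demo_dev *buddy = demo_lookup(d);

	if (buddy) {
		demo_link_add(buddy);
		demo_put(buddy);
	}
	return d->refs;
}
```

Starting from a count of 1, the leaky variant ends at 3 and the
balanced one at 2, the extra count in the latter being the link's own.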

> 
> or free refcount automatically:
> +DEFINE_FREE(acpi_dev_put, struct acpi_device *, if (_T) acpi_dev_put(_T))
> ...
> +    struct acpi_device *buddy __free(acpi_dev_put);
> ...
> 
>> +
>> +    return acpi_id_valid;
>> +}
>> +
>> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
>> +                 enum mpam_msc_iface *iface)
>> +{
>> +    switch (tbl_msc->interface_type) {
>> +    case ACPI_MPAM_MSC_IFACE_MMIO:
>> +        *iface = MPAM_IFACE_MMIO;
>> +        return 0;
>> +    case ACPI_MPAM_MSC_IFACE_PCC:
>> +        *iface = MPAM_IFACE_PCC;
>> +        return 0;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +}
>> +
>> +static struct platform_device * __init acpi_mpam_parse_msc(struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +    struct platform_device *pdev __free(platform_device_put) =
>> +        platform_device_alloc("mpam_msc", tbl_msc->identifier);
>> +    int next_res = 0, next_prop = 0, err;
>> +    /* pcc, nrdy, affinity and a sentinel */
>> +    struct property_entry props[4] = { 0 };
>> +    /* mmio, 2xirq, no sentinel. */
>> +    struct resource res[3] = { 0 };
>> +    struct acpi_device *companion;
>> +    enum mpam_msc_iface iface;
>> +    char uid[16];
>> +    u32 acpi_id;
>> +
>> +    if (!pdev)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    /* Some power management is described in the namespace: */
>> +    err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>> +    if (err > 0 && err < sizeof(uid)) {
>> +        companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
>> +        if (companion)
>> +            ACPI_COMPANION_SET(&pdev->dev, companion);
> 
> Ditto. companion's refcount leak here as well?

Looks like it. Added an acpi_dev_put().

> 
>> +        else
>> +            pr_debug("MSC.%u: missing namespace entry\n",
>> +                 tbl_msc->identifier);
>> +    }
>> +
>> +    if (decode_interface_type(tbl_msc, &iface)) {
>> +        pr_debug("MSC.%u: unknown interface type\n", tbl_msc->identifier);
>> +        return ERR_PTR(-EINVAL);
>> +    }
>> +
>> +    if (iface == MPAM_IFACE_MMIO)
>> +        res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
>> +                               tbl_msc->mmio_size,
>> +                               "MPAM:MSC");
>> +    else if (iface == MPAM_IFACE_PCC)
>> +        props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
>> +                            tbl_msc->base_address);
>> +
>> +    acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
>> +
>> +    WARN_ON_ONCE(next_res > ARRAY_SIZE(res));
> 
> Not sure if this WARN_ON_ONCE() is really helpful.
> 
> Even before this WARN fires, an earlier res[next_res] access outside
> of res[] may already have caused a panic or data corruption.
> 
> Maybe it's better to add a helper to access res[] and report error when
> accessing out of res[] scope. A few places can call the helper to access
> res[]:

This warning looks to be there just to catch programming errors. There
are 3 places next_res could be incremented and res[] has 3 entries so I
don't see how an out of bounds access could occur without code changes
and as accesses are made using res[next_res++], or equivalent, it is
likely this warning would be hit if a new res is added without
remembering to increase the size of the array. Given this, I'll keep
this code as it is.

> 
> +static int add_resource(struct resource *res, int *idx, int max,
> +            struct resource new_res)
> +{
> +    if (*idx >= max) {
> +        pr_err("Too many resources (max %d)\n", max);
> +        return -ENOSPC;
> +    }
> +    res[(*idx)++] = new_res;
> +    return 0;
> +}
> 
> Then can call the helper to replace res[next_res++]:
> +    if (iface == MPAM_IFACE_MMIO) {
> +        err = add_resource(res, &next_res, ARRAY_SIZE(res),
> +                 DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +                               tbl_msc->mmio_size,
> +                               "MPAM:MSC"));
> +        if (err)
> +            return ERR_PTR(-ENOSPC);
> +    }
> 
>> +    err = platform_device_add_resources(pdev, res, next_res);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>> +    props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
>> +                        tbl_msc->max_nrdy_usec);
>> +
>> +    /*
>> +     * The MSC's CPU affinity is described via its linked power
>> +     * management device, but only if it points at a Processor or
>> +     * Processor Container.
>> +     */
>> +    if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id))
>> +        props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity", acpi_id);
>> +
>> +    WARN_ON_ONCE(next_prop > ARRAY_SIZE(props));
> 
> Ditto for this WARN here?

Similarly to above, this is just guarding against later adding more
properties. This one can be tightened though as
device_create_managed_software_node() expects a zero terminated array.
Changing to:

WARN_ON_ONCE(next_prop > ARRAY_SIZE(props) - 1);
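
To illustrate why the limit is ARRAY_SIZE(props) - 1, here is a small
userspace sketch of a zero-terminated array. It is only a model:
demo_prop, demo_add_prop() and demo_count_props() are hypothetical
names, and the real property API is not used. The last slot is never
written, so a consumer that scans for the terminator always finds one.

```c
#include <assert.h>
#include <stddef.h>

struct demo_prop {
	const char *name;   /* NULL marks the terminating entry */
	unsigned int value;
};

/*
 * Append an entry, refusing to touch the final slot so that it stays
 * zeroed and acts as the terminator. Returns 0 on success, -1 if full.
 */
static int demo_add_prop(struct demo_prop *props, size_t nslots, size_t *next,
			 const char *name, unsigned int value)
{
	assert(props != NULL && next != NULL);

	if (*next + 1 >= nslots)
		return -1;

	props[*next].name = name;
	props[*next].value = value;
	(*next)++;
	return 0;
}

/* Count entries the way a consumer of a terminated array would. */
static size_t demo_count_props(const struct demo_prop *props)
{
	size_t n = 0;

	while (props[n].name)
		n++;
	return n;
}
```

With four slots, three entries fit and a fourth is rejected, which
matches the "pcc, nrdy, affinity and a sentinel" sizing in the patch.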

> 
> [SNIP]
> 
> Thanks.
> 
> -Fenghua
Thanks,

Ben



* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
                     ` (3 preceding siblings ...)
  2025-11-13  2:16   ` Fenghua Yu
@ 2025-11-13  2:33   ` Fenghua Yu
  2025-11-13 14:24     ` Ben Horgan
  4 siblings, 1 reply; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  2:33 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi, Ben and James,

On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>

[SNIP]

> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE) {
> +			pr_debug("Bad proxmity domain %lld, using node 0 instead\n",
> +				 res->locator.memory_locator.proximity_domain);
> +			nid = 0;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);

nit.

Seems "255" is an ad-hoc value which won't be used for memory type?
The "class_id" in mpam_ris_create() is confusing: it may be the level
for a cache or it may be 255 for memory.

To be clearer, maybe it's better to define an enum for class_id?
Something like:
enum mpam_class_id {
	CLASS_ID_LEVEL_1 = 1,
	CLASS_ID_LEVEL_2,
	CLASS_ID_LEVEL_3,
	CLASS_ID_NOT_USED = 255  <--- for memory type
};

Thanks.

-Fenghua


* Re: [PATCH 09/33] ACPI / MPAM: Parse the MPAM table
  2025-11-13  2:33   ` Fenghua Yu
@ 2025-11-13 14:24     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-13 14:24 UTC (permalink / raw)
  To: Fenghua Yu, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Fenghua,

On 11/13/25 02:33, Fenghua Yu wrote:
> Hi, Ben and James,
> 
> On 11/7/25 04:34, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
> 
> [SNIP]
> 
>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +                    struct acpi_mpam_resource_node *res)
>> +{
>> +    int level, nid;
>> +    u32 cache_id;
>> +
>> +    switch (res->locator_type) {
>> +    case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +        cache_id = res->locator.cache_locator.cache_reference;
>> +        level = find_acpi_cache_level_from_id(cache_id);
>> +        if (level <= 0) {
>> +            pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
>> +            return -EINVAL;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +                       level, cache_id);
>> +    case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +        nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +        if (nid == NUMA_NO_NODE) {
>> +            pr_debug("Bad proxmity domain %lld, using node 0 instead\n",
>> +                 res->locator.memory_locator.proximity_domain);
>> +            nid = 0;
>> +        }
>> +        return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +                       255, nid);
> 
> nit.
> 
> Seems "255" is an ad-hoc value which won't be used for memory type?
> The "class_id" in mpam_ris_create() is confusing: it may be the level
> for a cache or it may be 255 for memory.
> 
> To be clearer, maybe it's better to define an enum for class_id?
> Something like:
> enum mpam_class_id {
>     CLASS_ID_LEVEL_1 = 1,
>     CLASS_ID_LEVEL_2,
>     CLASS_ID_LEVEL_3,
>     CLASS_ID_NOT_USED = 255  <--- for memory type
> };

I've added a new define, MPAM_CLASS_ID_DEFAULT, which can be used for
memory and anything else that only needs one class_id.
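
Roughly, the idea looks like the sketch below. This is a userspace
illustration only: the DEMO_* names and demo_class_id() are
hypothetical, and the value 255 is an assumption that simply mirrors
the literal in the quoted patch, since the value of
MPAM_CLASS_ID_DEFAULT is not given here.

```c
#include <assert.h>
#include <stdint.h>

enum demo_class_type {
	DEMO_CLASS_CACHE,
	DEMO_CLASS_MEMORY,
};

/* Assumed value, mirroring the literal 255 in the quoted patch. */
#define DEMO_CLASS_ID_DEFAULT 255

/*
 * Caches are keyed by their level; any class type that only ever needs
 * a single class_id shares the default.
 */
static uint8_t demo_class_id(enum demo_class_type type, int cache_level)
{
	if (type == DEMO_CLASS_CACHE) {
		assert(cache_level > 0 && cache_level < DEMO_CLASS_ID_DEFAULT);
		return (uint8_t)cache_level;
	}
	return DEMO_CLASS_ID_DEFAULT;
}
```

The named constant removes the bare literal from the call site without
having to enumerate every cache level.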

> 
> Thanks.
> 
> -Fenghua

-- 
Thanks,

Ben



* [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (8 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-08  9:28   ` Gavin Shan
                     ` (3 more replies)
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
                   ` (29 subsequent siblings)
  39 siblings, 4 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.

Start with driver probe/remove and mapping the MSC.

CC: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
From Jonathan:
Include cleanup
Use devm_mutex_init()
Add an ERR_CAST()
Fenghua:
Return zero from update_msc_accessibility()
Additional:
Fail probe if MSC doesn't have an MMIO interface
---
 arch/arm64/Kconfig              |   1 +
 drivers/Kconfig                 |   2 +
 drivers/Makefile                |   1 +
 drivers/resctrl/Kconfig         |  15 +++
 drivers/resctrl/Makefile        |   4 +
 drivers/resctrl/mpam_devices.c  | 194 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  49 ++++++++
 7 files changed, 266 insertions(+)
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5e66d5d72cd..004d58cfbff8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
 	select ACPI_MPAM if ACPI
 	help
 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
 
 source "drivers/cdx/Kconfig"
 
+source "drivers/resctrl/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 8e1ffa4358d5..20eb17596b89 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)		+= hte/
 obj-$(CONFIG_DRM_ACCEL)		+= accel/
 obj-$(CONFIG_CDX_BUS)		+= cdx/
 obj-$(CONFIG_DPLL)		+= dpll/
+obj-y				+= resctrl/
 
 obj-$(CONFIG_DIBS)		+= dibs/
 obj-$(CONFIG_S390)		+= s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..ef2f3adf64a9
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,15 @@
+menuconfig ARM64_MPAM_DRIVER
+	bool "MPAM driver"
+	depends on ARM64 && ARM64_MPAM && EXPERT
+	help
+	  Memory System Resource Partitioning and Monitoring (MPAM) driver for
+	  System IP, e.g. caches and memory controllers.
+
+if ARM64_MPAM_DRIVER
+
+config ARM64_MPAM_DRIVER_DEBUG
+	bool "Enable debug messages from the MPAM driver"
+	help
+	  Say yes here to enable debug messages from the MPAM driver.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..898199dcf80d
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
+mpam-y						+= mpam_devices.o
+
+ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..6c6be133d73a
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+struct srcu_struct mpam_srcu;
+
+/*
+ * Number of MSCs that have been probed. Once all MSC have been probed MPAM
+ * can be enabled.
+ */
+static atomic_t mpam_num_msc;
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+	u32 affinity_id;
+	int err;
+
+	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+				       &affinity_id);
+	if (err)
+		cpumask_copy(&msc->accessibility, cpu_possible_mask);
+	else
+		acpi_pptt_get_cpus_from_container(affinity_id,
+						  &msc->accessibility);
+	return 0;
+}
+
+static int fw_num_msc;
+
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+	struct platform_device *pdev = msc->pdev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&msc->all_msc_list);
+	platform_set_drvdata(pdev, NULL);
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+	struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+	if (!msc)
+		return;
+
+	mutex_lock(&mpam_list_lock);
+	mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+
+	synchronize_srcu(&mpam_srcu);
+}
+
+static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	u32 tmp;
+	struct mpam_msc *msc;
+	struct resource *msc_res;
+	struct device *dev = &pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+	if (!msc)
+		return ERR_PTR(-ENOMEM);
+
+	err = devm_mutex_init(dev, &msc->probe_lock);
+	if (err)
+		return ERR_PTR(err);
+	err = devm_mutex_init(dev, &msc->part_sel_lock);
+	if (err)
+		return ERR_PTR(err);
+	msc->id = pdev->id;
+	msc->pdev = pdev;
+	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
+	INIT_LIST_HEAD_RCU(&msc->ris);
+
+	err = update_msc_accessibility(msc);
+	if (err)
+		return ERR_PTR(err);
+	if (cpumask_empty(&msc->accessibility)) {
+		dev_err_once(dev, "MSC is not accessible from any CPU!");
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
+		msc->iface = MPAM_IFACE_MMIO;
+	else
+		msc->iface = MPAM_IFACE_PCC;
+
+	if (msc->iface == MPAM_IFACE_MMIO) {
+		void __iomem *io;
+
+		io = devm_platform_get_and_ioremap_resource(pdev, 0,
+							    &msc_res);
+		if (IS_ERR(io)) {
+			dev_err_once(dev, "Failed to map MSC base address\n");
+			return ERR_CAST(io);
+		}
+		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
+		msc->mapped_hwpage = io;
+	} else {
+		return ERR_PTR(-ENOENT);
+	}
+
+	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
+	platform_set_drvdata(pdev, msc);
+
+	return msc;
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	struct mpam_msc *msc = NULL;
+	void *plat_data = pdev->dev.platform_data;
+
+	mutex_lock(&mpam_list_lock);
+	msc = do_mpam_msc_drv_probe(pdev);
+	mutex_unlock(&mpam_list_lock);
+	if (!IS_ERR(msc)) {
+		/* Create RIS entries described by firmware */
+		err = acpi_mpam_parse_resources(msc, plat_data);
+		if (err)
+			mpam_msc_drv_remove(pdev);
+	} else {
+		err = PTR_ERR(msc);
+	}
+
+	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
+		pr_info("Discovered all MSC\n");
+
+	return err;
+}
+
+static struct platform_driver mpam_msc_driver = {
+	.driver = {
+		.name = "mpam_msc",
+	},
+	.probe = mpam_msc_drv_probe,
+	.remove = mpam_msc_drv_remove,
+};
+
+static int __init mpam_msc_driver_init(void)
+{
+	if (!system_supports_mpam())
+		return -EOPNOTSUPP;
+
+	init_srcu_struct(&mpam_srcu);
+
+	fw_num_msc = acpi_mpam_count_msc();
+
+	if (fw_num_msc <= 0) {
+		pr_err("No MSC devices found in firmware\n");
+		return -EINVAL;
+	}
+
+	return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..540066903eca
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2025 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct platform_device;
+
+struct mpam_msc {
+	/* member of mpam_all_msc */
+	struct list_head	all_msc_list;
+
+	int			id;
+	struct platform_device	*pdev;
+
+	/* Not modified after mpam_is_enabled() becomes true */
+	enum mpam_msc_iface	iface;
+	u32			nrdy_usec;
+	cpumask_t		accessibility;
+
+	/*
+	 * probe_lock is only taken during discovery. After discovery these
+	 * properties become read-only and the lists are protected by SRCU.
+	 */
+	struct mutex		probe_lock;
+	unsigned long		ris_idxs;
+	u32			ris_max;
+
+	/* mpam_msc_ris of this component */
+	struct list_head	ris;
+
+	/*
+	 * part_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
+	 * by RIS).
+	 * If needed, take msc->probe_lock first.
+	 */
+	struct mutex		part_sel_lock;
+
+	void __iomem		*mapped_hwpage;
+	size_t			mapped_hwpage_sz;
+};
+#endif /* MPAM_INTERNAL_H */
-- 
2.43.0



* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
@ 2025-11-08  9:28   ` Gavin Shan
  2025-11-10 16:44     ` Jonathan Cameron
  2025-11-12 15:32     ` Ben Horgan
  2025-11-10 16:58   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-08  9:28 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 

I'm not sure if the commit log below is clearer, as I'm not a native
English speaker:

MPAM probing is convoluted. MSCs that are integrated to a set of CPUs
may only be accessible from those CPUs, ...

> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
>  From Jonathan:
> Include cleanup
> Use devm_mutex_init()
> Add an ERR_CAST()
> Fenghua:
> Return zero from update_msc_accessibility()
> Additional:
> Fail probe if MSC doesn't have an MMIO interface
> ---
>   arch/arm64/Kconfig              |   1 +
>   drivers/Kconfig                 |   2 +
>   drivers/Makefile                |   1 +
>   drivers/resctrl/Kconfig         |  15 +++
>   drivers/resctrl/Makefile        |   4 +
>   drivers/resctrl/mpam_devices.c  | 194 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  49 ++++++++
>   7 files changed, 266 insertions(+)
>   create mode 100644 drivers/resctrl/Kconfig
>   create mode 100644 drivers/resctrl/Makefile
>   create mode 100644 drivers/resctrl/mpam_devices.c
>   create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c5e66d5d72cd..004d58cfbff8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>   
>   config ARM64_MPAM
>   	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
>   	select ACPI_MPAM if ACPI
>   	help
>   	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>   
>   source "drivers/cdx/Kconfig"
>   
> +source "drivers/resctrl/Kconfig"
> +
>   endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 8e1ffa4358d5..20eb17596b89 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)		+= hte/
>   obj-$(CONFIG_DRM_ACCEL)		+= accel/
>   obj-$(CONFIG_CDX_BUS)		+= cdx/
>   obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
>   
>   obj-$(CONFIG_DIBS)		+= dibs/
>   obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..ef2f3adf64a9
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,15 @@
> +menuconfig ARM64_MPAM_DRIVER
> +	bool "MPAM driver"
> +	depends on ARM64 && ARM64_MPAM && EXPERT
> +	help
> +	  Memory System Resource Partitioning and Monitoring (MPAM) driver for
> +	  System IP, e.g. caches and memory controllers.
> +
> +if ARM64_MPAM_DRIVER
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver"
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> +
> +endif

I am asking myself why "depends on ARM64_MPAM_DRIVER" can't be used here? :-)

> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..898199dcf80d
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..6c6be133d73a
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,194 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */

s/when writing/for writing
s/are read-only/become read-only

> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +struct srcu_struct mpam_srcu;
> +
> +/*
> + * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> + * can be enabled.
> + */

s/all MSC/all MSCs  (?)

> +static atomic_t mpam_num_msc;
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */

s/An MSC/A MSC (?)
s/from a/from the
s/isn't the case/is the case (?)

> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	u32 affinity_id;
> +	int err;
> +
> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +				       &affinity_id);
> +	if (err)
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +	else
> +		acpi_pptt_get_cpus_from_container(affinity_id,
> +						  &msc->accessibility);
> +	return 0;
> +}
> +

{} is needed for the block spanning multiple lines.

I would validate msc->accessibility here instead of its caller
(do_mpam_msc_drv_probe()).

		if (cpumask_empty(&msc->accessibility))
			return {-EINVAL, -ENOENT};
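
Something along these lines, shown here as a userspace sketch rather
than driver code. The cpumask is modelled as a 64-bit word, and
demo_msc, demo_update_accessibility() and its parameters are
hypothetical stand-ins for the firmware property read and the PPTT
lookup. The choice of -EINVAL over -ENOENT is left open above; the
sketch picks -EINVAL arbitrarily.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy MSC: one bit per CPU stands in for a cpumask. */
struct demo_msc {
	uint64_t accessibility;
};

/*
 * has_affinity models whether the "cpu_affinity" property was found,
 * container_mask models the CPUs reported for that container, and
 * possible_mask models cpu_possible_mask.
 */
static int demo_update_accessibility(struct demo_msc *msc, int has_affinity,
				     uint64_t container_mask,
				     uint64_t possible_mask)
{
	assert(msc != NULL);

	if (has_affinity) {
		msc->accessibility = container_mask;
	} else {
		msc->accessibility = possible_mask;
	}

	/* Validate here so the caller only has to check the return value. */
	if (!msc->accessibility)
		return -EINVAL;

	return 0;
}
```

The caller then needs only the existing "if (err)" check after the call.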

> +static int fw_num_msc;
> +
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&msc->all_msc_list);
> +	platform_set_drvdata(pdev, NULL);
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;

'msc' is unlikely to be NULL here, so the check could be dropped.

> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	synchronize_srcu(&mpam_srcu);
> +}
> +
> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	err = devm_mutex_init(dev, &msc->probe_lock);
> +	if (err)
> +		return ERR_PTR(err);
> +	err = devm_mutex_init(dev, &msc->part_sel_lock);
> +	if (err)
> +		return ERR_PTR(err);
> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);
> +	if (cpumask_empty(&msc->accessibility)) {
> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
> +		return ERR_PTR(-EINVAL);
> +	}
> +

As suggested above, this check would be done inside update_msc_accessibility().

> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
> +		msc->iface = MPAM_IFACE_MMIO;
> +	else
> +		msc->iface = MPAM_IFACE_PCC;
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		void __iomem *io;
> +
> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +							    &msc_res);
> +		if (IS_ERR(io)) {
> +			dev_err_once(dev, "Failed to map MSC base address\n");
> +			return ERR_CAST(io);
> +		}
> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +		msc->mapped_hwpage = io;
> +	} else {
> +		return ERR_PTR(-ENOENT);

Would be:
		return ERR_PTR(-EINVAL);

> +	}
> +
> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +	platform_set_drvdata(pdev, msc);
> +
> +	return msc;
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc = NULL;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	msc = do_mpam_msc_drv_probe(pdev);
> +	mutex_unlock(&mpam_list_lock);
> +	if (!IS_ERR(msc)) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +		if (err)
> +			mpam_msc_drv_remove(pdev);
> +	} else {
> +		err = PTR_ERR(msc);
> +	}
> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");

s/all MSC/all MSCs

> +
> +	return err;
> +}
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	fw_num_msc = acpi_mpam_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..540066903eca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +struct platform_device;
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head	all_msc_list;
> +
> +	int			id;
> +	struct platform_device	*pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only taken during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs;
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +#endif /* MPAM_INTERNAL_H */

Thanks,
Gavin


* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-08  9:28   ` Gavin Shan
@ 2025-11-10 16:44     ` Jonathan Cameron
  2025-11-12 15:32     ` Ben Horgan
  1 sibling, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:44 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Ben Horgan, james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, guohanjun, jeremy.linton, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On Sat, 8 Nov 2025 19:28:43 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> > only be accessible from those CPUs, and they may not be online.
> > Touching the hardware early is pointless as MPAM can't be used until
> > the system-wide common values for num_partid and num_pmg have been
> > discovered.
> >   
> 
> I'm not sure if below commit log is more clearer as I'm not a English
> native speaker:

I'm normally very flexible on English usage as long as the meaning is
conveyed, but I'm not keen on changing an author's English in a fashion
that doesn't necessarily improve it. So a few thoughts from me as a native
speaker. Note that I don't mind changing text for clarity if the English
usage is too obscure.

> 
> MPAM probing is convoluted. MSCs that are integrated to a set of CPUs
> may only be accessible from those CPUs, ...

To me both are equally valid.

Probing MPAM -> implies MPAM is a thing which is probed (MPAM as noun)
MPAM probing -> implies that MPAM is a special form of probing (MPAM as a kind
of adjective giving the flavor of probing that is happening)

So slight preference from me for current text.

> 
> > Start with driver probe/remove and mapping the MSC.
> > 
> > CC: Carl Worth <carl@os.amperecomputing.com>
> > Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> > Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> > Tested-by: Peter Newman <peternewman@google.com>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Signed-off-by: Ben Horgan <ben.horgan@arm.com>

> > diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> > new file mode 100644
> > index 000000000000..ef2f3adf64a9
> > --- /dev/null
> > +++ b/drivers/resctrl/Kconfig
> > @@ -0,0 +1,15 @@
> > +menuconfig ARM64_MPAM_DRIVER
> > +	bool "MPAM driver"
> > +	depends on ARM64 && ARM64_MPAM && EXPERT
> > +	help
> > +	  Memory System Resource Partitioning and Monitoring (MPAM) driver for
> > +	  System IP, e,g. caches and memory controllers.
> > +
> > +if ARM64_MPAM_DRIVER
> > +
> > +config ARM64_MPAM_DRIVER_DEBUG
> > +	bool "Enable debug messages from the MPAM driver"
> > +	help
> > +	  Say yes here to enable debug messages from the MPAM driver.
> > +
> > +endif  
> 
> I am asking myself why "depends on ARM64_MPAM_DRIVER" can't be used here? :-)

Fairly sure that came up in an earlier review. IIRC other stuff is going to be
added later in this if/endif block.

> 
> > diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> > new file mode 100644
> > index 000000000000..898199dcf80d
> > --- /dev/null
> > +++ b/drivers/resctrl/Makefile
> > @@ -0,0 +1,4 @@
> > +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> > +mpam-y						+= mpam_devices.o
> > +
> > +ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> > diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> > new file mode 100644
> > index 000000000000..6c6be133d73a
> > --- /dev/null
> > +++ b/drivers/resctrl/mpam_devices.c
> > @@ -0,0 +1,194 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2025 Arm Ltd.
> > +
> > +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> > +
> > +#include <linux/acpi.h>
> > +#include <linux/arm_mpam.h>
> > +#include <linux/cacheinfo.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/device.h>
> > +#include <linux/errno.h>
> > +#include <linux/gfp.h>
> > +#include <linux/list.h>
> > +#include <linux/lockdep.h>
> > +#include <linux/mutex.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/printk.h>
> > +#include <linux/srcu.h>
> > +#include <linux/types.h>
> > +
> > +#include "mpam_internal.h"
> > +
> > +/*
> > + * mpam_list_lock protects the SRCU lists when writing. Once the
> > + * mpam_enabled key is enabled these lists are read-only,
> > + * unless the error interrupt disables the driver.
> > + */  
> 
> s/when writing/for writing
To me not a clear improvement.  The "when writing" is a bit passive in that
it is talking about something that 'is the case' rather than something 'we
make' the case.

> s/are read-only/become read-only
Nope to this one.  "Become" implies it happens at a later stage, "are"
implies that it happens before mpam_enabled_key is enabled, which is more
correct (subtle though!)

> 
> > +static DEFINE_MUTEX(mpam_list_lock);
> > +static LIST_HEAD(mpam_all_msc);
> > +
> > +struct srcu_struct mpam_srcu;
> > +
> > +/*
> > + * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> > + * can be enabled.
> > + */  
> 
> s/all MSC/all MSCs  (?)
yes.
> 
> > +static atomic_t mpam_num_msc;
> > +
> > +/*
> > + * An MSC can control traffic from a set of CPUs, but may only be accessible
> > + * from a (hopefully wider) set of CPUs. The common reason for this is power
> > + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> > + * corresponding cache may also be powered off. By making accesses from
> > + * one of those CPUs, we ensure this isn't the case.
> > + */  
> 
> s/An MSC/A MSC (?)

No on this one.  I would pronounce MSC as Em-ess-sea,
so the rule is that you use "An".  This is a truly weird quirk of
English!


> s/from a/from the
Another subtle case.  "the" implies that we all know exactly which wider
set of CPUs.  "a" implies that there is at least one that can be used. So
in my mind, "a" is slightly more correct, but others may disagree.

> s/isn't the case/is the case (?)

I think original is correct.  The thing that isn't the case is the
".. may also be powered off." and that's what we want to avoid.
I'm not 100% sure of the intention of that comment though - so a good
one to query!

Thanks,

Jonathan


* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-08  9:28   ` Gavin Shan
  2025-11-10 16:44     ` Jonathan Cameron
@ 2025-11-12 15:32     ` Ben Horgan
  1 sibling, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 15:32 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/8/25 09:28, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
> 
> I'm not sure if below commit log is more clearer as I'm not a English
> native speaker:

Thanks for the detailed review of the messages and comments. I've
skipped the ones that I think don't improve the clarity. (I see Jonathan
has a detailed reply which matches my understanding of English.)

> 
> MPAM probing is convoluted. MSCs that are integrated to a set of CPUs
> may only be accessible from those CPUs, ...
> 
>> Start with driver probe/remove and mapping the MSC.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>>  From Jonathan:
>> Include cleanup
>> Use devm_mutex_init()
>> Add an ERR_CAST()
>> Fenghua:
>> Return zero from update_msc_accessibility()
>> Additional:
>> Fail probe if MSC doesn't have an MMIO interface
>> ---
>>   arch/arm64/Kconfig              |   1 +
>>   drivers/Kconfig                 |   2 +
>>   drivers/Makefile                |   1 +
>>   drivers/resctrl/Kconfig         |  15 +++
>>   drivers/resctrl/Makefile        |   4 +
>>   drivers/resctrl/mpam_devices.c  | 194 ++++++++++++++++++++++++++++++++
>>   drivers/resctrl/mpam_internal.h |  49 ++++++++
>>   7 files changed, 266 insertions(+)
>>   create mode 100644 drivers/resctrl/Kconfig
>>   create mode 100644 drivers/resctrl/Makefile
>>   create mode 100644 drivers/resctrl/mpam_devices.c
>>   create mode 100644 drivers/resctrl/mpam_internal.h
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index c5e66d5d72cd..004d58cfbff8 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
>>     config ARM64_MPAM
>>       bool "Enable support for MPAM"
>> +    select ARM64_MPAM_DRIVER if EXPERT    # does nothing yet
>>       select ACPI_MPAM if ACPI
>>       help
>>         Memory System Resource Partitioning and Monitoring (MPAM) is an
>> diff --git a/drivers/Kconfig b/drivers/Kconfig
>> index 4915a63866b0..3054b50a2f4c 100644
>> --- a/drivers/Kconfig
>> +++ b/drivers/Kconfig
>> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>>     source "drivers/cdx/Kconfig"
>>   +source "drivers/resctrl/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/Makefile b/drivers/Makefile
>> index 8e1ffa4358d5..20eb17596b89 100644
>> --- a/drivers/Makefile
>> +++ b/drivers/Makefile
>> @@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)        += hte/
>>   obj-$(CONFIG_DRM_ACCEL)        += accel/
>>   obj-$(CONFIG_CDX_BUS)        += cdx/
>>   obj-$(CONFIG_DPLL)        += dpll/
>> +obj-y                += resctrl/
>>     obj-$(CONFIG_DIBS)        += dibs/
>>   obj-$(CONFIG_S390)        += s390/
>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> new file mode 100644
>> index 000000000000..ef2f3adf64a9
>> --- /dev/null
>> +++ b/drivers/resctrl/Kconfig
>> @@ -0,0 +1,15 @@
>> +menuconfig ARM64_MPAM_DRIVER
>> +    bool "MPAM driver"
>> +    depends on ARM64 && ARM64_MPAM && EXPERT
>> +    help
>> +      Memory System Resource Partitioning and Monitoring (MPAM) driver for
>> +      System IP, e,g. caches and memory controllers.
>> +
>> +if ARM64_MPAM_DRIVER
>> +
>> +config ARM64_MPAM_DRIVER_DEBUG
>> +    bool "Enable debug messages from the MPAM driver"
>> +    help
>> +      Say yes here to enable debug messages from the MPAM driver.
>> +
>> +endif
> 
> I am asking myself why "depends on ARM64_MPAM_DRIVER" can't be used
> here? :-)
> 
>> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
>> new file mode 100644
>> index 000000000000..898199dcf80d
>> --- /dev/null
>> +++ b/drivers/resctrl/Makefile
>> @@ -0,0 +1,4 @@
>> +obj-$(CONFIG_ARM64_MPAM_DRIVER)            += mpam.o
>> +mpam-y                        += mpam_devices.o
>> +
>> +ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)    += -DDEBUG
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..6c6be133d73a
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,194 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cacheinfo.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/device.h>
>> +#include <linux/errno.h>
>> +#include <linux/gfp.h>
>> +#include <linux/list.h>
>> +#include <linux/lockdep.h>
>> +#include <linux/mutex.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/printk.h>
>> +#include <linux/srcu.h>
>> +#include <linux/types.h>
>> +
>> +#include "mpam_internal.h"
>> +
>> +/*
>> + * mpam_list_lock protects the SRCU lists when writing. Once the
>> + * mpam_enabled key is enabled these lists are read-only,
>> + * unless the error interrupt disables the driver.
>> + */
> 
> s/when writing/for writing
> s/are read-only/become read-only
> 
>> +static DEFINE_MUTEX(mpam_list_lock);
>> +static LIST_HEAD(mpam_all_msc);
>> +
>> +struct srcu_struct mpam_srcu;
>> +
>> +/*
>> + * Number of MSCs that have been probed. Once all MSC have been probed MPAM
>> + * can be enabled.
>> + */
> 
> s/all MSC/all MSCs  (?)
Changed.
> 
>> +static atomic_t mpam_num_msc;
>> +
>> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * corresponding cache may also be powered off. By making accesses from
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
> 
> s/An MSC/A MSC (?)
> s/from a/from the
> s/isn't the case/is the case (?)
Updated this last one to be:

By making accesses from one of those CPUs, we ensure we don't access a
cache that's powered off.

> 
>> +static int update_msc_accessibility(struct mpam_msc *msc)
>> +{
>> +    u32 affinity_id;
>> +    int err;
>> +
>> +    err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
>> +                       &affinity_id);
>> +    if (err)
>> +        cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +    else
>> +        acpi_pptt_get_cpus_from_container(affinity_id,
>> +                          &msc->accessibility);
>> +    return 0;
>> +}
>> +
> 
> {} is needed for the block spanning multiple lines.

Made it one line.

> 
> I would validate msc->accessibility here instead of its caller
> (do_mpam_msc_drv_probe()).
> 
>         if (cpumask_empty(&msc->accessibility))
>             return {-EINVAL, -ENOENT};
> 
>> +static int fw_num_msc;
>> +
>> +static void mpam_msc_destroy(struct mpam_msc *msc)
>> +{
>> +    struct platform_device *pdev = msc->pdev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_del_rcu(&msc->all_msc_list);
>> +    platform_set_drvdata(pdev, NULL);
>> +}
>> +
>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>> +{
>> +    struct mpam_msc *msc = platform_get_drvdata(pdev);
>> +
>> +    if (!msc)
>> +        return;
> 
> 'msc' is unlikely to be NULL here, so the check could be dropped.

Dropped.

> 
>> +
>> +    mutex_lock(&mpam_list_lock);
>> +    mpam_msc_destroy(msc);
>> +    mutex_unlock(&mpam_list_lock);
>> +
>> +    synchronize_srcu(&mpam_srcu);
>> +}
>> +
>> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +    int err;
>> +    u32 tmp;
>> +    struct mpam_msc *msc;
>> +    struct resource *msc_res;
>> +    struct device *dev = &pdev->dev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +    if (!msc)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    err = devm_mutex_init(dev, &msc->probe_lock);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +    err = devm_mutex_init(dev, &msc->part_sel_lock);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +    msc->id = pdev->id;
>> +    msc->pdev = pdev;
>> +    INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +    INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +    err = update_msc_accessibility(msc);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +    if (cpumask_empty(&msc->accessibility)) {
>> +        dev_err_once(dev, "MSC is not accessible from any CPU!");
>> +        return ERR_PTR(-EINVAL);
>> +    }
>> +
> 
> As suggested above, this check would be done inside
> update_msc_accessibility().

Unless you object I'll keep this as is and make
update_msc_accessibility() a void function. I think this works better
with the naming.
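
Roughly the following split, shown as a user-space sketch rather than the
real patch. The sketch_* names, the uint64_t mask and the
have_affinity/container_cpus/possible_cpus parameters are all made up for
illustration; they stand in for cpumask_t, the "cpu_affinity" property
lookup and the masks the kernel helpers provide.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU. */
typedef uint64_t sketch_cpumask_t;

struct sketch_msc {
	sketch_cpumask_t accessibility;
};

/* Only updates the mask. It cannot fail, hence void. */
static void sketch_update_msc_accessibility(struct sketch_msc *msc,
					    bool have_affinity,
					    sketch_cpumask_t container_cpus,
					    sketch_cpumask_t possible_cpus)
{
	msc->accessibility = have_affinity ? container_cpus : possible_cpus;
}

/* The probe path decides what an empty mask means: here, probe fails. */
static int sketch_probe_accessibility(struct sketch_msc *msc,
				      bool have_affinity,
				      sketch_cpumask_t container_cpus,
				      sketch_cpumask_t possible_cpus)
{
	sketch_update_msc_accessibility(msc, have_affinity,
					container_cpus, possible_cpus);
	if (msc->accessibility == 0)
		return -EINVAL;

	return 0;
}
```

The helper then does exactly what its name says, and the error message and
errno for the empty case stay next to the other probe failures.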

> 
>> +    if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
>> +        msc->iface = MPAM_IFACE_MMIO;
>> +    else
>> +        msc->iface = MPAM_IFACE_PCC;
>> +
>> +    if (msc->iface == MPAM_IFACE_MMIO) {
>> +        void __iomem *io;
>> +
>> +        io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +                                &msc_res);
>> +        if (IS_ERR(io)) {
>> +            dev_err_once(dev, "Failed to map MSC base address\n");
>> +            return ERR_CAST(io);
>> +        }
>> +        msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +        msc->mapped_hwpage = io;
>> +    } else {
>> +        return ERR_PTR(-ENOENT);
> 
> Would be:
>         return ERR_PTR(-EINVAL);

Sure.

> 
>> +    }
>> +
>> +    list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
>> +    platform_set_drvdata(pdev, msc);
>> +
>> +    return msc;
>> +}
>> +
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +    int err;
>> +    struct mpam_msc *msc = NULL;
>> +    void *plat_data = pdev->dev.platform_data;
>> +
>> +    mutex_lock(&mpam_list_lock);
>> +    msc = do_mpam_msc_drv_probe(pdev);
>> +    mutex_unlock(&mpam_list_lock);
>> +    if (!IS_ERR(msc)) {
>> +        /* Create RIS entries described by firmware */
>> +        err = acpi_mpam_parse_resources(msc, plat_data);
>> +        if (err)
>> +            mpam_msc_drv_remove(pdev);
>> +    } else {
>> +        err = PTR_ERR(msc);
>> +    }
>> +
>> +    if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
>> +        pr_info("Discovered all MSC\n");
> 
> s/all MSC/all MSCs
> 
>> +
>> +    return err;
>> +}
>> +
>> +static struct platform_driver mpam_msc_driver = {
>> +    .driver = {
>> +        .name = "mpam_msc",
>> +    },
>> +    .probe = mpam_msc_drv_probe,
>> +    .remove = mpam_msc_drv_remove,
>> +};
>> +
>> +static int __init mpam_msc_driver_init(void)
>> +{
>> +    if (!system_supports_mpam())
>> +        return -EOPNOTSUPP;
>> +
>> +    init_srcu_struct(&mpam_srcu);
>> +
>> +    fw_num_msc = acpi_mpam_count_msc();
>> +
>> +    if (fw_num_msc <= 0) {
>> +        pr_err("No MSC devices found in firmware\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    return platform_driver_register(&mpam_msc_driver);
>> +}
>> +subsys_initcall(mpam_msc_driver_init);
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..540066903eca
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,49 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +#ifndef MPAM_INTERNAL_H
>> +#define MPAM_INTERNAL_H
>> +
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/io.h>
>> +#include <linux/mutex.h>
>> +#include <linux/types.h>
>> +
>> +struct platform_device;
>> +
>> +struct mpam_msc {
>> +    /* member of mpam_all_msc */
>> +    struct list_head    all_msc_list;
>> +
>> +    int            id;
>> +    struct platform_device    *pdev;
>> +
>> +    /* Not modified after mpam_is_enabled() becomes true */
>> +    enum mpam_msc_iface    iface;
>> +    u32            nrdy_usec;
>> +    cpumask_t        accessibility;
>> +
>> +    /*
>> +     * probe_lock is only taken during discovery. After discovery these
>> +     * properties become read-only and the lists are protected by SRCU.
>> +     */
>> +    struct mutex        probe_lock;
>> +    unsigned long        ris_idxs;
>> +    u32            ris_max;
>> +
>> +    /* mpam_msc_ris of this component */
>> +    struct list_head    ris;
>> +
>> +    /*
>> +     * part_sel_lock protects access to the MSC hardware registers that are
>> +     * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +     * by RIS).
>> +     * If needed, take msc->probe_lock first.
>> +     */
>> +    struct mutex        part_sel_lock;
>> +
>> +    void __iomem        *mapped_hwpage;
>> +    size_t            mapped_hwpage_sz;
>> +};
>> +#endif /* MPAM_INTERNAL_H */
> 
> Thanks,
> Gavin
> 

Thanks,

Ben


* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
  2025-11-08  9:28   ` Gavin Shan
@ 2025-11-10 16:58   ` Jonathan Cameron
  2025-11-12 16:05     ` Ben Horgan
  2025-11-12  7:22   ` Shaopeng Tan (Fujitsu)
  2025-11-13  2:46   ` Fenghua Yu
  3 siblings, 1 reply; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 16:58 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On Fri, 7 Nov 2025 12:34:27 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Hi Ben,

A few minor things from a fresh read.
Nothing to prevent a tag though.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..6c6be133d73a
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c


> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;

Agree with Gavin on this. If there is a reason this might be NULL
then a comment would avoid the question being raised again. If not
drop the check.

> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	synchronize_srcu(&mpam_srcu);

Trivial but perhaps a comment on why. I assume this is because the
devm_ cleanup isn't safe until after an RCU grace period?

> +}
> +
> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	err = devm_mutex_init(dev, &msc->probe_lock);
> +	if (err)
> +		return ERR_PTR(err);

Trivial but I'd add a blank line here.

> +	err = devm_mutex_init(dev, &msc->part_sel_lock);
> +	if (err)
> +		return ERR_PTR(err);

Trivial but I'd add a blank line here.

> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);
> +	if (cpumask_empty(&msc->accessibility)) {
> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
> +		msc->iface = MPAM_IFACE_MMIO;
> +	else
> +		msc->iface = MPAM_IFACE_PCC;
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		void __iomem *io;
> +
> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +							    &msc_res);
> +		if (IS_ERR(io)) {
> +			dev_err_once(dev, "Failed to map MSC base address\n");
> +			return ERR_CAST(io);
> +		}
> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +		msc->mapped_hwpage = io;
> +	} else {
> +		return ERR_PTR(-ENOENT);
> +	}
> +
> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +	platform_set_drvdata(pdev, msc);
> +
> +	return msc;
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc = NULL;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	msc = do_mpam_msc_drv_probe(pdev);
> +	mutex_unlock(&mpam_list_lock);
> +	if (!IS_ERR(msc)) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +		if (err)
> +			mpam_msc_drv_remove(pdev);
> +	} else {
> +		err = PTR_ERR(msc);
> +	}

Seems convoluted. Not obvious to me why you can't do early exits on err and
have a simpler flow. Maybe something messier happens in patches after this
series to justify the complex approach.

	if (IS_ERR(msc))
		return PTR_ERR(msc);

	/* Create RIS entries described by firmware */
	err = acpi_mpam_parse_resources(msc, plat_data);
	if (err) {
		mpam_msc_drv_remove(pdev);
		return err;
	}

	if (atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
		pr_info("Discovered all MSC\n");

	return 0;
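
For anyone wanting to convince themselves the early-exit version covers the
same cases, a small user-space model follows. It is a sketch only: every
model_* name is invented here, model_err_ptr()/model_ptr_err()/model_is_err()
merely mimic the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() convention of
encoding a negative errno in the top 4095 pointer values, and parse_err
stands in for the return value of acpi_mpam_parse_resources().

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_msc {
	int id;
};

static int model_num_msc;	/* stands in for the atomic mpam_num_msc */
static int model_removed;	/* counts trips through the remove path */

/* Early-exit probe flow: each failure returns as soon as it is seen. */
static int model_probe(struct model_msc *msc_or_err, int parse_err)
{
	if (model_is_err(msc_or_err))
		return (int)model_ptr_err(msc_or_err);

	if (parse_err) {
		model_removed++;	/* mpam_msc_drv_remove() in the driver */
		return parse_err;
	}

	model_num_msc++;		/* only successful probes are counted */
	return 0;
}
```

The three outcomes (allocation/setup failure, resource parsing failure,
success) each have exactly one exit, and the counter is only touched on the
success path, which matches the "!err &&" condition in the posted code.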

> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	fw_num_msc = acpi_mpam_count_msc();
> +

Trivial but I'd drop this blank line to keep the call closely
associated with the error check.

> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);



* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-10 16:58   ` Jonathan Cameron
@ 2025-11-12 16:05     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 16:05 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Jonathan,

On 11/10/25 16:58, Jonathan Cameron wrote:
> On Fri, 7 Nov 2025 12:34:27 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> From: James Morse <james.morse@arm.com>
>>
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> 
> Hi Ben,
> 
> A few minor things from a fresh read.
> Nothing to prevent a tag though.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks!

> 
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..6c6be133d73a
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
> 
> 
>> +
>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>> +{
>> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
>> +
>> +	if (!msc)
>> +		return;
> 
> Agree with Gavin on this. If there is a reason this might be NULL
> then a comment would avoid the question being raised again. If not
> drop the check.

Dropped.

> 
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	mpam_msc_destroy(msc);
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	synchronize_srcu(&mpam_srcu);
> 
> Trivial but perhaps a comment on why. I assume this is because the
> devm_ cleanup isn't safe until after an RCU grace period?

This becomes clearer in the next patch where it is moved into
mpam_free_garbage() so I'll leave this bare.
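
As a rough illustration of why the wait matters, the unlink / wait / free
ordering can be modelled in userspace C as below. This is only a sketch under
stated assumptions, not the driver code: struct garbage, garbage_add(),
garbage_free_all() and the wait_for_readers() stub (standing in for
synchronize_srcu()) are made-up names, and the list is not lock-free as the
kernel's llist is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names come from the driver. */
struct garbage {
	struct garbage *next;
	void *to_free;
};

static struct garbage *garbage_head;
static int grace_periods_waited;
static int objects_freed;

/* Stand-in for synchronize_srcu(): only counts how often we waited. */
static void wait_for_readers(void)
{
	grace_periods_waited++;
}

/* Queue an already-unlinked object; it must not be freed yet. */
static void garbage_add(struct garbage *g, void *obj)
{
	g->to_free = obj;
	g->next = garbage_head;
	garbage_head = g;
}

/* Detach the whole list, wait once for readers, then free every entry. */
static void garbage_free_all(void)
{
	struct garbage *iter = garbage_head, *tmp;

	garbage_head = NULL;
	if (!iter)
		return;

	wait_for_readers();

	while (iter) {
		tmp = iter->next;	/* read next before freeing the entry */
		free(iter->to_free);
		objects_freed++;
		iter = tmp;
	}
}

struct item {
	int value;
	struct garbage garbage;	/* embedded bookkeeping node */
};

static struct item *item_new(int value)
{
	struct item *it = calloc(1, sizeof(*it));

	if (it)
		it->value = value;
	return it;
}
```

The point of batching is that one wait covers every queued object, and an
empty list costs no wait at all.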

> 
>> +}
>> +
>> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	u32 tmp;
>> +	struct mpam_msc *msc;
>> +	struct resource *msc_res;
>> +	struct device *dev = &pdev->dev;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +	if (!msc)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	err = devm_mutex_init(dev, &msc->probe_lock);
>> +	if (err)
>> +		return ERR_PTR(err);
> 
> Trivial but I'd add a blank line here.

done

> 
>> +	err = devm_mutex_init(dev, &msc->part_sel_lock);
>> +	if (err)
>> +		return ERR_PTR(err);
> 
> Trivial but I'd add a blank line here.

done

> 
>> +	msc->id = pdev->id;
>> +	msc->pdev = pdev;
>> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +	INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +	err = update_msc_accessibility(msc);
>> +	if (err)
>> +		return ERR_PTR(err);
>> +	if (cpumask_empty(&msc->accessibility)) {
>> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +
>> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
>> +		msc->iface = MPAM_IFACE_MMIO;
>> +	else
>> +		msc->iface = MPAM_IFACE_PCC;
>> +
>> +	if (msc->iface == MPAM_IFACE_MMIO) {
>> +		void __iomem *io;
>> +
>> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +							    &msc_res);
>> +		if (IS_ERR(io)) {
>> +			dev_err_once(dev, "Failed to map MSC base address\n");
>> +			return ERR_CAST(io);
>> +		}
>> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +		msc->mapped_hwpage = io;
>> +	} else {
>> +		return ERR_PTR(-ENOENT);
>> +	}
>> +
>> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
>> +	platform_set_drvdata(pdev, msc);
>> +
>> +	return msc;
>> +}
>> +
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	struct mpam_msc *msc = NULL;
>> +	void *plat_data = pdev->dev.platform_data;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	msc = do_mpam_msc_drv_probe(pdev);
>> +	mutex_unlock(&mpam_list_lock);
>> +	if (!IS_ERR(msc)) {
>> +		/* Create RIS entries described by firmware */
>> +		err = acpi_mpam_parse_resources(msc, plat_data);
>> +		if (err)
>> +			mpam_msc_drv_remove(pdev);
>> +	} else {
>> +		err = PTR_ERR(msc);
>> +	}
> 
> Seems convoluted. Not obvious to me why you can't do early exits on err and
> having simpler flow. Maybe something more messy happens in patches after this
> series to justify the complex approach.
> 
> 	if (IS_ERR(msc))
> 		return PTR_ERR(msc);
> 
> 	/* Create RIS entries described by firmware */
> 	err = acpi_mpam_parse_resources(msc, plat_data);
> 	if (err) {
> 		mpam_msc_drv_remove(pdev);
> 		return err;
> 	}
> 
> 	if (atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> 		pr_info("Discovered all MSC\n");
> 
> 	return 0;

It's still like this at the end of the current mpam snapshot branch so
I'll simplify based on your suggestion.

> 
>> +
>> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
>> +		pr_info("Discovered all MSC\n");
>> +
>> +	return err;
>> +}
>> +
>> +static struct platform_driver mpam_msc_driver = {
>> +	.driver = {
>> +		.name = "mpam_msc",
>> +	},
>> +	.probe = mpam_msc_drv_probe,
>> +	.remove = mpam_msc_drv_remove,
>> +};
>> +
>> +static int __init mpam_msc_driver_init(void)
>> +{
>> +	if (!system_supports_mpam())
>> +		return -EOPNOTSUPP;
>> +
>> +	init_srcu_struct(&mpam_srcu);
>> +
>> +	fw_num_msc = acpi_mpam_count_msc();
>> +
> 
> Trivial but I'd drop this blank line to keep the call closely
> associated with the error check.

done

> 
>> +	if (fw_num_msc <= 0) {
>> +		pr_err("No MSC devices found in firmware\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return platform_driver_register(&mpam_msc_driver);
>> +}
>> +subsys_initcall(mpam_msc_driver_init);
> 
> 

Thanks,

Ben



* RE: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
  2025-11-08  9:28   ` Gavin Shan
  2025-11-10 16:58   ` Jonathan Cameron
@ 2025-11-12  7:22   ` Shaopeng Tan (Fujitsu)
  2025-11-12 15:37     ` Ben Horgan
  2025-11-13  2:46   ` Fenghua Yu
  3 siblings, 1 reply; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  7:22 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hello Ben,

> From: James Morse <james.morse@arm.com>
> 
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may only be
> accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until the
> system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> From Jonathan:
> Include cleanup
> Use devm_mutex_init()
> Add an ERR_CAST()
> Fenghua:
> Return zero from update_msc_accessibility()
> Additional:
> Fail probe if MSC doesn't have an MMIO interface
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/resctrl/Kconfig         |  15 +++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 194 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  49 ++++++++
>  7 files changed, 266 insertions(+)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c5e66d5d72cd..004d58cfbff8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,6 +2025,7 @@ config ARM64_TLB_RANGE
> 
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER if EXPERT	# does nothing yet
>  	select ACPI_MPAM if ACPI
>  	help
>  	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
> 
>  source "drivers/cdx/Kconfig"
> 
> +source "drivers/resctrl/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 8e1ffa4358d5..20eb17596b89 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,6 +194,7 @@ obj-$(CONFIG_HTE)		+= hte/
>  obj-$(CONFIG_DRM_ACCEL)		+= accel/
>  obj-$(CONFIG_CDX_BUS)		+= cdx/
>  obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
> 
>  obj-$(CONFIG_DIBS)		+= dibs/
>  obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..ef2f3adf64a9
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,15 @@
> +menuconfig ARM64_MPAM_DRIVER
> +	bool "MPAM driver"
> +	depends on ARM64 && ARM64_MPAM && EXPERT
> +	help
> +	  Memory System Resource Partitioning and Monitoring (MPAM) driver for
> +	  System IP, e.g. caches and memory controllers.
> +
> +if ARM64_MPAM_DRIVER
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver"
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> +
> +endif
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..898199dcf80d
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..6c6be133d73a
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,194 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +struct srcu_struct mpam_srcu;
> +
> +/*
> + * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> + * can be enabled.
> + */
> +static atomic_t mpam_num_msc;
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	u32 affinity_id;
> +	int err;
> +
> +	err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +				       &affinity_id);
> +	if (err)
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +	else
> +		acpi_pptt_get_cpus_from_container(affinity_id,
> +						  &msc->accessibility);
> +	return 0;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&msc->all_msc_list);
> +	platform_set_drvdata(pdev, NULL);
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	synchronize_srcu(&mpam_srcu);
> +}
> +
> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	u32 tmp;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	struct device *dev = &pdev->dev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +	if (!msc)
> +		return ERR_PTR(-ENOMEM);
> +
> +	err = devm_mutex_init(dev, &msc->probe_lock);
> +	if (err)
> +		return ERR_PTR(err);
> +	err = devm_mutex_init(dev, &msc->part_sel_lock);
> +	if (err)
> +		return ERR_PTR(err);
> +	msc->id = pdev->id;
> +	msc->pdev = pdev;
> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
> +	INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +	err = update_msc_accessibility(msc);
> +	if (err)
> +		return ERR_PTR(err);

Since the return value of update_msc_accessibility(msc) is always 0,
this check is unnecessary. 

Best regards,
Shaopeng TAN

> +	if (cpumask_empty(&msc->accessibility)) {
> +		dev_err_once(dev, "MSC is not accessible from any CPU!");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
> +		msc->iface = MPAM_IFACE_MMIO;
> +	else
> +		msc->iface = MPAM_IFACE_PCC;
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		void __iomem *io;
> +
> +		io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +							    &msc_res);
> +		if (IS_ERR(io)) {
> +			dev_err_once(dev, "Failed to map MSC base address\n");
> +			return ERR_CAST(io);
> +		}
> +		msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +		msc->mapped_hwpage = io;
> +	} else {
> +		return ERR_PTR(-ENOENT);
> +	}
> +
> +	list_add_rcu(&msc->all_msc_list, &mpam_all_msc);
> +	platform_set_drvdata(pdev, msc);
> +
> +	return msc;
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc = NULL;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	msc = do_mpam_msc_drv_probe(pdev);
> +	mutex_unlock(&mpam_list_lock);
> +	if (!IS_ERR(msc)) {
> +		/* Create RIS entries described by firmware */
> +		err = acpi_mpam_parse_resources(msc, plat_data);
> +		if (err)
> +			mpam_msc_drv_remove(pdev);
> +	} else {
> +		err = PTR_ERR(msc);
> +	}
> +
> +	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
> +		pr_info("Discovered all MSC\n");
> +
> +	return err;
> +}
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	fw_num_msc = acpi_mpam_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..540066903eca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +struct platform_device;
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head	all_msc_list;
> +
> +	int			id;
> +	struct platform_device	*pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only taken during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs;
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->probe_lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +#endif /* MPAM_INTERNAL_H */
> --
> 2.43.0



* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-12  7:22   ` Shaopeng Tan (Fujitsu)
@ 2025-11-12 15:37     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 15:37 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hi Shaopeng,

On 11/12/25 07:22, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
> 
>> From: James Morse <james.morse@arm.com>
>>
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may only be
>> accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until the
>> system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>>
>> CC: Carl Worth <carl@os.amperecomputing.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
[...]
>> +static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	u32 tmp;
>> +	struct mpam_msc *msc;
>> +	struct resource *msc_res;
>> +	struct device *dev = &pdev->dev;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +	if (!msc)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	err = devm_mutex_init(dev, &msc->probe_lock);
>> +	if (err)
>> +		return ERR_PTR(err);
>> +	err = devm_mutex_init(dev, &msc->part_sel_lock);
>> +	if (err)
>> +		return ERR_PTR(err);
>> +	msc->id = pdev->id;
>> +	msc->pdev = pdev;
>> +	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
>> +	INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +	err = update_msc_accessibility(msc);
>> +	if (err)
>> +		return ERR_PTR(err);
> 
> Since the return value of update_msc_accessibility(msc) is always 0,
> this check is unnecessary.

Yes, I've changed update_msc_accessibility() to return void.
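
Roughly, and only as a userspace model rather than the actual respin, the
void-returning shape looks like the sketch below. Everything in it is an
assumption made for illustration: cpumask_model_t is a plain 64-bit bitmap,
cpus_from_container_model() is a made-up stand-in for the PPTT container
lookup, and the container ids and masks are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: a cpumask is a plain 64-bit bitmap here. */
typedef uint64_t cpumask_model_t;

#define POSSIBLE_CPUS_MODEL ((cpumask_model_t)0xff)	/* assume 8 possible CPUs */

struct msc_model {
	bool has_affinity;	/* models a present "cpu_affinity" property */
	uint32_t affinity_id;
	cpumask_model_t accessibility;
};

/* Made-up stand-in for the PPTT lookup: container 1 -> CPUs 0-3, 2 -> CPUs 4-7. */
static cpumask_model_t cpus_from_container_model(uint32_t id)
{
	switch (id) {
	case 1:
		return 0x0f;
	case 2:
		return 0xf0;
	default:
		return 0;	/* unknown container: no CPUs */
	}
}

/* Nothing here can fail, so there is no error code to return. */
static void update_accessibility_model(struct msc_model *msc)
{
	if (!msc->has_affinity)
		msc->accessibility = POSSIBLE_CPUS_MODEL;
	else
		msc->accessibility = cpus_from_container_model(msc->affinity_id);
}

/* The caller keeps the one check that can still trip: an empty mask. */
static int probe_model(struct msc_model *msc)
{
	update_accessibility_model(msc);
	if (msc->accessibility == 0)
		return -1;	/* the driver returns -EINVAL at this point */
	return 0;
}
```

With the helper returning void, the only failure left on this path is the
empty-mask case, which the caller already tests for.
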
> 
> Best regards,
> Shaopeng TAN
> 
Thanks,

Ben



* Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
                     ` (2 preceding siblings ...)
  2025-11-12  7:22   ` Shaopeng Tan (Fujitsu)
@ 2025-11-13  2:46   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  2:46 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan



On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (9 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09  0:07   ` Gavin Shan
                     ` (3 more replies)
  2025-11-07 12:34 ` [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions Ben Horgan
                   ` (28 subsequent siblings)
  39 siblings, 4 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.

To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).

Add support for creating and destroying structures to allow a hierarchy
of resources to be created.
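
The find-or-allocate pattern behind that hierarchy can be sketched in
userspace C as follows. This is a simplified model under stated assumptions,
not the patch itself: class_model, component_model, class_find() and
component_find() are hypothetical names, plain singly linked lists replace
the RCU lists, there is no locking, and the destroy side is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of the class -> component grouping. */
enum class_type_model { CLASS_CACHE, CLASS_MEMORY };

struct component_model {
	int comp_id;
	struct component_model *next;
};

struct class_model {
	enum class_type_model type;
	unsigned char level;
	struct component_model *components;
	struct class_model *next;
};

static struct class_model *all_classes;

/* Return the class matching (level, type), allocating it on first use. */
static struct class_model *class_find(unsigned char level,
				      enum class_type_model type)
{
	struct class_model *c;

	for (c = all_classes; c; c = c->next) {
		if (c->type == type && c->level == level)
			return c;
	}

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->type = type;
	c->level = level;
	c->next = all_classes;
	all_classes = c;
	return c;
}

/* Return the component with this id in the class, allocating on first use. */
static struct component_model *component_find(struct class_model *class, int id)
{
	struct component_model *comp;

	for (comp = class->components; comp; comp = comp->next) {
		if (comp->comp_id == id)
			return comp;
	}

	comp = calloc(1, sizeof(*comp));
	if (!comp)
		return NULL;
	comp->comp_id = id;
	comp->next = class->components;
	class->components = comp;
	return comp;
}

/* Count the components currently linked under a class. */
static int count_components(const struct class_model *class)
{
	const struct component_model *comp;
	int n = 0;

	for (comp = class->components; comp; comp = comp->next)
		n++;
	return n;
}
```

Looking up the same (level, type) pair twice yields the same class, so two
L2 caches with different ids end up as two components of one class.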

CC: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Jonathan:
Code reordering.
Comments.
---
 drivers/resctrl/mpam_devices.c  | 393 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  94 ++++++++
 include/linux/arm_mpam.h        |   5 +
 3 files changed, 491 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 6c6be133d73a..48a344d5cb43 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -36,6 +36,384 @@ struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of the features of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronize_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+static inline void init_garbage(struct mpam_garbage *garbage)
+{
+	init_llist_node(&garbage->llist);
+}
+
+#define add_to_garbage(x)				\
+do {							\
+	__typeof__(x) _x = (x);				\
+	_x->garbage.to_free = _x;			\
+	llist_add(&_x->garbage.llist, &mpam_garbage);	\
+} while (0)
+
+static void mpam_free_garbage(void)
+{
+	struct mpam_garbage *iter, *tmp;
+	struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+	if (!to_free)
+		return;
+
+	synchronize_srcu(&mpam_srcu);
+
+	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+		if (iter->pdev)
+			devm_kfree(&iter->pdev->dev, iter->to_free);
+		else
+			kfree(iter->to_free);
+	}
+}
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
+	if (!vmsc)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&vmsc->garbage);
+
+	INIT_LIST_HEAD_RCU(&vmsc->ris);
+	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+	vmsc->comp = comp;
+	vmsc->msc = msc;
+
+	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+	return vmsc;
+}
+
+static void mpam_component_destroy(struct mpam_component *comp);
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+	struct mpam_component *comp = vmsc->comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&vmsc->comp_list);
+	add_to_garbage(vmsc);
+
+	if (list_empty(&comp->vmsc))
+		mpam_component_destroy(comp);
+}
+
+static struct mpam_vmsc *
+mpam_vmsc_find(struct mpam_component *comp, struct mpam_msc *msc)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (vmsc->msc->id == msc->id)
+			return vmsc;
+	}
+
+	return mpam_vmsc_alloc(comp, msc);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&comp->garbage);
+
+	comp->comp_id = id;
+	INIT_LIST_HEAD_RCU(&comp->vmsc);
+	/* affinity is updated when ris are added */
+	INIT_LIST_HEAD_RCU(&comp->class_list);
+	comp->class = class;
+
+	list_add_rcu(&comp->class_list, &class->components);
+
+	return comp;
+}
+
+static void mpam_class_destroy(struct mpam_class *class);
+
+static void mpam_component_destroy(struct mpam_component *comp)
+{
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&comp->class_list);
+	add_to_garbage(comp);
+
+	if (list_empty(&class->components))
+		mpam_class_destroy(class);
+}
+
+static struct mpam_component *
+mpam_component_find(struct mpam_class *class, int id)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(comp, &class->components, class_list) {
+		if (comp->comp_id == id)
+			return comp;
+	}
+
+	return mpam_component_alloc(class, id);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	class = kzalloc(sizeof(*class), GFP_KERNEL);
+	if (!class)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(&class->garbage);
+
+	INIT_LIST_HEAD_RCU(&class->components);
+	/* affinity is updated when ris are added */
+	class->level = level_idx;
+	class->type = type;
+	INIT_LIST_HEAD_RCU(&class->classes_list);
+
+	list_add_rcu(&class->classes_list, &mpam_classes);
+
+	return class;
+}
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&class->classes_list);
+	add_to_garbage(class);
+}
+
+static struct mpam_class *
+mpam_class_find(u8 level_idx, enum mpam_class_types type)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		if (class->type == type && class->level == level_idx)
+			return class;
+	}
+
+	return mpam_class_alloc(level_idx, type);
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper walks the acpi tables to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity)
+{
+	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (node_id == cpu_to_node(cpu))
+			cpumask_set_cpu(cpu, affinity);
+	}
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+				 enum mpam_class_types type,
+				 struct mpam_class *class,
+				 struct mpam_component *comp)
+{
+	int err;
+
+	switch (type) {
+	case MPAM_CLASS_CACHE:
+		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+						     affinity);
+		if (err)
+			return err;
+
+		if (cpumask_empty(affinity))
+			dev_warn_once(&msc->pdev->dev,
+				      "no CPUs associated with cache node\n");
+
+		break;
+	case MPAM_CLASS_MEMORY:
+		get_cpumask_from_node_id(comp->comp_id, affinity);
+		/* affinity may be empty for CPU-less memory nodes */
+		break;
+	case MPAM_CLASS_UNKNOWN:
+		return 0;
+	}
+
+	cpumask_and(affinity, affinity, &msc->accessibility);
+
+	return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	int err;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+	struct platform_device *pdev = msc->pdev;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
+		return -EINVAL;
+
+	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
+		return -EBUSY;
+
+	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
+	if (!ris)
+		return -ENOMEM;
+	init_garbage(&ris->garbage);
+	ris->garbage.pdev = pdev;
+
+	class = mpam_class_find(class_id, type);
+	if (IS_ERR(class))
+		return PTR_ERR(class);
+
+	comp = mpam_component_find(class, component_id);
+	if (IS_ERR(comp)) {
+		if (list_empty(&class->components))
+			mpam_class_destroy(class);
+		return PTR_ERR(comp);
+	}
+
+	vmsc = mpam_vmsc_find(comp, msc);
+	if (IS_ERR(vmsc)) {
+		if (list_empty(&comp->vmsc))
+			mpam_component_destroy(comp);
+		return PTR_ERR(vmsc);
+	}
+
+	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+	if (err) {
+		if (list_empty(&vmsc->ris))
+			mpam_vmsc_destroy(vmsc);
+		return err;
+	}
+
+	ris->ris_idx = ris_idx;
+	INIT_LIST_HEAD_RCU(&ris->msc_list);
+	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+	ris->vmsc = vmsc;
+
+	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+	list_add_rcu(&ris->msc_list, &msc->ris);
+
+	return 0;
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+	struct mpam_vmsc *vmsc = ris->vmsc;
+	struct mpam_msc *msc = vmsc->msc;
+	struct mpam_component *comp = vmsc->comp;
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	/*
+	 * It is assumed affinities don't overlap. If they do the class becomes
+	 * unusable immediately.
+	 */
+	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+	clear_bit(ris->ris_idx, &msc->ris_idxs);
+	list_del_rcu(&ris->msc_list);
+	list_del_rcu(&ris->vmsc_list);
+	add_to_garbage(ris);
+
+	if (list_empty(&vmsc->ris))
+		mpam_vmsc_destroy(vmsc);
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id)
+{
+	int err;
+
+	mutex_lock(&mpam_list_lock);
+	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+				     component_id);
+	mutex_unlock(&mpam_list_lock);
+	if (err)
+		mpam_free_garbage();
+
+	return err;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -60,14 +438,25 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 
 static int fw_num_msc;
 
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
 static void mpam_msc_destroy(struct mpam_msc *msc)
 {
 	struct platform_device *pdev = msc->pdev;
+	struct mpam_msc_ris *ris, *tmp;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+		mpam_ris_destroy(ris);
+
 	list_del_rcu(&msc->all_msc_list);
 	platform_set_drvdata(pdev, NULL);
+
+	add_to_garbage(msc);
 }
 
 static void mpam_msc_drv_remove(struct platform_device *pdev)
@@ -81,7 +470,7 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
 	mpam_msc_destroy(msc);
 	mutex_unlock(&mpam_list_lock);
 
-	synchronize_srcu(&mpam_srcu);
+	mpam_free_garbage();
 }
 
 static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
@@ -97,6 +486,8 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
 	if (!msc)
 		return ERR_PTR(-ENOMEM);
+	init_garbage(&msc->garbage);
+	msc->garbage.pdev = pdev;
 
 	err = devm_mutex_init(dev, &msc->probe_lock);
 	if (err)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 540066903eca..8f7a28d2c021 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,11 +7,30 @@
 #include <linux/arm_mpam.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/llist.h>
 #include <linux/mutex.h>
+#include <linux/srcu.h>
 #include <linux/types.h>
 
+#define MPAM_MSC_MAX_NUM_RIS	16
+
 struct platform_device;
 
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+	/* member of mpam_garbage */
+	struct llist_node	llist;
+
+	void			*to_free;
+	struct platform_device	*pdev;
+};
+
 struct mpam_msc {
 	/* member of mpam_all_msc */
 	struct list_head	all_msc_list;
@@ -45,5 +64,80 @@ struct mpam_msc {
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_class {
+	/* mpam_components in this class */
+	struct list_head	components;
+
+	cpumask_t		affinity;
+
+	u8			level;
+	enum mpam_class_types	type;
+
+	/* member of mpam_classes */
+	struct list_head	classes_list;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_component {
+	u32			comp_id;
+
+	/* mpam_vmsc in this component */
+	struct list_head	vmsc;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_class:components */
+	struct list_head	class_list;
+
+	/* parent: */
+	struct mpam_class	*class;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_vmsc {
+	/* member of mpam_component:vmsc_list */
+	struct list_head	comp_list;
+
+	/* mpam_msc_ris in this vmsc */
+	struct list_head	ris;
+
+	/* All RIS in this vMSC are members of this MSC */
+	struct mpam_msc		*msc;
+
+	/* parent: */
+	struct mpam_component	*comp;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_msc_ris {
+	u8			ris_idx;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_vmsc:ris */
+	struct list_head	vmsc_list;
+
+	/* member of mpam_msc:ris */
+	struct list_head	msc_list;
+
+	/* parent: */
+	struct mpam_vmsc	*vmsc;
+
+	struct mpam_garbage	garbage;
 };
+
+/* List of all classes - protected by srcu*/
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity);
+
 #endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index a3828ef91aee..5a3aab6bb1d4 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -37,11 +37,16 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
 static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 #endif
 
+#ifdef CONFIG_ARM64_MPAM_DRIVER
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id);
+#else
 static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 				  enum mpam_class_types type, u8 class_id,
 				  int component_id)
 {
 	return -EINVAL;
 }
+#endif
 
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.43.0
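
The deferred-free scheme this patch adds (struct mpam_garbage, add_to_garbage(), mpam_free_garbage()) can be illustrated outside the kernel. What follows is only a single-threaded userspace sketch, not the driver's code: struct widget, widget_retire(), wait_for_readers() and demo() are hypothetical names, a plain pointer replaces the lock-free llist, and wait_for_readers() merely counts calls where the driver would block in synchronize_srcu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct mpam_garbage. */
struct garbage {
	struct garbage *next;	/* link in the pending list */
	void *to_free;		/* start of the enclosing object */
};

/* Hypothetical object that embeds its own garbage node. */
struct widget {
	int id;
	struct garbage garbage;
};

static struct garbage *pending;	/* unlinked but not yet freed */
static int grace_periods;	/* number of waits performed */

/* Stand-in for synchronize_srcu(); a real one blocks until readers finish. */
static void wait_for_readers(void)
{
	grace_periods++;
}

/* Queue an object. The node is embedded, so queuing cannot fail. */
static void widget_retire(struct widget *w)
{
	w->garbage.to_free = w;
	w->garbage.next = pending;
	pending = &w->garbage;
}

/* Detach the whole list, wait once, then free every entry on it. */
static int free_garbage(void)
{
	struct garbage *iter = pending, *next;
	int freed = 0;

	pending = NULL;
	if (!iter)
		return 0;

	wait_for_readers();

	for (; iter; iter = next) {
		next = iter->next;	/* read the link before freeing */
		free(iter->to_free);
		freed++;
	}

	return freed;
}

/* Retire n objects and collect them in one pass; -1 on allocation failure. */
static int demo(int n)
{
	for (int i = 0; i < n; i++) {
		struct widget *w = calloc(1, sizeof(*w));

		if (!w) {
			free_garbage();
			return -1;
		}
		w->id = i;
		widget_retire(w);
	}

	return free_garbage();
}
```

The point of batching is that any number of retired objects costs a single wait, and an empty list costs none.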


* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
@ 2025-11-09  0:07   ` Gavin Shan
  2025-11-12 16:39     ` Ben Horgan
  2025-11-10 17:10   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-09  0:07 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Jonathan:
> Code reordering.
> Comments.
> ---
>   drivers/resctrl/mpam_devices.c  | 393 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  94 ++++++++
>   include/linux/arm_mpam.h        |   5 +
>   3 files changed, 491 insertions(+), 1 deletion(-)
> 

Some minor comments below and some of them may be invalid. Nothing really
looks incorrect to me:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 6c6be133d73a..48a344d5cb43 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -36,6 +36,384 @@ struct srcu_struct mpam_srcu;
>    */
>   static atomic_t mpam_num_msc;
>   
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.

s/a virtual container for RIS/a virtual container for RISes
s/all RIS/all RISes

An empty line may be needed here as paragraph separator.

> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *

s/the RIS/the RISes

> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +static inline void init_garbage(struct mpam_garbage *garbage)
> +{
> +	init_llist_node(&garbage->llist);
> +}
> +
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = (x);				\
> +	_x->garbage.to_free = _x;			\
> +	llist_add(&_x->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
> +	if (!vmsc)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(&vmsc->garbage);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static void mpam_component_destroy(struct mpam_component *comp);
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_component_destroy(comp);
> +}
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_find(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	return mpam_vmsc_alloc(comp, msc);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(&comp->garbage);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */

	/* Affinity is updated when RIS is added */

> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}
> +
> +static void mpam_class_destroy(struct mpam_class *class);
> +
> +static void mpam_component_destroy(struct mpam_component *comp)
> +{
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&comp->class_list);
> +	add_to_garbage(comp);
> +
> +	if (list_empty(&class->components))
> +		mpam_class_destroy(class);
> +}
> +
> +static struct mpam_component *
> +mpam_component_find(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	return mpam_component_alloc(class, id);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	class = kzalloc(sizeof(*class), GFP_KERNEL);
> +	if (!class)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(&class->garbage);
> +
> +	INIT_LIST_HEAD_RCU(&class->components);
> +	/* affinity is updated when ris are added */

	/* Affinity is updated when RIS is added */

> +	class->level = level_idx;
> +	class->type = type;
> +	INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> +	list_add_rcu(&class->classes_list, &mpam_classes);
> +
> +	return class;
> +}
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&class->classes_list);
> +	add_to_garbage(class);
> +}
> +
> +static struct mpam_class *
> +mpam_class_find(u8 level_idx, enum mpam_class_types type)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx)
> +			return class;
> +	}
> +
> +	return mpam_class_alloc(level_idx, type);
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + * This helper walks the acpi tables to include offline CPUs too.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +}
> +

This function is only used in mpam_devices.c and won't be exposed in the
future, so we can make it 'static inline'.

> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (node_id == cpu_to_node(cpu))
> +			cpumask_set_cpu(cpu, affinity);
> +	}
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> +				 enum mpam_class_types type,
> +				 struct mpam_class *class,
> +				 struct mpam_component *comp)
> +{
> +	int err;
> +
> +	switch (type) {
> +	case MPAM_CLASS_CACHE:
> +		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> +						     affinity);
> +		if (err)
> +			return err;

It would be worth adding a warning message here.

> +
> +		if (cpumask_empty(affinity))
> +			dev_warn_once(&msc->pdev->dev,
> +				      "no CPUs associated with cache node\n");

{} is needed here.
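
For illustration only, a braced form of a body that wraps across lines could look like the userspace sketch below. As far as I know the kernel coding style does not require braces around a single statement, so treat this as a reviewer preference rather than a rule. warn() and check_affinity() are hypothetical names, fprintf() stands in for dev_warn_once(), and unlike dev_warn_once() this warns on every call.

```c
#include <assert.h>
#include <stdio.h>

static int warnings;	/* number of warnings emitted so far */

/* Hypothetical stand-in for dev_warn_once(); warns every time it is called. */
static void warn(const char *who, const char *msg)
{
	fprintf(stderr, "%s: %s", who, msg);
	warnings++;
}

/* One bit per CPU; an empty mask means no CPU is associated. */
static void check_affinity(const char *who, unsigned long affinity)
{
	if (affinity == 0) {
		warn(who,
		     "no CPUs associated with cache node\n");
	}
}
```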

> +
> +		break;
> +	case MPAM_CLASS_MEMORY:
> +		get_cpumask_from_node_id(comp->comp_id, affinity);
> +		/* affinity may be empty for CPU-less memory nodes */
> +		break;
> +	case MPAM_CLASS_UNKNOWN:
> +		return 0;
> +	}
> +
> +	cpumask_and(affinity, affinity, &msc->accessibility);
> +
> +	return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +	struct platform_device *pdev = msc->pdev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
> +		return -EINVAL;
> +
> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(&ris->garbage);
> +	ris->garbage.pdev = pdev;
> +
> +	class = mpam_class_find(class_id, type);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_find(class, component_id);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);
> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_find(comp, msc);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_component_destroy(comp);
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))
> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->msc_list);
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +	list_add_rcu(&ris->msc_list, &msc->ris);
> +
> +	return 0;
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	/*
> +	 * It is assumed affinities don't overlap. If they do the class becomes
> +	 * unusable immediately.
> +	 */
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, &msc->ris_idxs);
> +	list_del_rcu(&ris->msc_list);
> +	list_del_rcu(&ris->vmsc_list);
> +	add_to_garbage(ris);
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
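
The "affinities don't overlap" assumption in the comment above can be shown with plain bitmasks. This is a userspace sketch under stated assumptions, not driver code: struct group, group_add(), group_remove() and add_two_remove_first() are hypothetical names, and a uint64_t with one bit per CPU stands in for cpumask_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a class or component: one bit per CPU. */
struct group {
	uint64_t affinity;
};

/* Mirrors the cpumask_or() done when a RIS is created. */
static void group_add(struct group *g, uint64_t ris_affinity)
{
	g->affinity |= ris_affinity;
}

/* Mirrors the cpumask_andnot() done when a RIS is destroyed. */
static void group_remove(struct group *g, uint64_t ris_affinity)
{
	g->affinity &= ~ris_affinity;
}

/* Class mask after adding RIS masks a and b, then removing a. */
static uint64_t add_two_remove_first(uint64_t a, uint64_t b)
{
	struct group class = { 0 };

	group_add(&class, a);
	group_add(&class, b);
	group_remove(&class, a);

	return class.affinity;
}
```

With disjoint masks 0x3 and 0xc, removing the first leaves 0xc, which is exactly the second RIS. With overlapping masks 0x3 and 0x6 the result is 0x4: CPU 1 is dropped even though the remaining RIS still covers it.
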
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id);
> +	mutex_unlock(&mpam_list_lock);
> +	if (err)
> +		mpam_free_garbage();
> +
> +	return err;
> +}
> +
>   /*
>    * An MSC can control traffic from a set of CPUs, but may only be accessible
>    * from a (hopefully wider) set of CPUs. The common reason for this is power
> @@ -60,14 +438,25 @@ static int update_msc_accessibility(struct mpam_msc *msc)
>   
>   static int fw_num_msc;
>   
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
>   static void mpam_msc_destroy(struct mpam_msc *msc)
>   {
>   	struct platform_device *pdev = msc->pdev;
> +	struct mpam_msc_ris *ris, *tmp;
>   
>   	lockdep_assert_held(&mpam_list_lock);
>   
> +	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> +		mpam_ris_destroy(ris);
> +
>   	list_del_rcu(&msc->all_msc_list);
>   	platform_set_drvdata(pdev, NULL);
> +
> +	add_to_garbage(msc);
>   }
>   
>   static void mpam_msc_drv_remove(struct platform_device *pdev)
> @@ -81,7 +470,7 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>   	mpam_msc_destroy(msc);
>   	mutex_unlock(&mpam_list_lock);
>   
> -	synchronize_srcu(&mpam_srcu);
> +	mpam_free_garbage();
>   }
>   
>   static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -97,6 +486,8 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>   	msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>   	if (!msc)
>   		return ERR_PTR(-ENOMEM);
> +	init_garbage(&msc->garbage);
> +	msc->garbage.pdev = pdev;
>   
>   	err = devm_mutex_init(dev, &msc->probe_lock);
>   	if (err)
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 540066903eca..8f7a28d2c021 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,11 +7,30 @@
>   #include <linux/arm_mpam.h>
>   #include <linux/cpumask.h>
>   #include <linux/io.h>
> +#include <linux/llist.h>
>   #include <linux/mutex.h>
> +#include <linux/srcu.h>
>   #include <linux/types.h>
>   
> +#define MPAM_MSC_MAX_NUM_RIS	16
> +
>   struct platform_device;
>   
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be garbage
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};
> +
>   struct mpam_msc {
>   	/* member of mpam_all_msc */
>   	struct list_head	all_msc_list;
> @@ -45,5 +64,80 @@ struct mpam_msc {
>   
>   	void __iomem		*mapped_hwpage;
>   	size_t			mapped_hwpage_sz;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_class {
> +	/* mpam_components in this class */
> +	struct list_head	components;
> +
> +	cpumask_t		affinity;
> +
> +	u8			level;
> +	enum mpam_class_types	type;
> +
> +	/* member of mpam_classes */
> +	struct list_head	classes_list;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_component {
> +	u32			comp_id;
> +
> +	/* mpam_vmsc in this component */
> +	struct list_head	vmsc;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_class:components */
> +	struct list_head	class_list;
> +
> +	/* parent: */
> +	struct mpam_class	*class;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_vmsc {
> +	/* member of mpam_component:vmsc_list */
> +	struct list_head	comp_list;
> +
> +	/* mpam_msc_ris in this vmsc */
> +	struct list_head	ris;
> +
> +	/* All RIS in this vMSC are members of this MSC */
> +	struct mpam_msc		*msc;
> +
> +	/* parent: */
> +	struct mpam_component	*comp;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_msc_ris {
> +	u8			ris_idx;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_vmsc:ris */
> +	struct list_head	vmsc_list;
> +
> +	/* member of mpam_msc:ris */
> +	struct list_head	msc_list;
> +
> +	/* parent: */
> +	struct mpam_vmsc	*vmsc;
> +
> +	struct mpam_garbage	garbage;
>   };
> +
> +/* List of all classes - protected by srcu*/
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity);
> +
>   #endif /* MPAM_INTERNAL_H */
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index a3828ef91aee..5a3aab6bb1d4 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -37,11 +37,16 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>   static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>   #endif
>   
> +#ifdef CONFIG_ARM64_MPAM_DRIVER
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id);
> +#else
>   static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>   				  enum mpam_class_types type, u8 class_id,
>   				  int component_id)
>   {
>   	return -EINVAL;
>   }
> +#endif
>   
>   #endif /* __LINUX_ARM_MPAM_H */

Thanks,
Gavin


* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-09  0:07   ` Gavin Shan
@ 2025-11-12 16:39     ` Ben Horgan
  2025-11-12 16:48       ` Ben Horgan
  0 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 16:39 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/9/25 00:07, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.
>>
>> CC: Ben Horgan <ben.horgan@arm.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Jonathan:
>> Code reordering.
>> Comments.
>> ---
>>   drivers/resctrl/mpam_devices.c  | 393 +++++++++++++++++++++++++++++++-
>>   drivers/resctrl/mpam_internal.h |  94 ++++++++
>>   include/linux/arm_mpam.h        |   5 +
>>   3 files changed, 491 insertions(+), 1 deletion(-)
>>
> 
> Some minor comments below and some of them may be invalid. Nothing really
> looks incorrect to me:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 6c6be133d73a..48a344d5cb43 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -36,6 +36,384 @@ struct srcu_struct mpam_srcu;
>>    */
>>   static atomic_t mpam_num_msc;
>>
>> +/*
>> + * An MSC is a physical container for controls and monitors, each identified by
>> + * their RIS index. These share a base-address, interrupts and some MMIO
>> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
>> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
>> + * not all RIS in an MSC share a vMSC.
> 
> s/a virtual container for RIS/a virtual container for RISes
> s/all RIS/all RISes

The comments/messages in the driver treat RIS as an invariable noun;
that is, the singular and plural are the same. This makes sense when read
out loud. I'll keep it as is unless you think making a singular/plural
distinction makes it significantly easier to understand.

> 
> An empty line may be needed here as paragraph separator.

Done

> 
>> + * Components are a group of vMSC that control or monitor the same thing but
>> + * are from different MSC, so have different base-address, interrupts etc.
>> + * Classes are the set components of the same type.
>> + *
>> + * The features of a vMSC is the union of the RIS it contains.
>> + * The features of a Class and Component are the common subset of the vMSC
>> + * they contain.
>> + *
> 
> s/the RIS/the RISes
> 
>> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
>> + * for regulating traffic from devices independently of traffic from CPUs.
>> + * If these are two RIS in one MSC, they will be treated as controlling
>> + * different things, and will not share a vMSC/component/class.
>> + *
>> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
>> + * for bandwidth. These two RIS are members of the same vMSC.
>> + *
>> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
>> + * are sometimes termed slices. They should be configured the same, as if there
>> + * were only one.
>> + *
>> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
>> + * of CPUs. All the L2 components are grouped as a class.
>> + *
>> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
>> + * then linked via struct mpam_ris to a vmsc, component and class.
>> + * The same MSC may exist under different class->component->vmsc paths, but the
>> + * RIS index will be unique.
>> + */
>> +LIST_HEAD(mpam_classes);
>> +
>> +/* List of all objects that can be free()d after synchronise_srcu() */
>> +static LLIST_HEAD(mpam_garbage);
>> +
>> +static inline void init_garbage(struct mpam_garbage *garbage)
>> +{
>> +    init_llist_node(&garbage->llist);
>> +}
>> +
>> +#define add_to_garbage(x)                \
>> +do {                            \
>> +    __typeof__(x) _x = (x);                \
>> +    _x->garbage.to_free = _x;            \
>> +    llist_add(&_x->garbage.llist, &mpam_garbage);    \
>> +} while (0)
>> +
>> +static void mpam_free_garbage(void)
>> +{
>> +    struct mpam_garbage *iter, *tmp;
>> +    struct llist_node *to_free = llist_del_all(&mpam_garbage);
>> +
>> +    if (!to_free)
>> +        return;
>> +
>> +    synchronize_srcu(&mpam_srcu);
>> +
>> +    llist_for_each_entry_safe(iter, tmp, to_free, llist) {
>> +        if (iter->pdev)
>> +            devm_kfree(&iter->pdev->dev, iter->to_free);
>> +        else
>> +            kfree(iter->to_free);
>> +    }
>> +}
>> +
>> +static struct mpam_vmsc *
>> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
>> +{
>> +    struct mpam_vmsc *vmsc;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
>> +    if (!vmsc)
>> +        return ERR_PTR(-ENOMEM);
>> +    init_garbage(&vmsc->garbage);
>> +
>> +    INIT_LIST_HEAD_RCU(&vmsc->ris);
>> +    INIT_LIST_HEAD_RCU(&vmsc->comp_list);
>> +    vmsc->comp = comp;
>> +    vmsc->msc = msc;
>> +
>> +    list_add_rcu(&vmsc->comp_list, &comp->vmsc);
>> +
>> +    return vmsc;
>> +}
>> +
>> +static void mpam_component_destroy(struct mpam_component *comp);
>> +
>> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
>> +{
>> +    struct mpam_component *comp = vmsc->comp;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_del_rcu(&vmsc->comp_list);
>> +    add_to_garbage(vmsc);
>> +
>> +    if (list_empty(&comp->vmsc))
>> +        mpam_component_destroy(comp);
>> +}
>> +
>> +static struct mpam_vmsc *
>> +mpam_vmsc_find(struct mpam_component *comp, struct mpam_msc *msc)
>> +{
>> +    struct mpam_vmsc *vmsc;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
>> +        if (vmsc->msc->id == msc->id)
>> +            return vmsc;
>> +    }
>> +
>> +    return mpam_vmsc_alloc(comp, msc);
>> +}
>> +
>> +static struct mpam_component *
>> +mpam_component_alloc(struct mpam_class *class, int id)
>> +{
>> +    struct mpam_component *comp;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    comp = kzalloc(sizeof(*comp), GFP_KERNEL);
>> +    if (!comp)
>> +        return ERR_PTR(-ENOMEM);
>> +    init_garbage(&comp->garbage);
>> +
>> +    comp->comp_id = id;
>> +    INIT_LIST_HEAD_RCU(&comp->vmsc);
>> +    /* affinity is updated when ris are added */
> 
>     /* Affinity is updated when RIS is added */
> 
>> +    INIT_LIST_HEAD_RCU(&comp->class_list);
>> +    comp->class = class;
>> +
>> +    list_add_rcu(&comp->class_list, &class->components);
>> +
>> +    return comp;
>> +}
>> +
>> +static void mpam_class_destroy(struct mpam_class *class);
>> +
>> +static void mpam_component_destroy(struct mpam_component *comp)
>> +{
>> +    struct mpam_class *class = comp->class;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_del_rcu(&comp->class_list);
>> +    add_to_garbage(comp);
>> +
>> +    if (list_empty(&class->components))
>> +        mpam_class_destroy(class);
>> +}
>> +
>> +static struct mpam_component *
>> +mpam_component_find(struct mpam_class *class, int id)
>> +{
>> +    struct mpam_component *comp;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_for_each_entry(comp, &class->components, class_list) {
>> +        if (comp->comp_id == id)
>> +            return comp;
>> +    }
>> +
>> +    return mpam_component_alloc(class, id);
>> +}
>> +
>> +static struct mpam_class *
>> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
>> +{
>> +    struct mpam_class *class;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    class = kzalloc(sizeof(*class), GFP_KERNEL);
>> +    if (!class)
>> +        return ERR_PTR(-ENOMEM);
>> +    init_garbage(&class->garbage);
>> +
>> +    INIT_LIST_HEAD_RCU(&class->components);
>> +    /* affinity is updated when ris are added */
> 
>     /* Affinity is updated when RIS is added */
> 
>> +    class->level = level_idx;
>> +    class->type = type;
>> +    INIT_LIST_HEAD_RCU(&class->classes_list);
>> +
>> +    list_add_rcu(&class->classes_list, &mpam_classes);
>> +
>> +    return class;
>> +}
>> +
>> +static void mpam_class_destroy(struct mpam_class *class)
>> +{
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_del_rcu(&class->classes_list);
>> +    add_to_garbage(class);
>> +}
>> +
>> +static struct mpam_class *
>> +mpam_class_find(u8 level_idx, enum mpam_class_types type)
>> +{
>> +    struct mpam_class *class;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_for_each_entry(class, &mpam_classes, classes_list) {
>> +        if (class->type == type && class->level == level_idx)
>> +            return class;
>> +    }
>> +
>> +    return mpam_class_alloc(level_idx, type);
>> +}
>> +
>> +/*
>> + * The cacheinfo structures are only populated when CPUs are online.
>> + * This helper walks the acpi tables to include offline CPUs too.
>> + */
>> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>> +                   cpumask_t *affinity)
>> +{
>> +    return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
>> +}
>> +
> 
> This function is only used in mpam_devices.c and won't be exposed in the
> future, we can make it 'static' and 'inline'.

Done

> 
>> +/*
>> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
>> + * a class is represented on all possible CPUs.
>> + */
>> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
>> +{
>> +    int cpu;
>> +
>> +    for_each_possible_cpu(cpu) {
>> +        if (node_id == cpu_to_node(cpu))
>> +            cpumask_set_cpu(cpu, affinity);
>> +    }
>> +}
>> +
>> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
>> +                 enum mpam_class_types type,
>> +                 struct mpam_class *class,
>> +                 struct mpam_component *comp)
>> +{
>> +    int err;
>> +
>> +    switch (type) {
>> +    case MPAM_CLASS_CACHE:
>> +        err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
>> +                             affinity);
>> +        if (err)
>> +            return err;
> 
> It's worth adding a warning message here.

Added: Failed to determine CPU affinity

> 
>> +
>> +        if (cpumask_empty(affinity))
>> +            dev_warn_once(&msc->pdev->dev,
>> +                      "no CPUs associated with cache node\n");
> 
> {} is needed here.

Made it one line.

> 
>> +
>> +        break;
>> +    case MPAM_CLASS_MEMORY:
>> +        get_cpumask_from_node_id(comp->comp_id, affinity);
>> +        /* affinity may be empty for CPU-less memory nodes */
>> +        break;
>> +    case MPAM_CLASS_UNKNOWN:
>> +        return 0;
>> +    }
>> +
>> +    cpumask_and(affinity, affinity, &msc->accessibility);
>> +
>> +    return 0;
>> +}
>> +
>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> +                  enum mpam_class_types type, u8 class_id,
>> +                  int component_id)
>> +{
>> +    int err;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_class *class;
>> +    struct mpam_component *comp;
>> +    struct platform_device *pdev = msc->pdev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
>> +        return -EINVAL;
>> +
>> +    if (test_and_set_bit(ris_idx, &msc->ris_idxs))
>> +        return -EBUSY;
>> +
>> +    ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
>> +    if (!ris)
>> +        return -ENOMEM;
>> +    init_garbage(&ris->garbage);
>> +    ris->garbage.pdev = pdev;
>> +
>> +    class = mpam_class_find(class_id, type);
>> +    if (IS_ERR(class))
>> +        return PTR_ERR(class);
>> +
>> +    comp = mpam_component_find(class, component_id);
>> +    if (IS_ERR(comp)) {
>> +        if (list_empty(&class->components))
>> +            mpam_class_destroy(class);
>> +        return PTR_ERR(comp);
>> +    }
>> +
>> +    vmsc = mpam_vmsc_find(comp, msc);
>> +    if (IS_ERR(vmsc)) {
>> +        if (list_empty(&comp->vmsc))
>> +            mpam_component_destroy(comp);
>> +        return PTR_ERR(vmsc);
>> +    }
>> +
>> +    err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
>> +    if (err) {
>> +        if (list_empty(&vmsc->ris))
>> +            mpam_vmsc_destroy(vmsc);
>> +        return err;
>> +    }
>> +
>> +    ris->ris_idx = ris_idx;
>> +    INIT_LIST_HEAD_RCU(&ris->msc_list);
>> +    INIT_LIST_HEAD_RCU(&ris->vmsc_list);
>> +    ris->vmsc = vmsc;
>> +
>> +    cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>> +    cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>> +    list_add_rcu(&ris->vmsc_list, &vmsc->ris);
>> +    list_add_rcu(&ris->msc_list, &msc->ris);
>> +
>> +    return 0;
>> +}
>> +
>> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
>> +{
>> +    struct mpam_vmsc *vmsc = ris->vmsc;
>> +    struct mpam_msc *msc = vmsc->msc;
>> +    struct mpam_component *comp = vmsc->comp;
>> +    struct mpam_class *class = comp->class;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    /*
>> +     * It is assumed affinities don't overlap. If they do the class becomes
>> +     * unusable immediately.
>> +     */
>> +    cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
>> +    cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
>> +    clear_bit(ris->ris_idx, &msc->ris_idxs);
>> +    list_del_rcu(&ris->msc_list);
>> +    list_del_rcu(&ris->vmsc_list);
>> +    add_to_garbage(ris);
>> +
>> +    if (list_empty(&vmsc->ris))
>> +        mpam_vmsc_destroy(vmsc);
>> +}
>> +
>> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>> +            enum mpam_class_types type, u8 class_id, int component_id)
>> +{
>> +    int err;
>> +
>> +    mutex_lock(&mpam_list_lock);
>> +    err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
>> +                     component_id);
>> +    mutex_unlock(&mpam_list_lock);
>> +    if (err)
>> +        mpam_free_garbage();
>> +
>> +    return err;
>> +}
>> +
>>   /*
>>    * An MSC can control traffic from a set of CPUs, but may only be
>> accessible
>>    * from a (hopefully wider) set of CPUs. The common reason for this
>> is power
>> @@ -60,14 +438,25 @@ static int update_msc_accessibility(struct
>> mpam_msc *msc)
>>     static int fw_num_msc;
>>   +/*
>> + * There are two ways of reaching a struct mpam_msc_ris. Via the
>> + * class->component->vmsc->ris, or via the msc.
>> + * When destroying the msc, the other side needs unlinking and
>> cleaning up too.
>> + */
>>   static void mpam_msc_destroy(struct mpam_msc *msc)
>>   {
>>       struct platform_device *pdev = msc->pdev;
>> +    struct mpam_msc_ris *ris, *tmp;
>>         lockdep_assert_held(&mpam_list_lock);
>>   +    list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
>> +        mpam_ris_destroy(ris);
>> +
>>       list_del_rcu(&msc->all_msc_list);
>>       platform_set_drvdata(pdev, NULL);
>> +
>> +    add_to_garbage(msc);
>>   }
>>     static void mpam_msc_drv_remove(struct platform_device *pdev)
>> @@ -81,7 +470,7 @@ static void mpam_msc_drv_remove(struct
>> platform_device *pdev)
>>       mpam_msc_destroy(msc);
>>       mutex_unlock(&mpam_list_lock);
>>   -    synchronize_srcu(&mpam_srcu);
>> +    mpam_free_garbage();
>>   }
>>     static struct mpam_msc *do_mpam_msc_drv_probe(struct
>> platform_device *pdev)
>> @@ -97,6 +486,8 @@ static struct mpam_msc
>> *do_mpam_msc_drv_probe(struct platform_device *pdev)
>>       msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>>       if (!msc)
>>           return ERR_PTR(-ENOMEM);
>> +    init_garbage(&msc->garbage);
>> +    msc->garbage.pdev = pdev;
>>         err = devm_mutex_init(dev, &msc->probe_lock);
>>       if (err)
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/
>> mpam_internal.h
>> index 540066903eca..8f7a28d2c021 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -7,11 +7,30 @@
>>   #include <linux/arm_mpam.h>
>>   #include <linux/cpumask.h>
>>   #include <linux/io.h>
>> +#include <linux/llist.h>
>>   #include <linux/mutex.h>
>> +#include <linux/srcu.h>
>>   #include <linux/types.h>
>>   +#define MPAM_MSC_MAX_NUM_RIS    16
>> +
>>   struct platform_device;
>>   +/*
>> + * Structures protected by SRCU may not be freed for a surprising amount of
>> + * time (especially if perf is running). To ensure the MPAM error interrupt can
>> + * tear down all the structures, build a list of objects that can be garbage
>> + * collected once synchronize_srcu() has returned.
>> + * If pdev is non-NULL, use devm_kfree().
>> + */
>> +struct mpam_garbage {
>> +    /* member of mpam_garbage */
>> +    struct llist_node    llist;
>> +
>> +    void            *to_free;
>> +    struct platform_device    *pdev;
>> +};
>> +
>>   struct mpam_msc {
>>       /* member of mpam_all_msc */
>>       struct list_head    all_msc_list;
>> @@ -45,5 +64,80 @@ struct mpam_msc {
>>         void __iomem        *mapped_hwpage;
>>       size_t            mapped_hwpage_sz;
>> +
>> +    struct mpam_garbage    garbage;
>> +};
>> +
>> +struct mpam_class {
>> +    /* mpam_components in this class */
>> +    struct list_head    components;
>> +
>> +    cpumask_t        affinity;
>> +
>> +    u8            level;
>> +    enum mpam_class_types    type;
>> +
>> +    /* member of mpam_classes */
>> +    struct list_head    classes_list;
>> +
>> +    struct mpam_garbage    garbage;
>> +};
>> +
>> +struct mpam_component {
>> +    u32            comp_id;
>> +
>> +    /* mpam_vmsc in this component */
>> +    struct list_head    vmsc;
>> +
>> +    cpumask_t        affinity;
>> +
>> +    /* member of mpam_class:components */
>> +    struct list_head    class_list;
>> +
>> +    /* parent: */
>> +    struct mpam_class    *class;
>> +
>> +    struct mpam_garbage    garbage;
>> +};
>> +
>> +struct mpam_vmsc {
>> +    /* member of mpam_component:vmsc_list */
>> +    struct list_head    comp_list;
>> +
>> +    /* mpam_msc_ris in this vmsc */
>> +    struct list_head    ris;
>> +
>> +    /* All RIS in this vMSC are members of this MSC */
>> +    struct mpam_msc        *msc;
>> +
>> +    /* parent: */
>> +    struct mpam_component    *comp;
>> +
>> +    struct mpam_garbage    garbage;
>> +};
>> +
>> +struct mpam_msc_ris {
>> +    u8            ris_idx;
>> +
>> +    cpumask_t        affinity;
>> +
>> +    /* member of mpam_vmsc:ris */
>> +    struct list_head    vmsc_list;
>> +
>> +    /* member of mpam_msc:ris */
>> +    struct list_head    msc_list;
>> +
>> +    /* parent: */
>> +    struct mpam_vmsc    *vmsc;
>> +
>> +    struct mpam_garbage    garbage;
>>   };
>> +
>> +/* List of all classes - protected by srcu*/
>> +extern struct srcu_struct mpam_srcu;
>> +extern struct list_head mpam_classes;
>> +
>> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>> +                   cpumask_t *affinity);
>> +
>>   #endif /* MPAM_INTERNAL_H */
>> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
>> index a3828ef91aee..5a3aab6bb1d4 100644
>> --- a/include/linux/arm_mpam.h
>> +++ b/include/linux/arm_mpam.h
>> @@ -37,11 +37,16 @@ static inline int acpi_mpam_parse_resources(struct
>> mpam_msc *msc,
>>   static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>>   #endif
>>   +#ifdef CONFIG_ARM64_MPAM_DRIVER
>> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>> +            enum mpam_class_types type, u8 class_id, int component_id);
>> +#else
>>   static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>                     enum mpam_class_types type, u8 class_id,
>>                     int component_id)
>>   {
>>       return -EINVAL;
>>   }
>> +#endif
>>     #endif /* __LINUX_ARM_MPAM_H */
> 
> Thanks,
> Gavin
> 

-- 
Thanks,

Ben


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-12 16:39     ` Ben Horgan
@ 2025-11-12 16:48       ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 16:48 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

I was a bit hasty on one of those changes.

On 11/12/25 16:39, Ben Horgan wrote:
> Hi Gavin,
> 
> On 11/9/25 00:07, Gavin Shan wrote:
>> Hi Ben,
>>
>> On 11/7/25 10:34 PM, Ben Horgan wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> An MSC is a container of resources, each identified by their RIS index.
>>> Some RIS are described by firmware to provide their position in the
>>> system.
>>> Others are discovered when the driver probes the hardware.
>>>
>>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>>> There are two kinds of grouping, a class is a set of components, which
>>> are visible to user-space as there are likely to be multiple instances
>>> of the L2 cache. (e.g. one per cluster or package)
>>>
>>> Add support for creating and destroying structures to allow a hierarchy
>>> of resources to be created.
>>>
>>> CC: Ben Horgan <ben.horgan@arm.com>
>>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>>> Tested-by: Peter Newman <peternewman@google.com>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

>> This function is only used in mpam_devices.c and won't be exposed in the
>> future, we can make it 'static' and 'inline'.
> 
> Done

It gets used later in mpam_resctrl.c, so I'll keep it as is.

Thanks,

Ben


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
  2025-11-09  0:07   ` Gavin Shan
@ 2025-11-10 17:10   ` Jonathan Cameron
  2025-11-12 17:21     ` Ben Horgan
  2025-11-12  7:29   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:23   ` Fenghua Yu
  3 siblings, 1 reply; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 17:10 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On Fri,  7 Nov 2025 12:34:28 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy
> of resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
Hi Ben,

Remember to clear out CC'ing yourself.

> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Jonathan:
> Code reordering.

I'm guessing I may have sent things in a slightly less than ideal direction.

Why can't we have ordering as follows (with no forwards declarations)

mpam_class_alloc()
mpam_class_destroy()
//maybe other mpam_class stuff here
mpam_component_alloc()
mpam_component_destroy() - needs mpam_class_destroy()
//maybe other mpam_component stuff here
mpam_vmsc_alloc()
mpam_vmsc_destroy() - needs mpam_component_destroy()
//other mpam_vmsc here
mpam_ris_create_locked() - needs all the destroys.
mpam_ris_destroy() - needs mpam_vmsc_destroy()

I may well have missed a more complex dependency chain.
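
To illustrate the ordering above, here is a minimal userspace sketch, not the
driver code. The *_sketch types and the child counters are hypothetical
stand-ins for the driver's structures and RCU lists. Because each destroy
function only calls the one for its parent, defining them from the top of the
hierarchy downwards needs no forward declarations:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: child counts replace the driver's RCU lists. */
struct class_sketch {
	int nr_components;
	bool destroyed;
};

struct component_sketch {
	struct class_sketch *class;
	int nr_vmsc;
	bool destroyed;
};

struct vmsc_sketch {
	struct component_sketch *comp;
	bool destroyed;
};

/* Top of the hierarchy: depends on nothing, so it is defined first. */
static void class_destroy(struct class_sketch *class)
{
	class->destroyed = true;
}

/* Needs class_destroy(), which is already defined above. */
static void component_destroy(struct component_sketch *comp)
{
	struct class_sketch *class = comp->class;

	comp->destroyed = true;
	if (--class->nr_components == 0)
		class_destroy(class);
}

/* Needs component_destroy(), which is already defined above. */
static void vmsc_destroy(struct vmsc_sketch *vmsc)
{
	struct component_sketch *comp = vmsc->comp;

	vmsc->destroyed = true;
	if (--comp->nr_vmsc == 0)
		component_destroy(comp);
}
```

Destroying the last vmsc of a component cascades upwards: the component goes,
and if it was the last one in its class, the class goes too.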

Other than that, LGTM, given that any change in ordering can be trivially
verified by building it and Gavin's comments seem simple to resolve.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


> Comments.
> ---
>  drivers/resctrl/mpam_devices.c  | 393 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  94 ++++++++
>  include/linux/arm_mpam.h        |   5 +
>  3 files changed, 491 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 6c6be133d73a..48a344d5cb43 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -36,6 +36,384 @@ struct srcu_struct mpam_srcu;
>   */
>  static atomic_t mpam_num_msc;
>  
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +static inline void init_garbage(struct mpam_garbage *garbage)
> +{
> +	init_llist_node(&garbage->llist);
> +}
> +
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = (x);				\
> +	_x->garbage.to_free = _x;			\
> +	llist_add(&_x->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), GFP_KERNEL);
> +	if (!vmsc)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(&vmsc->garbage);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static void mpam_component_destroy(struct mpam_component *comp);
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_component_destroy(comp);
> +}
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_find(struct mpam_component *comp, struct mpam_msc *msc)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	return mpam_vmsc_alloc(comp, msc);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), GFP_KERNEL);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(&comp->garbage);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */
> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}




^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-10 17:10   ` Jonathan Cameron
@ 2025-11-12 17:21     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-12 17:21 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Jonathan,

On 11/10/25 17:10, Jonathan Cameron wrote:
> On Fri,  7 Nov 2025 12:34:28 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> From: James Morse <james.morse@arm.com>
>>
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.
>>
>> CC: Ben Horgan <ben.horgan@arm.com>
> Hi Ben,
> 
> Remember to clear out CC'ing yourself.
> 
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Jonathan:
>> Code reordering.
> 
> I'm guessing I may have sent things in a slightly less than ideal direction.
> 
> Why can't we have ordering as follows (with no forwards declarations)
> 
> mpam_class_alloc()
> mpam_class_destroy()
> //maybe other mpam_class stuff here
> mpam_component_alloc()
> mpam_component_destroy() - needs mpam_class_destroy()
> //maybe other mpam_component stuff here
> mpam_vmsc_alloc()
> mpam_vmsc_destroy() - needs mpam_component_destroy()
> //other mpam_vmsc here

This works. I then need to add mpam_ris_get_affinity() here, as
mpam_ris_create_locked() depends on it.

I also add the helper functions it depends on:
mpam_get_cpumask_from_cache_id() and get_cpumask_from_node_id().

> mpam_ris_create_locked() - needs all the destroys.
> mpam_ris_destroy() - needs mpam_vmsc_destroy()

> 
> I may well have missed a more complex dependency chain.
> 
> Other than that, LGTM, given that any change in ordering can be trivially
> verified by building it and Gavin's comments seem simple to resolve.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Thanks,

Ben


^ permalink raw reply	[flat|nested] 147+ messages in thread

* RE: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
  2025-11-09  0:07   ` Gavin Shan
  2025-11-10 17:10   ` Jonathan Cameron
@ 2025-11-12  7:29   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:23   ` Fenghua Yu
  3 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  7:29 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which are
> visible to user-space as there are likely to be multiple instances of the L2 cache.
> (e.g. one per cluster or package)
> 
> Add support for creating and destroying structures to allow a hierarchy of
> resources to be created.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
                     ` (2 preceding siblings ...)
  2025-11-12  7:29   ` Shaopeng Tan (Fujitsu)
@ 2025-11-13  3:23   ` Fenghua Yu
  2025-11-13 16:39     ` Ben Horgan
  3 siblings, 1 reply; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  3:23 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi, Ben and James,

On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>

[SNIP]

> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +	struct platform_device *pdev = msc->pdev;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
> +		return -EINVAL;
> +
> +	if (test_and_set_bit(ris_idx, &msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
> +	if (!ris)
> +		return -ENOMEM;

The ris_idx bit in msc->ris_idxs is not cleared on error paths in this 
function. The bit cannot be set again.

Not sure if this is a real problem in any case. Clearing the bit on 
error paths may be clean code.
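
For illustration only, here is a user-space sketch of the behaviour in
question. It is not the driver code: struct msc_model,
model_test_and_set_bit() and model_ris_create() are hypothetical stand-ins,
and the 'fail_alloc' and 'clear_on_error' switches exist only to compare a
failed allocation with and without releasing the index afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ris_idxs bitmap in struct mpam_msc. */
struct msc_model {
	unsigned long ris_idxs;
};

/* Non-atomic model of test_and_set_bit(): returns the previous bit value. */
static bool model_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << nr;
	bool old = (*addr & mask) != 0;

	*addr |= mask;
	return old;
}

/*
 * Model of the start of mpam_ris_create_locked(). 'fail_alloc' simulates
 * devm_kzalloc() returning NULL; 'clear_on_error' selects whether the
 * error path releases the index again.
 */
static int model_ris_create(struct msc_model *msc, unsigned int ris_idx,
			    bool fail_alloc, bool clear_on_error)
{
	void *ris;

	assert(ris_idx < sizeof(unsigned long) * CHAR_BIT);

	if (model_test_and_set_bit(ris_idx, &msc->ris_idxs))
		return -EBUSY;

	ris = fail_alloc ? NULL : malloc(1);
	if (!ris) {
		if (clear_on_error)
			msc->ris_idxs &= ~(1UL << ris_idx);
		return -ENOMEM;
	}

	free(ris);	/* the model keeps nothing; the driver would keep the RIS */
	return 0;
}
```

Without the clear, a retry for the same index after a failed allocation
returns -EBUSY rather than trying again. Whether a retry can ever happen in
the driver is a separate question.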

[SNIP]

Thanks.

-Fenghua


* Re: [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris
  2025-11-13  3:23   ` Fenghua Yu
@ 2025-11-13 16:39     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-13 16:39 UTC (permalink / raw)
  To: Fenghua Yu, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Fenghua,

On 11/13/25 03:23, Fenghua Yu wrote:
> Hi, Ben and James,
> 
> On 11/7/25 04:34, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
> 
> [SNIP]
> 
>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> +                  enum mpam_class_types type, u8 class_id,
>> +                  int component_id)
>> +{
>> +    int err;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_class *class;
>> +    struct mpam_component *comp;
>> +    struct platform_device *pdev = msc->pdev;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
>> +        return -EINVAL;
>> +
>> +    if (test_and_set_bit(ris_idx, &msc->ris_idxs))
>> +        return -EBUSY;
>> +
>> +    ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
>> +    if (!ris)
>> +        return -ENOMEM;
> 
> The ris_idx bit in msc->ris_idxs is not cleared on error paths in this
> function. The bit cannot be set again.
> 
> Not sure if this is a real problem in any case. Clearing the bit on
> error paths may be clean code.

I think this would just add noise. There is no anticipation of any
functionality after failing to create RIS and I expect there are more
places in the code where this assumption is relied on.

> 
> [SNIP]
> 
> Thanks.
> 
> -Fenghua

Thanks,

Ben



* [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (10 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09  0:25   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 13/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware Ben Horgan
                   ` (27 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.

Add the definitions for these registers as offsets within the page(s).
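
As a rough illustration of how field masks of this kind are typically
consumed, the user-space sketch below decodes a few MPAMF_IDR fields. The
bit positions are copied from this patch. MODEL_GENMASK(),
model_field_get(), struct idr_fields and model_decode_idr() are hypothetical
stand-ins for the kernel's GENMASK()/FIELD_GET() helpers and are not part of
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(h, l) on 64-bit values. */
#define MODEL_GENMASK(h, l) \
	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Bit positions as defined for MPAMF_IDR in this patch. */
#define MODEL_IDR_PARTID_MAX	MODEL_GENMASK(15, 0)
#define MODEL_IDR_PMG_MAX	MODEL_GENMASK(23, 16)
#define MODEL_IDR_HAS_RIS	MODEL_GENMASK(32, 32)
#define MODEL_IDR_RIS_MAX	MODEL_GENMASK(59, 56)

/* Stand-in for FIELD_GET(): mask the field and shift it down to bit 0. */
static inline uint64_t model_field_get(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	/* (mask & (~mask + 1)) isolates the lowest set bit of the mask. */
	return (reg & mask) / (mask & (~mask + 1));
}

struct idr_fields {
	unsigned int partid_max;
	unsigned int pmg_max;
	unsigned int has_ris;
	unsigned int ris_max;
};

static struct idr_fields model_decode_idr(uint64_t idr)
{
	struct idr_fields f = {
		.partid_max = (unsigned int)model_field_get(MODEL_IDR_PARTID_MAX, idr),
		.pmg_max    = (unsigned int)model_field_get(MODEL_IDR_PMG_MAX, idr),
		.has_ris    = (unsigned int)model_field_get(MODEL_IDR_HAS_RIS, idr),
		.ris_max    = (unsigned int)model_field_get(MODEL_IDR_RIS_MAX, idr),
	};

	return f;
}
```

Any register value fed to model_decode_idr() in a test is made up. Real
values come from reading the MSC.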

Link: https://developer.arm.com/documentation/ihi0099/aa/
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Add tags - thanks!
Consistent naming of long counter variants (Jonathan)
---
 drivers/resctrl/mpam_internal.h | 267 ++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 8f7a28d2c021..51f791cc207b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -140,4 +140,271 @@ extern struct list_head mpam_classes;
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/aa/
+ */
+#define MPAM_ARCHITECTURE_V1    0x10
+
+/* Memory mapped control pages */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR		0x0000  /* features id register */
+#define MPAMF_IIDR		0x0018  /* implementer id register */
+#define MPAMF_AIDR		0x0020  /* architectural id register */
+#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
+#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
+#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
+#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
+#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL	0x0100  /* partid to configure */
+#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
+#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
+#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
+#define MPAMCFG_CASSOC		0x0118  /* cache-associativity config */
+#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
+#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
+#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
+#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
+#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
+#define MSMON_CSU		0x0840  /* current cache-usage */
+#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
+#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
+#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
+#define MSMON_MBWU_L_CAPTURE	0x0890  /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
+#define MPAMF_ESR		0x00F8  /* error status register */
+#define MPAMF_ECR		0x00F0  /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
+#define MPAMF_IDR_EXT			BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
+#define MPAMF_IDR_HAS_MSMON		BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
+#define MPAMF_IDR_HAS_RIS		BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
+#define MPAMF_IDR_HAS_ESR		BIT(39)
+#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
+#define MPAMF_MBW_IDR_WINDWR		BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX	GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
+#define MPAMF_IIDR_REVISION	GENMASK(15, 12)
+#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
+#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
+#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
+#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMCFG_CASSOC - MPAM cache maximum associativity partition configuration register */
+#define MPAMCFG_CASSOC_CASSOC		GENMASK(15, 0)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
+#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ *                     register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ *                    configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN		BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
+#define MPAMF_ESR_PMG		GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR		BIT(31)
+#define MPAMF_ESR_RIS		GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN		BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE			0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
+#define MPAM_ERRCODE_REQ_PMG_RANGE		4
+#define MPAM_ERRCODE_MONITOR_RANGE		5
+#define MPAM_ERRCODE_INTPARTID_RANGE		6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
+#define MPAM_ERRCODE_UNDEFINED_RIS_PART_SEL	8
+#define MPAM_ERRCODE_RIS_NO_CONTROL		9
+#define MPAM_ERRCODE_UNDEFINED_RIS_MON_SEL	10
+#define MPAM_ERRCODE_RIS_NO_MONITOR		11
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ *                    usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
+#define MSMON_CFG_MBWU_CTL_SCLEN		BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN			BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU		0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU		0x43
+
+/*
+ * MSMON_CFG_CSU_FLT -  Memory system performance monitor configure cache storage
+ *                      usage monitor filter register
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ *                      bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_x_FLT_PARTID			GENMASK(15, 0)
+#define MSMON_CFG_x_FLT_PMG			GENMASK(23, 16)
+
+#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
+#define MSMON_CFG_CSU_FLT_XCL			BIT(31)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ *            register
+ * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
+ *                     capture register
+ * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
+ *               monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ *                     capture register
+ */
+#define MSMON___VALUE		GENMASK(30, 0)
+#define MSMON___NRDY		BIT(31)
+#define MSMON___L_NRDY		BIT(63)
+#define MSMON___L_VALUE		GENMASK(43, 0)
+#define MSMON___LWD_VALUE	GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ *                  generation register
+ */
+#define MSMON_CAPT_EVNT_NOW	BIT(0)
+
 #endif /* MPAM_INTERNAL_H */
-- 
2.43.0



* Re: [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-11-07 12:34 ` [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions Ben Horgan
@ 2025-11-09  0:25   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09  0:25 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/aa/
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Add tags - thanks!
> Consistent naming of long counter variants (Jonathan)
> ---
>   drivers/resctrl/mpam_internal.h | 267 ++++++++++++++++++++++++++++++++
>   1 file changed, 267 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* [PATCH 13/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (11 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
                   ` (26 subsequent siblings)
  39 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Lecopzer Chen, Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

Because an MSC can only be accessed from the CPUs in its cpu-affinity
set we need to be running on one of those CPUs to probe the MSC
hardware.

Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled: as each CPU's online call is made, walk all the
MSCs and probe those we can reach that haven't already been probed.

This adds the low-level MSC register read accessors.

Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.

The architecture's context switch code will be enabled by a static-key.
This can be set by mpam_enable(), but it must be done from process context,
not a cpuhp callback, because both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.
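
The flow described above can be modelled in user space roughly as follows.
This is a simplified sketch, not the driver: struct msc_model and the
model_*() functions are hypothetical, locking and the probe-failure path are
omitted, and the enable step is called directly where the driver would
schedule mpam_enable_work.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct mpam_msc used here. */
struct msc_model {
	unsigned long accessibility;	/* bitmask of CPUs that can reach this MSC */
	bool probed;
};

static atomic_int enable_once;	/* plays the role of 'once' in mpam_enable() */
static int enabled_count;	/* how many times the enable step really ran */

/* Model of mpam_enable(): act only when every MSC is probed, and only once. */
static void model_enable(const struct msc_model *mscs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!mscs[i].probed)
			return;

	if (!atomic_fetch_add(&enable_once, 1))
		enabled_count++;
}

/* Model of mpam_discovery_cpu_online(): probe the MSCs this CPU can reach. */
static void model_discovery_cpu_online(unsigned int cpu,
				       struct msc_model *mscs, size_t n)
{
	bool new_device_probed = false;

	assert(cpu < sizeof(unsigned long) * CHAR_BIT);

	for (size_t i = 0; i < n; i++) {
		if (!(mscs[i].accessibility & (1UL << cpu)))
			continue;

		if (!mscs[i].probed)
			mscs[i].probed = true;	/* driver: mpam_msc_hw_probe() */
		new_device_probed = true;
	}

	if (new_device_probed)
		model_enable(mscs, n);	/* driver: schedule_work(&mpam_enable_work) */
}
```

The atomic counter means repeated online events after the last MSC has been
probed are harmless, because only the first caller to see every MSC probed
performs the enable step.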

CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Add a tag - thanks!
Include tidying
---
 drivers/resctrl/mpam_devices.c  | 176 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   5 +
 2 files changed, 180 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 48a344d5cb43..4162a7a57626 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,8 +4,10 @@
 #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/cacheinfo.h>
+#include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/device.h>
 #include <linux/errno.h>
@@ -17,6 +19,7 @@
 #include <linux/printk.h>
 #include <linux/srcu.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include "mpam_internal.h"
 
@@ -36,6 +39,25 @@ struct srcu_struct mpam_srcu;
  */
 static atomic_t mpam_num_msc;
 
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
+/* When mpam is disabled, the printed reason to aid debugging */
+static char *mpam_disable_reason;
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -105,6 +127,21 @@ static void mpam_free_garbage(void)
 	}
 }
 
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
 static struct mpam_vmsc *
 mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
 {
@@ -414,6 +451,86 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+	u64 idr;
+	struct device *dev = &msc->pdev->dev;
+
+	lockdep_assert_held(&msc->probe_lock);
+
+	idr = __mpam_read_reg(msc, MPAMF_AIDR);
+	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+		dev_err_once(dev, "MSC does not match MPAM architecture v1.x\n");
+		return -EIO;
+	}
+
+	msc->probed = true;
+
+	return 0;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
+{
+	return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+	int err = 0;
+	struct mpam_msc *msc;
+	bool new_device_probed = false;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			err = mpam_msc_hw_probe(msc);
+		mutex_unlock(&msc->probe_lock);
+
+		if (err)
+			break;
+		new_device_probed = true;
+	}
+
+	if (new_device_probed && !err)
+		schedule_work(&mpam_enable_work);
+	if (err) {
+		mpam_disable_reason = "error during probing";
+		schedule_work(&mpam_broken_work);
+	}
+
+	return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+	return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+					  int (*offline)(unsigned int offline),
+					  char *name)
+{
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+
+	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, name, online,
+					     offline);
+	if (mpam_cpuhp_state <= 0) {
+		pr_err("Failed to register cpuhp callbacks");
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -553,7 +670,8 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 	}
 
 	if (!err && atomic_add_return(1, &mpam_num_msc) == fw_num_msc)
-		pr_info("Discovered all MSC\n");
+		mpam_register_cpuhp_callbacks(mpam_discovery_cpu_online, NULL,
+					      "mpam:drv_probe");
 
 	return err;
 }
@@ -566,6 +684,62 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+static void mpam_enable_once(void)
+{
+	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
+				      "mpam:online");
+
+	pr_info("MPAM enabled\n");
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+	struct mpam_msc *msc, *tmp;
+
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
+		mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+	mpam_free_garbage();
+
+	pr_err_once("MPAM disabled due to %s\n", mpam_disable_reason);
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+	static atomic_t once;
+	struct mpam_msc *msc;
+	bool all_devices_probed = true;
+
+	/* Have we probed all the hw devices? */
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			all_devices_probed = false;
+		mutex_unlock(&msc->probe_lock);
+
+		if (!all_devices_probed)
+			break;
+	}
+
+	if (all_devices_probed && !atomic_fetch_inc(&once))
+		mpam_enable_once();
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 51f791cc207b..4e1538d29783 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -48,6 +48,7 @@ struct mpam_msc {
 	 * properties become read-only and the lists are protected by SRCU.
 	 */
 	struct mutex		probe_lock;
+	bool			probed;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -137,6 +138,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+void mpam_disable(struct work_struct *work);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.43.0



* [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (12 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 13/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09  0:43   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
                   ` (25 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need taking into account.
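
A minimal user-space sketch of the "system-wide smallest value" idea is
shown below. It assumes that the first reported limits are taken as-is and
that every later report can only narrow them. struct mpam_limits_model and
model_note_limits() are hypothetical names, and the sketch does not model
the point at which the driver publishes the final values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the running system-wide limits. */
struct mpam_limits_model {
	uint16_t partid_max;
	uint8_t pmg_max;
	bool init;
};

/*
 * Fold one requestor's or MSC's limits into the system-wide values.
 * Assumption: the result is the minimum over everything seen so far.
 */
static void model_note_limits(struct mpam_limits_model *sys,
			      uint16_t partid_max, uint8_t pmg_max)
{
	assert(sys != NULL);

	if (!sys->init) {
		sys->partid_max = partid_max;
		sys->pmg_max = pmg_max;
		sys->init = true;
		return;
	}

	if (partid_max < sys->partid_max)
		sys->partid_max = partid_max;
	if (pmg_max < sys->pmg_max)
		sys->pmg_max = pmg_max;
}
```

PARTID and PMG are narrowed independently, so the final pair may combine the
PARTID limit of one component with the PMG limit of another.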

While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.

This adds the low level MSC write accessors.

While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
From Jonathan:
Stray comma in printk
Unnecessary braces
---
 drivers/resctrl/mpam_devices.c  | 148 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   6 ++
 include/linux/arm_mpam.h        |  14 +++
 3 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 4162a7a57626..e1e26cb350f7 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -42,6 +43,15 @@ static atomic_t mpam_num_msc;
 static int mpam_cpuhp_state;
 static DEFINE_MUTEX(mpam_cpuhp_state_lock);
 
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
 /*
  * mpam is enabled once all devices have been probed from CPU online callbacks,
  * scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -142,6 +152,70 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
 
 #define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	WARN_ON_ONCE(reg + sizeof(u32) >= msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	__mpam_write_reg(msc, reg, val);
+}
+
+#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+	u64 idr_high = 0, idr_low;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	idr_low = mpam_read_partsel_reg(msc, IDR);
+	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+	return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+	guard(spinlock)(&partid_max_lock);
+	if (!partid_max_init) {
+		mpam_partid_max = partid_max;
+		mpam_pmg_max = pmg_max;
+		partid_max_init = true;
+	} else if (!partid_max_published) {
+		mpam_partid_max = min(mpam_partid_max, partid_max);
+		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+	} else {
+		/* New requestors can't lower the values */
+		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+			return -EBUSY;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
 static struct mpam_vmsc *
 mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
 {
@@ -451,9 +525,35 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+						   u8 ris_idx)
+{
+	int err;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (!test_bit(ris_idx, &msc->ris_idxs)) {
+		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+					     0, 0);
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	list_for_each_entry(ris, &msc->ris, msc_list) {
+		if (ris->ris_idx == ris_idx)
+			return ris;
+	}
+
+	return ERR_PTR(-ENOENT);
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
+	u16 partid_max;
+	u8 ris_idx, pmg_max;
+	struct mpam_msc_ris *ris;
 	struct device *dev = &msc->pdev->dev;
 
 	lockdep_assert_held(&msc->probe_lock);
@@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		return -EIO;
 	}
 
+	/* Grab an IDR value to find out how many RIS there are */
+	mutex_lock(&msc->part_sel_lock);
+	idr = mpam_msc_read_idr(msc);
+	mutex_unlock(&msc->part_sel_lock);
+
+	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+	/* Use these values so partid/pmg always starts with a valid value */
+	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		idr = mpam_msc_read_idr(msc);
+		mutex_unlock(&msc->part_sel_lock);
+
+		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+		msc->partid_max = min(msc->partid_max, partid_max);
+		msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+		mutex_lock(&mpam_list_lock);
+		ris = mpam_get_or_create_ris(msc, ris_idx);
+		mutex_unlock(&mpam_list_lock);
+		if (IS_ERR(ris))
+			return PTR_ERR(ris);
+	}
+
+	spin_lock(&partid_max_lock);
+	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+	spin_unlock(&partid_max_lock);
+
 	msc->probed = true;
 
 	return 0;
@@ -686,10 +820,20 @@ static struct platform_driver mpam_msc_driver = {
 
 static void mpam_enable_once(void)
 {
+	/*
+	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+	 * longer change.
+	 */
+	spin_lock(&partid_max_lock);
+	partid_max_published = true;
+	spin_unlock(&partid_max_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
-	pr_info("MPAM enabled\n");
+	/* Use printk() to avoid the pr_fmt adding the function name. */
+	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
+	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
 void mpam_disable(struct work_struct *ignored)
@@ -756,4 +900,6 @@ static int __init mpam_msc_driver_init(void)
 
 	return platform_driver_register(&mpam_msc_driver);
 }
+
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4e1538d29783..768a58a3ab27 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -49,6 +49,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	u16			partid_max;
+	u8			pmg_max;
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
@@ -138,6 +140,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 5a3aab6bb1d4..c7246cfeb091 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -49,4 +49,18 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 }
 #endif
 
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max:		The maximum PARTID value the requestor can generate.
+ * @pmg_max:		The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestors features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.43.0



* Re: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
@ 2025-11-09  0:43   ` Gavin Shan
  2025-11-10 23:26     ` Gavin Shan
  2025-11-12  7:57   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:50   ` Fenghua Yu
  2 siblings, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-09  0:43 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
> 
> While doing this, RIS entries that firmware didn't describe are created
> under MPAM_CLASS_UNKNOWN.
> 
> This adds the low level MSC write accessors.
> 
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
>  From Jonathan:
> Stray comma in printk
> Unnecessary braces
> ---
>   drivers/resctrl/mpam_devices.c  | 148 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |   6 ++
>   include/linux/arm_mpam.h        |  14 +++
>   3 files changed, 167 insertions(+), 1 deletion(-)
> 

One comment related to 'partid_max_init', but this looks good to me
either way:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 4162a7a57626..e1e26cb350f7 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -6,6 +6,7 @@
>   #include <linux/acpi.h>
>   #include <linux/atomic.h>
>   #include <linux/arm_mpam.h>
> +#include <linux/bitfield.h>
>   #include <linux/cacheinfo.h>
>   #include <linux/cpu.h>
>   #include <linux/cpumask.h>
> @@ -42,6 +43,15 @@ static atomic_t mpam_num_msc;
>   static int mpam_cpuhp_state;
>   static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>   
> +/*
> + * The smallest common values for any CPU or MSC in the system.
> + * Generating traffic outside this range will result in screaming interrupts.
> + */
> +u16 mpam_partid_max;
> +u8 mpam_pmg_max;
> +static bool partid_max_init, partid_max_published;
> +static DEFINE_SPINLOCK(partid_max_lock);
> +

If mpam_partid_max and mpam_pmg_max are initialized to USHRT_MAX and 255 here,
the state related to partid_max_init can be removed...

>   /*
>    * mpam is enabled once all devices have been probed from CPU online callbacks,
>    * scheduled via this work_struct. If access to an MSC depends on a CPU that
> @@ -142,6 +152,70 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>   
>   #define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
>   
> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	WARN_ON_ONCE(reg + sizeof(u32) >= msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	writel_relaxed(val, msc->mapped_hwpage + reg);
> +}
> +
> +static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	lockdep_assert_held_once(&msc->part_sel_lock);
> +	__mpam_write_reg(msc, reg, val);
> +}
> +
> +#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
> +
> +static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> +{
> +	u64 idr_high = 0, idr_low;
> +
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	idr_low = mpam_read_partsel_reg(msc, IDR);
> +	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
> +		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
> +
> +	return (idr_high << 32) | idr_low;
> +}
> +
> +static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
> +{
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	mpam_write_partsel_reg(msc, PART_SEL, partsel);
> +}
> +
> +static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
> +{
> +	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> +		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
> +
> +	__mpam_part_sel_raw(partsel, msc);
> +}
> +
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> +{
> +	guard(spinlock)(&partid_max_lock);
> +	if (!partid_max_init) {
> +		mpam_partid_max = partid_max;
> +		mpam_pmg_max = pmg_max;
> +		partid_max_init = true;
> +	} else if (!partid_max_published) {
> +		mpam_partid_max = min(mpam_partid_max, partid_max);
> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
> +	} else {
> +		/* New requestors can't lower the values */
> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
> +			return -EBUSY;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(mpam_register_requestor);
> +

With mpam_partid_max and mpam_pmg_max initialized to USHRT_MAX and 255, this
can be:

	if (!partid_max_published) {
		...
	} else {
		...
	}
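
To make the suggestion concrete, here is a minimal userspace sketch of the
two-branch form. It is only a model under stated assumptions: the names
sketch_register_requestor() and sketch_demo() are made up for illustration,
the spinlock is left out, and UINT16_MAX/UINT8_MAX stand in for USHRT_MAX
and 255.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Start at the type maxima so the first caller's limits are simply adopted. */
static uint16_t mpam_partid_max = UINT16_MAX;
static uint8_t mpam_pmg_max = UINT8_MAX;
static bool partid_max_published;

/* Hypothetical model of mpam_register_requestor(); locking is omitted. */
int sketch_register_requestor(uint16_t partid_max, uint8_t pmg_max)
{
	if (!partid_max_published) {
		/* Before publication every requestor may lower the limits. */
		if (partid_max < mpam_partid_max)
			mpam_partid_max = partid_max;
		if (pmg_max < mpam_pmg_max)
			mpam_pmg_max = pmg_max;
		return 0;
	}

	/* New requestors can't lower the values once they are published. */
	if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
		return -EBUSY;

	return 0;
}

/* Walks through a first caller, a second caller and two late callers. */
void sketch_demo(void)
{
	assert(sketch_register_requestor(63, 3) == 0);
	assert(mpam_partid_max == 63 && mpam_pmg_max == 3);

	assert(sketch_register_requestor(31, 7) == 0);
	assert(mpam_partid_max == 31 && mpam_pmg_max == 3);

	partid_max_published = true;

	/* Too small and too late: rejected, limits unchanged. */
	assert(sketch_register_requestor(15, 3) == -EBUSY);
	assert(mpam_partid_max == 31 && mpam_pmg_max == 3);

	/* A late requestor that fits within the published limits is fine. */
	assert(sketch_register_requestor(127, 3) == 0);
}
```

One difference that may be worth checking: in the patch as posted the globals
start at zero, so if no requestor registers before the first MSC is probed,
the min() in mpam_msc_hw_probe() keeps them at zero. With the initializers
above, the MSC's own limits would be used instead.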

>   static struct mpam_vmsc *
>   mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
>   {
> @@ -451,9 +525,35 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>   	return err;
>   }
>   
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, &msc->ris_idxs)) {
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0);
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx)
> +			return ris;
> +	}
> +
> +	return ERR_PTR(-ENOENT);
> +}
> +
>   static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   {
>   	u64 idr;
> +	u16 partid_max;
> +	u8 ris_idx, pmg_max;
> +	struct mpam_msc_ris *ris;
>   	struct device *dev = &msc->pdev->dev;
>   
>   	lockdep_assert_held(&msc->probe_lock);
> @@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   		return -EIO;
>   	}
>   
> +	/* Grab an IDR value to find out how many RIS there are */
> +	mutex_lock(&msc->part_sel_lock);
> +	idr = mpam_msc_read_idr(msc);
> +	mutex_unlock(&msc->part_sel_lock);
> +
> +	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
> +
> +	/* Use these values so partid/pmg always starts with a valid value */
> +	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +
> +	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		idr = mpam_msc_read_idr(msc);
> +		mutex_unlock(&msc->part_sel_lock);
> +
> +		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +		msc->partid_max = min(msc->partid_max, partid_max);
> +		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +
> +		mutex_lock(&mpam_list_lock);
> +		ris = mpam_get_or_create_ris(msc, ris_idx);
> +		mutex_unlock(&mpam_list_lock);
> +		if (IS_ERR(ris))
> +			return PTR_ERR(ris);
> +	}
> +
> +	spin_lock(&partid_max_lock);
> +	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
> +	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> +	spin_unlock(&partid_max_lock);
> +
>   	msc->probed = true;
>   
>   	return 0;
> @@ -686,10 +820,20 @@ static struct platform_driver mpam_msc_driver = {
>   
>   static void mpam_enable_once(void)
>   {
> +	/*
> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> +	 * longer change.
> +	 */
> +	spin_lock(&partid_max_lock);
> +	partid_max_published = true;
> +	spin_unlock(&partid_max_lock);
> +
>   	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
>   				      "mpam:online");
>   
> -	pr_info("MPAM enabled\n");
> +	/* Use printk() to avoid the pr_fmt adding the function name. */
> +	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> +	       mpam_partid_max + 1, mpam_pmg_max + 1);
>   }
>   
>   void mpam_disable(struct work_struct *ignored)
> @@ -756,4 +900,6 @@ static int __init mpam_msc_driver_init(void)
>   
>   	return platform_driver_register(&mpam_msc_driver);
>   }
> +
> +/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
>   subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4e1538d29783..768a58a3ab27 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -49,6 +49,8 @@ struct mpam_msc {
>   	 */
>   	struct mutex		probe_lock;
>   	bool			probed;
> +	u16			partid_max;
> +	u8			pmg_max;
>   	unsigned long		ris_idxs;
>   	u32			ris_max;
>   
> @@ -138,6 +140,10 @@ struct mpam_msc_ris {
>   extern struct srcu_struct mpam_srcu;
>   extern struct list_head mpam_classes;
>   
> +/* System wide partid/pmg values */
> +extern u16 mpam_partid_max;
> +extern u8 mpam_pmg_max;
> +
>   /* Scheduled work callback to enable mpam once all MSC have been probed */
>   void mpam_enable(struct work_struct *work);
>   void mpam_disable(struct work_struct *work);
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 5a3aab6bb1d4..c7246cfeb091 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -49,4 +49,18 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>   }
>   #endif
>   
> +/**
> + * mpam_register_requestor() - Register a requestor with the MPAM driver
> + * @partid_max:		The maximum PARTID value the requestor can generate.
> + * @pmg_max:		The maximum PMG value the requestor can generate.
> + *
> + * Registers a requestor with the MPAM driver to ensure the chosen system-wide
> + * minimum PARTID and PMG values will allow the requestors features to be used.
> + *
> + * Returns an error if the registration is too late, and a larger PARTID/PMG
> + * value has been advertised to user-space. In this case the requestor should
> + * not use its MPAM features. Returns 0 on success.
> + */
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max);
> +
>   #endif /* __LINUX_ARM_MPAM_H */
Thanks,
Gavin



* Re: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-09  0:43   ` Gavin Shan
@ 2025-11-10 23:26     ` Gavin Shan
  2025-11-11  9:30       ` Ben Horgan
  0 siblings, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-10 23:26 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/9/25 10:43 AM, Gavin Shan wrote:
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> CPUs can generate traffic with a range of PARTID and PMG values,
>> but each MSC may also have its own maximum size for these fields.
>> Before MPAM can be used, the driver needs to probe each RIS on
>> each MSC, to find the system-wide smallest value that can be used.
>> The limits from requestors (e.g. CPUs) also need taking into account.
>>
>> While doing this, RIS entries that firmware didn't describe are created
>> under MPAM_CLASS_UNKNOWN.
>>
>> This adds the low level MSC write accessors.
>>
>> While we're here, implement the mpam_register_requestor() call
>> for the arch code to register the CPU limits. Future callers of this
>> will tell us about the SMMU and ITS.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>>  From Jonathan:
>> Stray comma in printk
>> Unnecessary braces
>> ---
>>   drivers/resctrl/mpam_devices.c  | 148 +++++++++++++++++++++++++++++++-
>>   drivers/resctrl/mpam_internal.h |   6 ++
>>   include/linux/arm_mpam.h        |  14 +++
>>   3 files changed, 167 insertions(+), 1 deletion(-)
>>
> 
> One comment related to 'partid_max_init', but this looks good to me in
> either way:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 4162a7a57626..e1e26cb350f7 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -6,6 +6,7 @@
>>   #include <linux/acpi.h>
>>   #include <linux/atomic.h>
>>   #include <linux/arm_mpam.h>
>> +#include <linux/bitfield.h>
>>   #include <linux/cacheinfo.h>
>>   #include <linux/cpu.h>
>>   #include <linux/cpumask.h>
>> @@ -42,6 +43,15 @@ static atomic_t mpam_num_msc;
>>   static int mpam_cpuhp_state;
>>   static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>> +/*
>> + * The smallest common values for any CPU or MSC in the system.
>> + * Generating traffic outside this range will result in screaming interrupts.
>> + */
>> +u16 mpam_partid_max;
>> +u8 mpam_pmg_max;
>> +static bool partid_max_init, partid_max_published;
>> +static DEFINE_SPINLOCK(partid_max_lock);
>> +
> 
> If mpam_partid_max and mpam_pmg_max are initialized to USHRT_MAX and 255 here,
> the state related to partid_max_init can be removed...
> 
>>   /*
>>    * mpam is enabled once all devices have been probed from CPU online callbacks,
>>    * scheduled via this work_struct. If access to an MSC depends on a CPU that
>> @@ -142,6 +152,70 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>>   #define mpam_read_partsel_reg(msc, reg) _mpam_read_partsel_reg(msc, MPAMF_##reg)
>> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
>> +{
>> +    WARN_ON_ONCE(reg + sizeof(u32) >= msc->mapped_hwpage_sz);
>> +    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
>> +
>> +    writel_relaxed(val, msc->mapped_hwpage + reg);
>> +}
>> +
>> +static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
>> +{
>> +    lockdep_assert_held_once(&msc->part_sel_lock);
>> +    __mpam_write_reg(msc, reg, val);
>> +}
>> +
>> +#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
>> +
>> +static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>> +{
>> +    u64 idr_high = 0, idr_low;
>> +
>> +    lockdep_assert_held(&msc->part_sel_lock);
>> +
>> +    idr_low = mpam_read_partsel_reg(msc, IDR);
>> +    if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
>> +        idr_high = mpam_read_partsel_reg(msc, IDR + 4);
>> +
>> +    return (idr_high << 32) | idr_low;
>> +}
>> +
>> +static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
>> +{
>> +    lockdep_assert_held(&msc->part_sel_lock);
>> +
>> +    mpam_write_partsel_reg(msc, PART_SEL, partsel);
>> +}
>> +
>> +static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
>> +{
>> +    u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
>> +              FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
>> +
>> +    __mpam_part_sel_raw(partsel, msc);
>> +}
>> +
>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>> +{
>> +    guard(spinlock)(&partid_max_lock);
>> +    if (!partid_max_init) {
>> +        mpam_partid_max = partid_max;
>> +        mpam_pmg_max = pmg_max;
>> +        partid_max_init = true;
>> +    } else if (!partid_max_published) {
>> +        mpam_partid_max = min(mpam_partid_max, partid_max);
>> +        mpam_pmg_max = min(mpam_pmg_max, pmg_max);
>> +    } else {
>> +        /* New requestors can't lower the values */
>> +        if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
>> +            return -EBUSY;
>> +    }
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(mpam_register_requestor);
>> +
> 
> With mpam_partid_max and mpam_pmg_max initialized to USHRT_MAX and 255, this
> can be:
> 
>      if (!partid_max_published) {
>              ...
>          } else {
>              ...
>          }
> 
>>   static struct mpam_vmsc *
>>   mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc)
>>   {
>> @@ -451,9 +525,35 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>       return err;
>>   }
>> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> +                           u8 ris_idx)
>> +{
>> +    int err;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    if (!test_bit(ris_idx, &msc->ris_idxs)) {
>> +        err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
>> +                         0, 0);
>> +        if (err)
>> +            return ERR_PTR(err);
>> +    }
>> +
>> +    list_for_each_entry(ris, &msc->ris, msc_list) {
>> +        if (ris->ris_idx == ris_idx)
>> +            return ris;
>> +    }
>> +
>> +    return ERR_PTR(-ENOENT);
>> +}
>> +
>>   static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>   {
>>       u64 idr;
>> +    u16 partid_max;
>> +    u8 ris_idx, pmg_max;
>> +    struct mpam_msc_ris *ris;
>>       struct device *dev = &msc->pdev->dev;
>>       lockdep_assert_held(&msc->probe_lock);
>> @@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>           return -EIO;
>>       }
>> +    /* Grab an IDR value to find out how many RIS there are */
>> +    mutex_lock(&msc->part_sel_lock);
>> +    idr = mpam_msc_read_idr(msc);
>> +    mutex_unlock(&msc->part_sel_lock);
>> +
>> +    msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>> +
>> +    /* Use these values so partid/pmg always starts with a valid value */
>> +    msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +    msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +
>> +    for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
>> +        mutex_lock(&msc->part_sel_lock);
>> +        __mpam_part_sel(ris_idx, 0, msc);
>> +        idr = mpam_msc_read_idr(msc);
>> +        mutex_unlock(&msc->part_sel_lock);
>> +
>> +        partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +        pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +        msc->partid_max = min(msc->partid_max, partid_max);
>> +        msc->pmg_max = min(msc->pmg_max, pmg_max);
>> +
>> +        mutex_lock(&mpam_list_lock);
>> +        ris = mpam_get_or_create_ris(msc, ris_idx);
>> +        mutex_unlock(&mpam_list_lock);
>> +        if (IS_ERR(ris))
>> +            return PTR_ERR(ris);
>> +    }
>> +
>> +    spin_lock(&partid_max_lock);
>> +    mpam_partid_max = min(mpam_partid_max, msc->partid_max);
>> +    mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
>> +    spin_unlock(&partid_max_lock);
>> +

mpam_register_requestor() could be used here so that the capacities
(maximum PARTIDs and PMGs) are not unexpectedly lowered.
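
Roughly what I have in mind, as a userspace model rather than a tested
change. The model_* names, struct ris_limits and the sys_* variables are
placeholders invented for this sketch; the three-branch logic mirrors
mpam_register_requestor() from this patch, with the locking left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-RIS limits read from MPAMF_IDR. */
struct ris_limits {
	uint16_t partid_max;
	uint8_t pmg_max;
};

static uint16_t sys_partid_max;
static uint8_t sys_pmg_max;
static bool sys_init, sys_published;

/* Same three cases as mpam_register_requestor() in the patch, minus locking. */
int model_register_requestor(uint16_t partid_max, uint8_t pmg_max)
{
	if (!sys_init) {
		sys_partid_max = partid_max;
		sys_pmg_max = pmg_max;
		sys_init = true;
	} else if (!sys_published) {
		if (partid_max < sys_partid_max)
			sys_partid_max = partid_max;
		if (pmg_max < sys_pmg_max)
			sys_pmg_max = pmg_max;
	} else if (partid_max < sys_partid_max || pmg_max < sys_pmg_max) {
		/* Too late to lower the published limits. */
		return -EBUSY;
	}

	return 0;
}

/* Reduce the RIS limits to one MSC-wide minimum, then register it. */
int model_msc_hw_probe(const struct ris_limits *ris, size_t nr_ris)
{
	uint16_t partid_max;
	uint8_t pmg_max;
	size_t i;

	if (!nr_ris)
		return -EINVAL;

	partid_max = ris[0].partid_max;
	pmg_max = ris[0].pmg_max;
	for (i = 1; i < nr_ris; i++) {
		if (ris[i].partid_max < partid_max)
			partid_max = ris[i].partid_max;
		if (ris[i].pmg_max < pmg_max)
			pmg_max = ris[i].pmg_max;
	}

	/* Replaces the open-coded min() under partid_max_lock. */
	return model_register_requestor(partid_max, pmg_max);
}

void model_demo(void)
{
	const struct ris_limits early[] = { { 63, 3 }, { 31, 1 } };
	const struct ris_limits late_small[] = { { 15, 1 } };
	const struct ris_limits late_ok[] = { { 31, 1 } };

	assert(model_register_requestor(127, 3) == 0);	/* CPU limits */
	assert(model_msc_hw_probe(early, 2) == 0);
	assert(sys_partid_max == 31 && sys_pmg_max == 1);

	sys_published = true;

	/* A late MSC with smaller limits fails instead of lowering them. */
	assert(model_msc_hw_probe(late_small, 1) == -EBUSY);
	assert(sys_partid_max == 31 && sys_pmg_max == 1);
	assert(model_msc_hw_probe(late_ok, 1) == 0);
}
```

Whether an MSC can actually be probed after the values are published, and
whether failing its probe is the right response, is something I'm not sure
about, so please treat this as a question as much as a suggestion.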


>>       msc->probed = true;
>>       return 0;
>> @@ -686,10 +820,20 @@ static struct platform_driver mpam_msc_driver = {
>>   static void mpam_enable_once(void)
>>   {
>> +    /*
>> +     * Once the cpuhp callbacks have been changed, mpam_partid_max can no
>> +     * longer change.
>> +     */
>> +    spin_lock(&partid_max_lock);
>> +    partid_max_published = true;
>> +    spin_unlock(&partid_max_lock);
>> +
>>       mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
>>                         "mpam:online");
>> -    pr_info("MPAM enabled\n");
>> +    /* Use printk() to avoid the pr_fmt adding the function name. */
>> +    printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
>> +           mpam_partid_max + 1, mpam_pmg_max + 1);
>>   }
>>   void mpam_disable(struct work_struct *ignored)
>> @@ -756,4 +900,6 @@ static int __init mpam_msc_driver_init(void)
>>       return platform_driver_register(&mpam_msc_driver);
>>   }
>> +
>> +/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
>>   subsys_initcall(mpam_msc_driver_init);
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 4e1538d29783..768a58a3ab27 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -49,6 +49,8 @@ struct mpam_msc {
>>        */
>>       struct mutex        probe_lock;
>>       bool            probed;
>> +    u16            partid_max;
>> +    u8            pmg_max;
>>       unsigned long        ris_idxs;
>>       u32            ris_max;
>> @@ -138,6 +140,10 @@ struct mpam_msc_ris {
>>   extern struct srcu_struct mpam_srcu;
>>   extern struct list_head mpam_classes;
>> +/* System wide partid/pmg values */
>> +extern u16 mpam_partid_max;
>> +extern u8 mpam_pmg_max;
>> +
>>   /* Scheduled work callback to enable mpam once all MSC have been probed */
>>   void mpam_enable(struct work_struct *work);
>>   void mpam_disable(struct work_struct *work);
>> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
>> index 5a3aab6bb1d4..c7246cfeb091 100644
>> --- a/include/linux/arm_mpam.h
>> +++ b/include/linux/arm_mpam.h
>> @@ -49,4 +49,18 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>   }
>>   #endif
>> +/**
>> + * mpam_register_requestor() - Register a requestor with the MPAM driver
>> + * @partid_max:        The maximum PARTID value the requestor can generate.
>> + * @pmg_max:        The maximum PMG value the requestor can generate.
>> + *
>> + * Registers a requestor with the MPAM driver to ensure the chosen system-wide
>> + * minimum PARTID and PMG values will allow the requestors features to be used.
>> + *
>> + * Returns an error if the registration is too late, and a larger PARTID/PMG
>> + * value has been advertised to user-space. In this case the requestor should
>> + * not use its MPAM features. Returns 0 on success.
>> + */
>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max);
>> +
>>   #endif /* __LINUX_ARM_MPAM_H */

Thanks,
Gavin



* Re: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-10 23:26     ` Gavin Shan
@ 2025-11-11  9:30       ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-11  9:30 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/10/25 23:26, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/9/25 10:43 AM, Gavin Shan wrote:
>> On 11/7/25 10:34 PM, Ben Horgan wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> CPUs can generate traffic with a range of PARTID and PMG values,
>>> but each MSC may also have its own maximum size for these fields.
>>> Before MPAM can be used, the driver needs to probe each RIS on
>>> each MSC, to find the system-wide smallest value that can be used.
>>> The limits from requestors (e.g. CPUs) also need taking into account.
>>>
>>> While doing this, RIS entries that firmware didn't describe are created
>>> under MPAM_CLASS_UNKNOWN.
>>>
>>> This adds the low level MSC write accessors.
>>>
>>> While we're here, implement the mpam_register_requestor() call
>>> for the arch code to register the CPU limits. Future callers of this
>>> will tell us about the SMMU and ITS.
>>>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
>>> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
>>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>>> Tested-by: Peter Newman <peternewman@google.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>> ---
>>> Changes since v3:
>>>  From Jonathan:
>>> Stray comma in printk
>>> Unnecessary braces
>>> ---
>>>   drivers/resctrl/mpam_devices.c  | 148 +++++++++++++++++++++++++++++++-
>>>   drivers/resctrl/mpam_internal.h |   6 ++
>>>   include/linux/arm_mpam.h        |  14 +++
>>>   3 files changed, 167 insertions(+), 1 deletion(-)
[...]
>>>   static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>>   {
>>>       u64 idr;
>>> +    u16 partid_max;
>>> +    u8 ris_idx, pmg_max;
>>> +    struct mpam_msc_ris *ris;
>>>       struct device *dev = &msc->pdev->dev;
>>>       lockdep_assert_held(&msc->probe_lock);
>>> @@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>>           return -EIO;
>>>       }
>>> +    /* Grab an IDR value to find out how many RIS there are */
>>> +    mutex_lock(&msc->part_sel_lock);
>>> +    idr = mpam_msc_read_idr(msc);
>>> +    mutex_unlock(&msc->part_sel_lock);
>>> +
>>> +    msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>>> +
>>> +    /* Use these values so partid/pmg always starts with a valid value */
>>> +    msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>>> +    msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>>> +
>>> +    for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
>>> +        mutex_lock(&msc->part_sel_lock);
>>> +        __mpam_part_sel(ris_idx, 0, msc);
>>> +        idr = mpam_msc_read_idr(msc);
>>> +        mutex_unlock(&msc->part_sel_lock);
>>> +
>>> +        partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>>> +        pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>>> +        msc->partid_max = min(msc->partid_max, partid_max);
>>> +        msc->pmg_max = min(msc->pmg_max, pmg_max);
>>> +
>>> +        mutex_lock(&mpam_list_lock);
>>> +        ris = mpam_get_or_create_ris(msc, ris_idx);
>>> +        mutex_unlock(&mpam_list_lock);
>>> +        if (IS_ERR(ris))
>>> +            return PTR_ERR(ris);
>>> +    }
>>> +
>>> +    spin_lock(&partid_max_lock);
>>> +    mpam_partid_max = min(mpam_partid_max, msc->partid_max);
>>> +    mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
>>> +    spin_unlock(&partid_max_lock);
>>> +
> 
> mpam_register_requestor() could be used here to avoid the capacities
> (maximal PARTIDs and PMGs) being unexpectedly lowered.
> 

I agree it is somewhat surprising that, without a requestor, the
driver supports only 1 PARTID and 1 PMG, but this is intentional. The
driver is only intended to be fully functional once a requestor
(external to this base driver) registers itself, and I don't want to
give this registration a dual meaning. This will be more obvious once
the rest of the MPAM support is added.
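
To illustrate the behaviour described above, here is a minimal user-space
sketch of how the system-wide limits could be combined. It is a model, not
the driver's code: struct id_limits, model_register_requestor(),
model_fold_msc(), the 'published' flag and the -EBUSY return value are all
assumptions made for the example. Only the rules come from this thread:
without a requestor the limits stay at 1 PARTID and 1 PMG, each probed MSC
can only lower them, and a late requestor that supports less than what was
already advertised gets an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the system-wide limits; not the driver's types. */
struct id_limits {
	uint16_t partid_max;	/* largest usable PARTID, 0 means 1 PARTID */
	uint8_t pmg_max;	/* largest usable PMG, 0 means 1 PMG */
	bool have_requestor;	/* has any requestor registered yet? */
	bool published;		/* set once user-space has seen the limits */
};

static uint16_t min_u16(uint16_t a, uint16_t b) { return a < b ? a : b; }
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* A requestor (e.g. the CPUs) reports the largest values it can generate. */
static int model_register_requestor(struct id_limits *l, uint16_t partid_max,
				    uint8_t pmg_max)
{
	assert(l);

	if (l->published) {
		/* Too late: published limits cannot shrink (assumed errno). */
		if (partid_max < l->partid_max || pmg_max < l->pmg_max)
			return -EBUSY;
		return 0;
	}

	if (!l->have_requestor) {
		/* The first requestor raises the limits from their zero default. */
		l->partid_max = partid_max;
		l->pmg_max = pmg_max;
		l->have_requestor = true;
	} else {
		l->partid_max = min_u16(l->partid_max, partid_max);
		l->pmg_max = min_u8(l->pmg_max, pmg_max);
	}
	return 0;
}

/* Each probed MSC can only lower the system-wide limits. */
static void model_fold_msc(struct id_limits *l, uint16_t msc_partid_max,
			   uint8_t msc_pmg_max)
{
	assert(l);
	l->partid_max = min_u16(l->partid_max, msc_partid_max);
	l->pmg_max = min_u8(l->pmg_max, msc_pmg_max);
}
```

In this model, probing an MSC before any requestor has registered leaves
both limits at zero, which matches the "1 PARTID and 1 PMG" case above.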

Thanks,

Ben



* RE: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
  2025-11-09  0:43   ` Gavin Shan
@ 2025-11-12  7:57   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:50   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  7:57 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> CPUs can generate traffic with a range of PARTID and PMG values, but each
> MSC may also have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on each MSC, to
> find the system-wide smallest value that can be used.
> The limits from requestors (e.g. CPUs) also need taking into account.
> 
> While doing this, RIS entries that firmware didn't describe are created under
> MPAM_CLASS_UNKNOWN.
> 
> This adds the low level MSC write accessors.
> 
> While we're here, implement the mpam_register_requestor() call for the arch
> code to register the CPU limits. Future callers of this will tell us about the
> SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---


Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* Re: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
  2025-11-09  0:43   ` Gavin Shan
  2025-11-12  7:57   ` Shaopeng Tan (Fujitsu)
@ 2025-11-13  3:50   ` Fenghua Yu
  2025-11-13 16:43     ` Ben Horgan
  2 siblings, 1 reply; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  3:50 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi, Ben and James,

On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
[SNIP]

> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, &msc->ris_idxs)) {
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0);
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx)
> +			return ris;
> +	}
> +
> +	return ERR_PTR(-ENOENT);
> +}
> +
>   static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   {
>   	u64 idr;
> +	u16 partid_max;
> +	u8 ris_idx, pmg_max;
> +	struct mpam_msc_ris *ris;
>   	struct device *dev = &msc->pdev->dev;
>   
>   	lockdep_assert_held(&msc->probe_lock);
> @@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   		return -EIO;
>   	}
>   
> +	/* Grab an IDR value to find out how many RIS there are */
> +	mutex_lock(&msc->part_sel_lock);
> +	idr = mpam_msc_read_idr(msc);
> +	mutex_unlock(&msc->part_sel_lock);
> +
> +	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
> +
> +	/* Use these values so partid/pmg always starts with a valid value */
> +	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +
> +	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		idr = mpam_msc_read_idr(msc);
> +		mutex_unlock(&msc->part_sel_lock);
> +
> +		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +		msc->partid_max = min(msc->partid_max, partid_max);
> +		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +
> +		mutex_lock(&mpam_list_lock);
> +		ris = mpam_get_or_create_ris(msc, ris_idx);
> +		mutex_unlock(&mpam_list_lock);
> +		if (IS_ERR(ris))
> +			return PTR_ERR(ris);

Would it be better to destroy the RISs that were created earlier in this
loop before returning on this failure? Otherwise, aren't those allocated
RISs leaked?

Thanks.

-Fenghua


* Re: [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values
  2025-11-13  3:50   ` Fenghua Yu
@ 2025-11-13 16:43     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-13 16:43 UTC (permalink / raw)
  To: Fenghua Yu, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Fenghua,

On 11/13/25 03:50, Fenghua Yu wrote:
> Hi, Ben and James,
> 
> On 11/7/25 04:34, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
> [SNIP]
> 
>> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> +                           u8 ris_idx)
>> +{
>> +    int err;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    if (!test_bit(ris_idx, &msc->ris_idxs)) {
>> +        err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
>> +                         0, 0);
>> +        if (err)
>> +            return ERR_PTR(err);
>> +    }
>> +
>> +    list_for_each_entry(ris, &msc->ris, msc_list) {
>> +        if (ris->ris_idx == ris_idx)
>> +            return ris;
>> +    }
>> +
>> +    return ERR_PTR(-ENOENT);
>> +}
>> +
>>   static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>   {
>>       u64 idr;
>> +    u16 partid_max;
>> +    u8 ris_idx, pmg_max;
>> +    struct mpam_msc_ris *ris;
>>       struct device *dev = &msc->pdev->dev;
>>         lockdep_assert_held(&msc->probe_lock);
>> @@ -464,6 +564,40 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>           return -EIO;
>>       }
>>   +    /* Grab an IDR value to find out how many RIS there are */
>> +    mutex_lock(&msc->part_sel_lock);
>> +    idr = mpam_msc_read_idr(msc);
>> +    mutex_unlock(&msc->part_sel_lock);
>> +
>> +    msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>> +
>> +    /* Use these values so partid/pmg always starts with a valid value */
>> +    msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +    msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +
>> +    for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
>> +        mutex_lock(&msc->part_sel_lock);
>> +        __mpam_part_sel(ris_idx, 0, msc);
>> +        idr = mpam_msc_read_idr(msc);
>> +        mutex_unlock(&msc->part_sel_lock);
>> +
>> +        partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +        pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +        msc->partid_max = min(msc->partid_max, partid_max);
>> +        msc->pmg_max = min(msc->pmg_max, pmg_max);
>> +
>> +        mutex_lock(&mpam_list_lock);
>> +        ris = mpam_get_or_create_ris(msc, ris_idx);
>> +        mutex_unlock(&mpam_list_lock);
>> +        if (IS_ERR(ris))
>> +            return PTR_ERR(ris);
> 
> Would it be better to destroy the RISs that were created earlier in this
> loop before returning on this failure? Otherwise, aren't those allocated
> RISs leaked?

This should be handled by mpam_disable() which is run on probe failure.

> 
> Thanks.
> 
> -Fenghua


Thanks,

Ben



* [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (13 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09  0:49   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports Ben Horgan
                   ` (24 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

The MSC MON_SEL register needs to be accessed from hardirq for the overflow
interrupt, and when taking an IPI to access these registers on platforms
where MSC are not accessible from every CPU. This makes an irqsave
spinlock the obvious lock to protect these registers. On systems with SCMI
or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
The SCMI or PCC platforms can't support an overflow interrupt, and
can't access the registers from hardirq context.

Clearly these two can't exist for one MSC at the same time.

Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.

In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.

Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
use devm_mutex_init()
include tidying
stray comma (Jonathan)
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 39 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)
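
For readers following the reasoning in the commit message, the sketch
below models why the lock helper is allowed to fail and what a caller is
expected to do about it. It is a user-space illustration under stated
assumptions, not the driver's implementation: struct model_msc, the
MODEL_IFACE_* values, the in_irq argument, the placeholder counter value
and the -EIO return are all hypothetical. The only behaviour taken from
the commit message is that a firmware-backed (SCMI/PCC) MSC cannot be
accessed from a context that must not sleep, so the lock attempt reports
failure and the caller returns an error instead of checking the interface
type itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical interface types; the real driver has its own enum. */
enum model_iface { MODEL_IFACE_MMIO, MODEL_IFACE_FIRMWARE };

struct model_msc {
	enum model_iface iface;
	bool locked;		/* stands in for the real lock state */
};

/* Returning false means the access is not possible in this context. */
static bool model_mon_sel_lock(struct model_msc *msc, bool in_irq)
{
	/* A firmware mailbox access has to sleep, which irq context forbids. */
	if (msc->iface == MODEL_IFACE_FIRMWARE && in_irq)
		return false;

	assert(!msc->locked);
	msc->locked = true;
	return true;
}

static void model_mon_sel_unlock(struct model_msc *msc)
{
	msc->locked = false;
}

/* Caller pattern: every user of the lock handles the failure path. */
static int model_read_counter(struct model_msc *msc, bool in_irq,
			      unsigned int *val)
{
	if (!model_mon_sel_lock(msc, in_irq))
		return -EIO;	/* assumed error code */

	*val = 42;		/* placeholder for a monitor register read */
	model_mon_sel_unlock(msc);
	return 0;
}
```

With this shape, the MMIO-only helper in this patch can always return
true today, while its callers already carry the error handling a later
mutex-based variant would need.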

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e1e26cb350f7..3556299a8605 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -19,6 +19,7 @@
 #include <linux/platform_device.h>
 #include <linux/printk.h>
 #include <linux/srcu.h>
+#include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
 
@@ -746,6 +747,7 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 	err = devm_mutex_init(dev, &msc->part_sel_lock);
 	if (err)
 		return ERR_PTR(err);
+	mpam_mon_sel_lock_init(msc);
 	msc->id = pdev->id;
 	msc->pdev = pdev;
 	INIT_LIST_HEAD_RCU(&msc->all_msc_list);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 768a58a3ab27..b62ee55e1ed5 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -10,6 +10,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/srcu.h>
+#include <linux/spinlock.h>
 #include <linux/types.h>
 
 #define MPAM_MSC_MAX_NUM_RIS	16
@@ -65,12 +66,50 @@ struct mpam_msc {
 	 */
 	struct mutex		part_sel_lock;
 
+	/*
+	 * mon_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+	 * Access to mon_sel is needed from both process and interrupt contexts,
+	 * but is complicated by firmware-backed platforms that can't make any
+	 * access unless they can sleep.
+	 * Always use the mpam_mon_sel_lock() helpers.
+	 * Accesses to mon_sel need to be able to fail if they occur in the wrong
+	 * context.
+	 * If needed, take msc->probe_lock first.
+	 */
+	raw_spinlock_t		_mon_sel_lock;
+	unsigned long		_mon_sel_flags;
+
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
 
 	struct mpam_garbage	garbage;
 };
 
+/* Returning false here means accesses to mon_sel must fail and report an error. */
+static inline bool __must_check mpam_mon_sel_lock(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(msc->iface != MPAM_IFACE_MMIO);
+
+	raw_spin_lock_irqsave(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+	return true;
+}
+
+static inline void mpam_mon_sel_unlock(struct mpam_msc *msc)
+{
+	raw_spin_unlock_irqrestore(&msc->_mon_sel_lock, msc->_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+	lockdep_assert_held_once(&msc->_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
+{
+	raw_spin_lock_init(&msc->_mon_sel_lock);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
-- 
2.43.0



* Re: [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
@ 2025-11-09  0:49   ` Gavin Shan
  2025-11-12  8:03   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:52   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09  0:49 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accessible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
> The SCMI or PCC platforms can't support an overflow interrupt, and
> can't access the registers from hardirq context.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
> only support 'real' MMIO platforms.
> 
> In the future this lock will be split in two allowing SCMI/PCC platforms
> to take a mutex. Because there are contexts where the SCMI/PCC platforms
> can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
> this now, so that all the error handling on these paths is present. This
> allows the relevant paths to fail if they are needed on a platform where
> this isn't possible, instead of having to make explicit checks of the
> interface type.
> 
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> use devm_mutex_init()
> include tidying
> stray comma (Jonathan)
> ---
>   drivers/resctrl/mpam_devices.c  |  2 ++
>   drivers/resctrl/mpam_internal.h | 39 +++++++++++++++++++++++++++++++++
>   2 files changed, 41 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* RE: [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
  2025-11-09  0:49   ` Gavin Shan
@ 2025-11-12  8:03   ` Shaopeng Tan (Fujitsu)
  2025-11-13  3:52   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  8:03 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com


> From: James Morse <james.morse@arm.com>
> 
> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms where
> MSC are not accessible from every CPU. This makes an irqsave spinlock the
> obvious lock to protect these registers. On systems with SCMI or PCC
> mailboxes it must be able to sleep, meaning a mutex must be used.
> The SCMI or PCC platforms can't support an overflow interrupt, and can't
> access the registers from hardirq context.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and only
> support 'real' MMIO platforms.
> 
> In the future this lock will be split in two allowing SCMI/PCC platforms to take
> a mutex. Because there are contexts where the SCMI/PCC platforms can't
> make an access, mpam_mon_sel_lock() needs to be able to fail. Do this now,
> so that all the error handling on these paths is present. This allows the relevant
> paths to fail if they are needed on a platform where this isn't possible, instead
> of having to make explicit checks of the interface type.
> 
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* Re: [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
  2025-11-09  0:49   ` Gavin Shan
  2025-11-12  8:03   ` Shaopeng Tan (Fujitsu)
@ 2025-11-13  3:52   ` Fenghua Yu
  2 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13  3:52 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan



On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The MSC MON_SEL register needs to be accessed from hardirq for the overflow
> interrupt, and when taking an IPI to access these registers on platforms
> where MSC are not accessible from every CPU. This makes an irqsave
> spinlock the obvious lock to protect these registers. On systems with SCMI
> or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
> The SCMI or PCC platforms can't support an overflow interrupt, and
> can't access the registers from hardirq context.
> 
> Clearly these two can't exist for one MSC at the same time.
> 
> Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
> only support 'real' MMIO platforms.
> 
> In the future this lock will be split in two allowing SCMI/PCC platforms
> to take a mutex. Because there are contexts where the SCMI/PCC platforms
> can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
> this now, so that all the error handling on these paths is present. This
> allows the relevant paths to fail if they are needed on a platform where
> this isn't possible, instead of having to make explicit checks of the
> interface type.
> 
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (14 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 21:55   ` Gavin Shan
  2025-11-12  8:17   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class Ben Horgan
                   ` (23 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Dave Martin, Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

Expand the probing support with the control and monitor types
we can use with resctrl.

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Drop the =0 in the enum (Jonathan)
---
 drivers/resctrl/mpam_devices.c  | 149 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  33 +++++++
 2 files changed, 182 insertions(+)
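
The NRDY probe in this patch relies on a write-and-read-back test: if
software can both set and clear the bit, hardware is probably not managing
it. The following user-space sketch shows that test against a simulated
register. It is only a model: struct model_reg, its sw_writable mask,
model_write(), model_read() and the choice of bit 31 for MODEL_NRDY are
assumptions for the example and do not describe the real register
accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NRDY	(UINT32_C(1) << 31)	/* assumed bit position */

/* Simulated register: only bits in sw_writable change on a write. */
struct model_reg {
	uint32_t value;
	uint32_t sw_writable;
};

static void model_write(struct model_reg *r, uint32_t v)
{
	assert(r);
	r->value = (r->value & ~r->sw_writable) | (v & r->sw_writable);
}

static uint32_t model_read(const struct model_reg *r)
{
	assert(r);
	return r->value;
}

/*
 * Try to set, then clear, the NRDY bit. If either write does not stick,
 * treat the bit as owned by hardware.
 */
static bool model_nrdy_is_hw_managed(struct model_reg *r)
{
	bool can_set, can_clear;

	model_write(r, MODEL_NRDY);
	can_set = (model_read(r) & MODEL_NRDY) != 0;

	model_write(r, 0);
	can_clear = (model_read(r) & MODEL_NRDY) == 0;

	return !can_set || !can_clear;
}
```

A register whose NRDY bit accepts both writes is reported as not hardware
managed; one that ignores either write is reported as hardware managed,
regardless of the value the bit happens to hold.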

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3556299a8605..4dd93680cc24 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -169,6 +169,22 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
 
 #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
 
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	mpam_mon_sel_lock_held(msc);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	mpam_mon_sel_lock_held(msc);
+	__mpam_write_reg(msc, reg, val);
+}
+
+#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
 static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 {
 	u64 idr_high = 0, idr_low;
@@ -549,6 +565,133 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
 	return ERR_PTR(-ENOENT);
 }
 
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris *ris, u32 mon_reg)
+{
+	u32 now;
+	u64 mon_sel;
+	bool can_set, can_clear;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+		return false;
+
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_set = now & MSMON___NRDY;
+
+	_mpam_write_monsel_reg(msc, mon_reg, 0);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_clear = !(now & MSMON___NRDY);
+	mpam_mon_sel_unlock(msc);
+
+	return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
+	_mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+	int err;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct device *dev = &msc->pdev->dev;
+	struct mpam_props *props = &ris->props;
+
+	lockdep_assert_held(&msc->probe_lock);
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	/* Cache Portion partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+		if (props->cpbm_wd)
+			mpam_set_feature(mpam_feat_cpor_part, props);
+	}
+
+	/* Memory bandwidth partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+		/* portion bitmap resolution */
+		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+		if (props->mbw_pbm_bits &&
+		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_part, props);
+
+		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_max, props);
+	}
+
+	/* Performance Monitoring */
+	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+		/*
+		 * If the firmware arm,not-ready-us property is missing, the
+		 * CSU counters can't be used. Should we wait forever?
+		 */
+		err = device_property_read_u32(&msc->pdev->dev,
+					       "arm,not-ready-us",
+					       &msc->nrdy_usec);
+
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+			u32 csumonidr;
+
+			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+			if (props->num_csu_mon) {
+				bool hw_managed;
+
+				mpam_set_feature(mpam_feat_msmon_csu, props);
+
+				/* Is NRDY hardware managed? */
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+			}
+
+			/*
+			 * Accept the missing firmware property if NRDY appears
+			 * un-implemented.
+			 */
+			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.\n");
+		}
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+			bool hw_managed;
+			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
+			if (props->num_mbwu_mon)
+				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+			/* Is NRDY hardware managed? */
+			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+			if (hw_managed)
+				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+			/*
+			 * Don't warn about any missing firmware property for
+			 * MBWU NRDY - it doesn't make any sense!
+			 */
+		}
+	}
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
@@ -592,6 +735,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&mpam_list_lock);
 		if (IS_ERR(ris))
 			return PTR_ERR(ris);
+		ris->idr = idr;
+
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		mpam_ris_hw_probe(ris);
+		mutex_unlock(&msc->part_sel_lock);
 	}
 
 	spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b62ee55e1ed5..09c27000234d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,12 +5,14 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/srcu.h>
 #include <linux/spinlock.h>
+#include <linux/srcu.h>
 #include <linux/types.h>
 
 #define MPAM_MSC_MAX_NUM_RIS	16
@@ -110,6 +112,33 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
 	raw_spin_lock_init(&msc->_mon_sel_lock);
 }
 
+/* Bits for mpam features bitmaps */
+enum mpam_device_features {
+	mpam_feat_cpor_part,
+	mpam_feat_mbw_part,
+	mpam_feat_mbw_min,
+	mpam_feat_mbw_max,
+	mpam_feat_msmon,
+	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_hw_nrdy,
+	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_hw_nrdy,
+	MPAM_FEATURE_LAST
+};
+
+struct mpam_props {
+	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
+
+	u16			cpbm_wd;
+	u16			mbw_pbm_bits;
+	u16			bwa_wd;
+	u16			num_csu_mon;
+	u16			num_mbwu_mon;
+};
+
+#define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
+#define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -149,6 +178,8 @@ struct mpam_vmsc {
 	/* mpam_msc_ris in this vmsc */
 	struct list_head	ris;
 
+	struct mpam_props	props;
+
 	/* All RIS in this vMSC are members of this MSC */
 	struct mpam_msc		*msc;
 
@@ -160,6 +191,8 @@ struct mpam_vmsc {
 
 struct mpam_msc_ris {
 	u8			ris_idx;
+	u64			idr;
+	struct mpam_props	props;
 
 	cpumask_t		affinity;
 
-- 
2.43.0



* Re: [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports
  2025-11-07 12:34 ` [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports Ben Horgan
@ 2025-11-09 21:55   ` Gavin Shan
  2025-11-12  8:17   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 21:55 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Expand the probing support with the control and monitor types
> we can use with resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Drop the =0 in the enum (Jonathan)
> ---
>   drivers/resctrl/mpam_devices.c  | 149 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  33 +++++++
>   2 files changed, 182 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* RE: [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports
  2025-11-07 12:34 ` [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports Ben Horgan
  2025-11-09 21:55   ` Gavin Shan
@ 2025-11-12  8:17   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  8:17 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com,
	Dave Martin


> From: James Morse <james.morse@arm.com>
> 
> Expand the probing support with the control and monitor types we can use with
> resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (15 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 21:59   ` Gavin Shan
  2025-11-12  8:24   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks Ben Horgan
                   ` (22 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.

Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.

If bitmap properties are mismatched within a component we
cannot support the mismatched feature.

Care has to be taken as vMSC may hold mismatched RIS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
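
For anyone reviewing the merge rules, they can be modelled outside the
kernel. What follows is only a minimal userspace sketch under stated
assumptions, not the driver code: struct props, merge_props() and the FEAT_*
bits are hypothetical stand-ins for struct mpam_props, __props_mismatch() and
the mpam_feat_* bits, and only the cache portion control is modelled. It keeps
the same order of operations as the patch: fix up the width first, then OR the
feature bits for aliasing controls or AND them for non-aliasing ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for two of the mpam_feat_* bits. */
#define FEAT_CPOR_PART	(1u << 0)
#define FEAT_MBW_MAX	(1u << 1)

struct props {
	uint32_t features;	/* bitmask of FEAT_* */
	uint16_t cpbm_wd;	/* meaningful only with FEAT_CPOR_PART */
};

/*
 * Merge child into parent.
 * alias == true:  both describe the same resource (RIS into vMSC),
 *                 so the feature bits are OR'd together.
 * alias == false: different resources (vMSC into class), so only the
 *                 features common to both survive.
 */
static void merge_props(struct props *parent, const struct props *child,
			bool alias)
{
	bool p, c;

	assert(parent && child);
	p = parent->features & FEAT_CPOR_PART;
	c = child->features & FEAT_CPOR_PART;

	if (alias && !p && c) {
		/* Parent lacks the control: adopt the child's width. */
		parent->cpbm_wd = child->cpbm_wd;
	} else if (p && ((c && parent->cpbm_wd != child->cpbm_wd) ||
			 (!c && !alias))) {
		/* Widths disagree, or a non-aliasing child lacks it. */
		parent->features &= ~FEAT_CPOR_PART;
		parent->cpbm_wd = 0;
	}

	if (alias)
		parent->features |= child->features;
	else
		parent->features &= child->features;
}
```

With this model, a class that starts with a 13-bit portion bitmap loses
cpor_part as soon as it meets a vMSC with a 17-bit one, which is the L3
example given in the __class_props_mismatch() comment.
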
 drivers/resctrl/mpam_devices.c  | 214 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   3 +
 2 files changed, 217 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 4dd93680cc24..7c02b918fa0f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -969,6 +969,216 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+/* Any of these features means the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_mbw_min, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_max, props))
+		return true;
+	return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
+	helper(parent) &&						\
+	((helper(child) && (parent)->field != (child)->field) ||	\
+	 (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias)		     \
+	mpam_has_feature((feat), (parent)) &&				     \
+	((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+	 (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias)			\
+	(alias) && !mpam_has_feature((feat), (parent)) &&		\
+	mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+			     struct mpam_props *child, bool alias)
+{
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+		parent->cpbm_wd = child->cpbm_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+				   cpbm_wd, alias)) {
+		pr_debug("cleared cpor_part\n");
+		mpam_clear_feature(mpam_feat_cpor_part, parent);
+		parent->cpbm_wd = 0;
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+		parent->mbw_pbm_bits = child->mbw_pbm_bits;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+				   mbw_pbm_bits, alias)) {
+		pr_debug("cleared mbw_part\n");
+		mpam_clear_feature(mpam_feat_mbw_part, parent);
+		parent->mbw_pbm_bits = 0;
+	}
+
+	/* bwa_wd is a count of bits, fewer bits means less precision */
+	if (alias && !mpam_has_bwa_wd_feature(parent) &&
+	    mpam_has_bwa_wd_feature(child)) {
+		parent->bwa_wd = child->bwa_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+				     bwa_wd, alias)) {
+		pr_debug("took the min bwa_wd\n");
+		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+	}
+
+	/* For num properties, take the minimum */
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+		parent->num_csu_mon = child->num_csu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+				   num_csu_mon, alias)) {
+		pr_debug("took the min num_csu_mon\n");
+		parent->num_csu_mon = min(parent->num_csu_mon,
+					  child->num_csu_mon);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+		parent->num_mbwu_mon = child->num_mbwu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+				   num_mbwu_mon, alias)) {
+		pr_debug("took the min num_mbwu_mon\n");
+		parent->num_mbwu_mon = min(parent->num_mbwu_mon,
+					   child->num_mbwu_mon);
+	}
+
+	if (alias) {
+		/* Merge features for aliased resources */
+		bitmap_or(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
+	} else {
+		/* Clear missing features for non aliasing */
+		bitmap_and(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
+	}
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13-bit and 17-bit
+ * portion bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+	struct mpam_props *cprops = &class->props;
+	struct mpam_props *vprops = &vmsc->props;
+	struct device *dev = &vmsc->msc->pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+	dev_dbg(dev, "Merging features for class:%*pb &= vmsc:%*pb\n",
+		MPAM_FEATURE_LAST, cprops->features, MPAM_FEATURE_LAST, vprops->features);
+
+	/* Take the safe value for any common features */
+	__props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_props *vprops = &vmsc->props;
+	struct device *dev = &vmsc->msc->pdev->dev;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+	dev_dbg(dev, "Merging features for vmsc:%*pb |= ris:%*pb\n",
+		MPAM_FEATURE_LAST, vprops->features, MPAM_FEATURE_LAST, rprops->features);
+
+	/*
+	 * Merge mismatched features - Copy any features that aren't common,
+	 * but take the safe value for any common features.
+	 */
+	__props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_component *comp;
+
+	comp = list_first_entry(&class->components,
+				struct mpam_component, class_list);
+	vmsc = list_first_entry(&comp->vmsc,
+				struct mpam_vmsc, comp_list);
+
+	class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			__vmsc_props_mismatch(vmsc, ris);
+			class->nrdy_usec = max(class->nrdy_usec,
+					       vmsc->msc->nrdy_usec);
+		}
+	}
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+		__class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together by mpam_enable_merge_vmsc_features()
+ * as the first step so that mpam_enable_init_class_features() can initialise
+ * the class with a representative set of features.
+ * Next the mpam_enable_merge_class_features() bitwise-and's all the vmsc
+ * features to form the class features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, all_classes_list, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_vmsc_features(comp);
+
+		mpam_enable_init_class_features(class);
+
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_class_features(comp);
+	}
+}
+
 static void mpam_enable_once(void)
 {
 	/*
@@ -979,6 +1189,10 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+	mutex_unlock(&mpam_list_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 09c27000234d..1a95b2897ac1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -138,6 +138,7 @@ struct mpam_props {
 
 #define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
+#define mpam_clear_feature(_feat, x)	clear_bit(_feat, (x)->features)
 
 struct mpam_class {
 	/* mpam_components in this class */
@@ -145,6 +146,8 @@ struct mpam_class {
 
 	cpumask_t		affinity;
 
+	struct mpam_props	props;
+	u32			nrdy_usec;
 	u8			level;
 	enum mpam_class_types	type;
 
-- 
2.43.0



* Re: [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-11-07 12:34 ` [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class Ben Horgan
@ 2025-11-09 21:59   ` Gavin Shan
  2025-11-12  8:24   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 21:59 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> To make a decision about whether to expose an mpam class as
> a resctrl resource we need to know its overall supported
> features and properties.
> 
> Once we've probed all the resources, we can walk the tree
> and produce overall values by merging the bitmaps. This
> eliminates features that are only supported by some MSC
> that make up a component or class.
> 
> If bitmap properties are mismatched within a component we
> cannot support the mismatched feature.
> 
> Care has to be taken as vMSC may hold mismatched RIS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c  | 214 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |   3 +
>   2 files changed, 217 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>




* RE: [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-11-07 12:34 ` [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class Ben Horgan
  2025-11-09 21:59   ` Gavin Shan
@ 2025-11-12  8:24   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  8:24 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> To make a decision about whether to expose an mpam class as a resctrl
> resource we need to know its overall supported features and properties.
> 
> Once we've probed all the resources, we can walk the tree and produce overall
> values by merging the bitmaps. This eliminates features that are only
> supported by some MSC that make up a component or class.
> 
> If bitmap properties are mismatched within a component we cannot support
> the mismatched feature.
> 
> Care has to be taken as vMSC may hold mismatched RIS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (16 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:11   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU Ben Horgan
                   ` (21 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Rohit Mathew, Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. after a kexec.

Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.

MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.

If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.

To reset, write the least restrictive value for each discovered control.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Add tags - thanks!
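
The partial last word in mpam_reset_msc_bitmap() is the easy part to get
wrong, so here is a small userspace sketch of the same arithmetic. It is
illustrative only: reset_bitmap_words() is a hypothetical name, and it fills
an array instead of writing MSC registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[] with the 32-bit words needed to set the low wd bits.
 * Returns the number of words written; out must have room for
 * (wd + 31) / 32 entries.
 */
static size_t reset_bitmap_words(uint16_t wd, uint32_t *out)
{
	size_t num_words, i;
	unsigned int msb;

	assert(out);
	if (wd == 0)
		return 0;

	num_words = ((size_t)wd + 31) / 32;

	/* All but the last word are fully set. */
	for (i = 0; i < num_words - 1; i++)
		out[i] = UINT32_MAX;

	/* The last word holds bits 0..msb; msb is 31 when wd % 32 == 0. */
	msb = (wd - 1u) % 32;
	out[i] = UINT32_MAX >> (31 - msb);

	return num_words;
}
```

A 13-bit bitmap gives one word of 0x1fff, a 32-bit bitmap gives one full
word, and a 40-bit bitmap gives a full word followed by 0xff.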
---
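
The interaction between online_refs and in_reset_state can be modelled in
userspace as below. This is a sketch under assumptions, not the driver: struct
msc_model and the model_*() functions are hypothetical, a single RIS is
folded into the MSC, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct msc_model {
	atomic_int online_refs;	/* CPUs online that can reach the MSC */
	int resets;		/* how often the controls were rewritten */
	bool in_reset_state;	/* set on online, cleared on offline */
};

static void model_reset(struct msc_model *m, bool online)
{
	assert(m);
	/* Skip the register writes if nothing dirtied the config. */
	if (!m->in_reset_state)
		m->resets++;
	m->in_reset_state = online;
}

static void model_cpu_online(struct msc_model *m)
{
	/* Only the first CPU to arrive resets the MSC. */
	if (atomic_fetch_add(&m->online_refs, 1) == 0)
		model_reset(m, true);
}

static void model_cpu_offline(struct msc_model *m)
{
	/* Only the last CPU to leave resets the MSC. */
	if (atomic_fetch_sub(&m->online_refs, 1) == 1)
		model_reset(m, false);
}
```

In this model the last CPU going offline does not rewrite the controls if they
are still in their reset state, but it does clear the flag so that the next
online does.
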
 drivers/resctrl/mpam_devices.c  | 109 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   3 +
 2 files changed, 112 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 7c02b918fa0f..6a01247317b7 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/bitfield.h>
+#include <linux/bitmap.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -753,8 +754,104 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+	u32 num_words, msb;
+	u32 bm = ~0;
+	int i;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	if (wd == 0)
+		return;
+
+	/*
+	 * Write all ~0 to all but the last 32bit-word, which may
+	 * have fewer bits...
+	 */
+	num_words = DIV_ROUND_UP(wd, 32);
+	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+		__mpam_write_reg(msc, reg, bm);
+
+	/*
+	 * ...and then the last (maybe) partial 32bit word. When wd is a
+	 * multiple of 32, msb should be 31 to write a full 32bit word.
+	 */
+	msb = (wd - 1) % 32;
+	bm = GENMASK(msb, 0);
+	__mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *rprops = &ris->props;
+
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	mutex_lock(&msc->part_sel_lock);
+	__mpam_part_sel(ris->ris_idx, partid, msc);
+
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+
+	mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+	u16 partid, partid_max;
+
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	if (ris->in_reset_state)
+		return;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid <= partid_max; partid++)
+		mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+	struct mpam_msc_ris *ris;
+
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+		mpam_reset_ris(ris);
+
+		/*
+		 * Set in_reset_state when coming online. The reset state
+		 * for non-zero partid may be lost while the CPUs are offline.
+		 */
+		ris->in_reset_state = online;
+	}
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
+	struct mpam_msc *msc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_fetch_inc(&msc->online_refs) == 0)
+			mpam_reset_msc(msc, true);
+	}
+
 	return 0;
 }
 
@@ -793,6 +890,18 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 
 static int mpam_cpu_offline(unsigned int cpu)
 {
+	struct mpam_msc *msc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_dec_and_test(&msc->online_refs))
+			mpam_reset_msc(msc, false);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1a95b2897ac1..1ad62f13bfb3 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/atomic.h>
 #include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
@@ -45,6 +46,7 @@ struct mpam_msc {
 	enum mpam_msc_iface	iface;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	atomic_t		online_refs;
 
 	/*
 	 * probe_lock is only taken during discovery. After discovery these
@@ -196,6 +198,7 @@ struct mpam_msc_ris {
 	u8			ris_idx;
 	u64			idr;
 	struct mpam_props	props;
+	bool			in_reset_state;
 
 	cpumask_t		affinity;
 
-- 
2.43.0



* Re: [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks
  2025-11-07 12:34 ` [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks Ben Horgan
@ 2025-11-09 22:11   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:11 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtied, e.g. after a kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Add tags - thanks!
> ---
>   drivers/resctrl/mpam_devices.c  | 109 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |   3 +
>   2 files changed, 112 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>





* [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (17 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:13   ` Gavin Shan
  2025-11-14  2:47   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
                   ` (20 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.

Add a helper that schedules the provided function if necessary.

Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 6a01247317b7..bcdf4fbc3941 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -807,20 +807,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 	mutex_unlock(&msc->part_sel_lock);
 }
 
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_msc_ris *ris = arg;
 
 	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
 
 	if (ris->in_reset_state)
-		return;
+		return 0;
 
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid <= partid_max; partid++)
 		mpam_reset_ris_partid(ris, partid);
+
+	return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated; that will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, &msc->accessibility))
+		return cpu;
+
+	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+	lockdep_assert_irqs_enabled();
+	lockdep_assert_cpus_held();
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+
+	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
 }
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -828,7 +859,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	struct mpam_msc_ris *ris;
 
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
-		mpam_reset_ris(ris);
+		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
 		/*
 		 * Set in_reset_state when coming online. The reset state
-- 
2.43.0
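
Editorial sketch (not part of the patch): the CPU selection performed by
mpam_get_msc_preferred_cpu() can be modelled in user space. This is a
minimal sketch under stated assumptions: cpu_mask_t, NO_CPU, mask_first()
and pick_msc_cpu() are hypothetical names, a 64-bit word stands in for a
kernel cpumask, and an empty intersection yields NO_CPU, whereas the kernel's
cpumask_first_and() returns a value >= nr_cpu_ids in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t cpu_mask_t;

#define NO_CPU (-1)

/* Return the index of the lowest set bit, or NO_CPU if the mask is empty. */
static int mask_first(cpu_mask_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;

	return NO_CPU;
}

/*
 * Prefer the calling CPU when it can reach the MSC. Otherwise fall back to
 * the first CPU that is both able to reach the MSC and currently online.
 */
static int pick_msc_cpu(int this_cpu, cpu_mask_t accessibility,
			cpu_mask_t online)
{
	assert(this_cpu >= 0 && this_cpu < 64);

	if (accessibility & (UINT64_C(1) << this_cpu))
		return this_cpu;

	return mask_first(accessibility & online);
}
```

For example, with accessibility covering CPUs 2 and 3, a caller on CPU 2
stays where it is, while a caller on CPU 0 is sent to CPU 2, or to CPU 3 if
CPU 2 is offline. As the patch comment notes, a stale answer caused by
migration only costs an extra cross-call.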



* Re: [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-11-07 12:34 ` [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU Ben Horgan
@ 2025-11-09 22:13   ` Gavin Shan
  2025-11-14  2:47   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:13 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Resetting RIS entries from the cpuhp callback is easy as the
> callback occurs on the correct CPU. This won't be true for any other
> caller that wants to reset or configure an MSC.
> 
> Add a helper that schedules the provided function if necessary.
> 
> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
> changing the MSC state.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
>   1 file changed, 34 insertions(+), 3 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* RE: [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-11-07 12:34 ` [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU Ben Horgan
  2025-11-09 22:13   ` Gavin Shan
@ 2025-11-14  2:47   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-14  2:47 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> Resetting RIS entries from the cpuhp callback is easy as the callback occurs on
> the correct CPU. This won't be true for any other caller that wants to reset or
> configure an MSC.
> 
> Add a helper that schedules the provided function if necessary.
> 
> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
> changing the MSC state.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (18 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:16   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 21/33] arm_mpam: Register and enable IRQs Ben Horgan
                   ` (19 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.

Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c | 57 ++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bcdf4fbc3941..a8efd3e02c62 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -809,15 +809,13 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
- * pre-emptible.
+ * pre-emptible. Caller must hold mpam_srcu.
  */
 static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
 
-	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
-
 	if (ris->in_reset_state)
 		return 0;
 
@@ -1341,8 +1339,55 @@ static void mpam_enable_once(void)
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		struct mpam_msc *msc = vmsc->msc;
+		struct mpam_msc_ris *ris;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			if (!ris->in_reset_state)
+				mpam_touch_msc(msc, mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+		}
+	}
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(comp, &class->components, class_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_component_locked(comp);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+	cpus_read_lock();
+	mpam_reset_class_locked(class);
+	cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAM's errors indicate a software bug, so restore any modified
+ * controls to their reset values.
+ */
 void mpam_disable(struct work_struct *ignored)
 {
+	int idx;
+	struct mpam_class *class;
 	struct mpam_msc *msc, *tmp;
 
 	mutex_lock(&mpam_cpuhp_state_lock);
@@ -1352,6 +1397,12 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_class(class);
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	mutex_lock(&mpam_list_lock);
 	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, all_msc_list)
 		mpam_msc_destroy(msc);
-- 
2.43.0
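
Editorial sketch (not part of the patch): the class-wide reset above walks
class -> component -> RIS and skips any RIS already in its reset state. The
user-space model below shows only that skip-and-mark behaviour. All type and
function names are hypothetical, the vMSC level is flattened away, and none
of the locking (cpus_read_lock, SRCU) or the cross-call via mpam_touch_msc()
is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct ris_model {
	bool in_reset_state;
	unsigned int resets;	/* how many times the hardware was touched */
};

struct component_model {
	struct ris_model *ris;
	size_t nr_ris;
};

struct class_model {
	struct component_model *comps;
	size_t nr_comps;
};

/* Reset every RIS of a component that has been modified since its last reset. */
static unsigned int reset_component(struct component_model *comp)
{
	unsigned int touched = 0;

	assert(comp != NULL);

	for (size_t i = 0; i < comp->nr_ris; i++) {
		struct ris_model *ris = &comp->ris[i];

		if (!ris->in_reset_state) {
			ris->resets++;	/* stands in for the real reset */
			touched++;
		}
		ris->in_reset_state = true;
	}

	return touched;
}

/* Reset every component of a class, returning how many RIS were touched. */
static unsigned int reset_class(struct class_model *cls)
{
	unsigned int touched = 0;

	assert(cls != NULL);

	for (size_t i = 0; i < cls->nr_comps; i++)
		touched += reset_component(&cls->comps[i]);

	return touched;
}
```

Calling reset_class() twice in a row touches hardware only on the first
call, which is the property that makes it cheap to invoke from
mpam_disable() regardless of what state the MSCs were left in.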



* Re: [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
@ 2025-11-09 22:16   ` Gavin Shan
  2025-11-13 20:24   ` Fenghua Yu
  2025-11-14  2:52   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:16 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c | 57 ++++++++++++++++++++++++++++++++--
>   1 file changed, 54 insertions(+), 3 deletions(-)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
  2025-11-09 22:16   ` Gavin Shan
@ 2025-11-13 20:24   ` Fenghua Yu
  2025-11-14  2:52   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-13 20:24 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan



On 11/7/25 04:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua


* RE: [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
  2025-11-09 22:16   ` Gavin Shan
  2025-11-13 20:24   ` Fenghua Yu
@ 2025-11-14  2:52   ` Shaopeng Tan (Fujitsu)
  2 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-14  2:52 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> cpuhp callbacks aren't the only time the MSC configuration may need to be
> reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has misprogrammed an
> MSC. The safest thing to do is reset all the MSCs and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(), which
> can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 21/33] arm_mpam: Register and enable IRQs
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (19 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:18   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled Ben Horgan
                   ` (18 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.

Only the irq handler accesses the MPAMF_ESR register, so no locking is
needed. The work to disable MPAM after an error needs to happen in process
context as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.

Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.

Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.

CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Add tag, thanks!
Whitespace changes
Use devm_mutex_init()
---
 drivers/resctrl/mpam_devices.c  | 280 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  13 ++
 2 files changed, 293 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a8efd3e02c62..a543a363443c 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -199,6 +202,35 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 	return (idr_high << 32) | idr_low;
 }
 
+static void mpam_msc_clear_esr(struct mpam_msc *msc)
+{
+	u64 esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+
+	if (!esr_low)
+		return;
+
+	/*
+	 * Clearing the high/low bits of MPAMF_ESR can not be atomic.
+	 * Clear the top half first, so that the pending error bits in the
+	 * lower half prevent hardware from updating either half of the
+	 * register.
+	 */
+	if (msc->has_extd_esr)
+		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+	__mpam_write_reg(msc, MPAMF_ESR, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+	u64 esr_high = 0, esr_low;
+
+	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (msc->has_extd_esr)
+		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+	return (esr_high << 32) | esr_low;
+}
+
 static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
 {
 	lockdep_assert_held(&msc->part_sel_lock);
@@ -730,6 +762,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
 		msc->partid_max = min(msc->partid_max, partid_max);
 		msc->pmg_max = min(msc->pmg_max, pmg_max);
+		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
 
 		mutex_lock(&mpam_list_lock);
 		ris = mpam_get_or_create_ris(msc, ris_idx);
@@ -744,6 +777,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&msc->part_sel_lock);
 	}
 
+	/* Clear any stale errors */
+	mpam_msc_clear_esr(msc);
+
 	spin_lock(&partid_max_lock);
 	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
 	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -867,6 +903,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	}
 }
 
+static void _enable_percpu_irq(void *_irq)
+{
+	int *irq = _irq;
+
+	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
 	struct mpam_msc *msc;
@@ -877,6 +920,9 @@ static int mpam_cpu_online(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			_enable_percpu_irq(&msc->reenable_error_ppi);
+
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
 			mpam_reset_msc(msc, true);
 	}
@@ -927,6 +973,9 @@ static int mpam_cpu_offline(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			disable_percpu_irq(msc->reenable_error_ppi);
+
 		if (atomic_dec_and_test(&msc->online_refs))
 			mpam_reset_msc(msc, false);
 	}
@@ -953,6 +1002,42 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
 	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
+static int __setup_ppi(struct mpam_msc *msc)
+{
+	int cpu;
+
+	msc->error_dev_id = alloc_percpu(struct mpam_msc *);
+	if (!msc->error_dev_id)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &msc->accessibility)
+		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+
+	return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+	int irq;
+
+	irq = platform_get_irq_byname_optional(msc->pdev, "error");
+	if (irq <= 0)
+		return 0;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(irq))
+		return __setup_ppi(msc);
+
+	/* sanity check: shared interrupts can be routed anywhere? */
+	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+		pr_err_once("msc:%u is a private resource with a shared error interrupt\n",
+			    msc->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * An MSC can control traffic from a set of CPUs, but may only be accessible
  * from a (hopefully wider) set of CPUs. The common reason for this is power
@@ -1032,6 +1117,9 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 	if (err)
 		return ERR_PTR(err);
 	err = devm_mutex_init(dev, &msc->part_sel_lock);
+	if (err)
+		return ERR_PTR(err);
+	err = devm_mutex_init(dev, &msc->error_irq_lock);
 	if (err)
 		return ERR_PTR(err);
 	mpam_mon_sel_lock_init(msc);
@@ -1048,6 +1136,10 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 		return ERR_PTR(-EINVAL);
 	}
 
+	err = mpam_msc_setup_error_irq(msc);
+	if (err)
+		return ERR_PTR(err);
+
 	if (device_property_read_u32(&pdev->dev, "pcc-channel", &tmp))
 		msc->iface = MPAM_IFACE_MMIO;
 	else
@@ -1317,8 +1409,177 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
 	}
 }
 
+static char *mpam_errcode_names[16] = {
+	[MPAM_ERRCODE_NONE]			= "No error",
+	[MPAM_ERRCODE_PARTID_SEL_RANGE]		= "PARTID_SEL_Range",
+	[MPAM_ERRCODE_REQ_PARTID_RANGE]		= "Req_PARTID_Range",
+	[MPAM_ERRCODE_MSMONCFG_ID_RANGE]	= "MSMONCFG_ID_RANGE",
+	[MPAM_ERRCODE_REQ_PMG_RANGE]		= "Req_PMG_Range",
+	[MPAM_ERRCODE_MONITOR_RANGE]		= "Monitor_Range",
+	[MPAM_ERRCODE_INTPARTID_RANGE]		= "intPARTID_Range",
+	[MPAM_ERRCODE_UNEXPECTED_INTERNAL]	= "Unexpected_INTERNAL",
+	[MPAM_ERRCODE_UNDEFINED_RIS_PART_SEL]	= "Undefined_RIS_PART_SEL",
+	[MPAM_ERRCODE_RIS_NO_CONTROL]		= "RIS_No_Control",
+	[MPAM_ERRCODE_UNDEFINED_RIS_MON_SEL]	= "Undefined_RIS_MON_SEL",
+	[MPAM_ERRCODE_RIS_NO_MONITOR]		= "RIS_No_Monitor",
+	[12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+	return 0;
+}
+
+/* This can run in mpam_disable(), and the interrupt handler on the same CPU */
+static int mpam_disable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, 0);
+
+	return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+	u64 reg;
+	u16 partid;
+	u8 errcode, pmg, ris;
+
+	if (WARN_ON_ONCE(!msc) ||
+	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &msc->accessibility)))
+		return IRQ_NONE;
+
+	reg = mpam_msc_read_esr(msc);
+
+	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+	if (!errcode)
+		return IRQ_NONE;
+
+	/* Clear level triggered irq */
+	mpam_msc_clear_esr(msc);
+
+	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+	pr_err_ratelimited("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+			   msc->id, mpam_errcode_names[errcode], partid, pmg,
+			   ris);
+
+	/* Disable this interrupt. */
+	mpam_disable_msc_ecr(msc);
+
+	/*
+	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
+	 * unregister the interrupt from the threaded part of the handler.
+	 */
+	mpam_disable_reason = "hardware error interrupt";
+	schedule_work(&mpam_broken_work);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static int mpam_register_irqs(void)
+{
+	int err, irq;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+		/* We anticipate sharing the interrupt with other MSCs */
+		if (irq_is_percpu(irq)) {
+			err = request_percpu_irq(irq, &mpam_ppi_handler,
+						 "mpam:msc:error",
+						 msc->error_dev_id);
+			if (err)
+				return err;
+
+			msc->reenable_error_ppi = irq;
+			smp_call_function_many(&msc->accessibility,
+					       &_enable_percpu_irq, &irq,
+					       true);
+		} else {
+			err = devm_request_irq(&msc->pdev->dev, irq,
+					       &mpam_spi_handler, IRQF_SHARED,
+					       "mpam:msc:error", msc);
+			if (err)
+				return err;
+		}
+
+		mutex_lock(&msc->error_irq_lock);
+		msc->error_irq_req = true;
+		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+		msc->error_irq_hw_enabled = true;
+		mutex_unlock(&msc->error_irq_lock);
+	}
+
+	return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+	int irq;
+	struct mpam_msc *msc;
+
+	guard(cpus_read_lock)();
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		mutex_lock(&msc->error_irq_lock);
+		if (msc->error_irq_hw_enabled) {
+			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+			msc->error_irq_hw_enabled = false;
+		}
+
+		if (msc->error_irq_req) {
+			if (irq_is_percpu(irq)) {
+				msc->reenable_error_ppi = 0;
+				free_percpu_irq(irq, msc->error_dev_id);
+			} else {
+				devm_free_irq(&msc->pdev->dev, irq, msc);
+			}
+			msc->error_irq_req = false;
+		}
+		mutex_unlock(&msc->error_irq_lock);
+	}
+}
+
 static void mpam_enable_once(void)
 {
+	int err;
+
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
 	 * longer change.
@@ -1327,9 +1588,26 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	/*
+	 * If all the MSC have been probed, enabling the IRQs happens next.
+	 * That involves cross-calling to a CPU that can reach the MSC, and
+	 * the locks must be taken in this order:
+	 */
+	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
 	mpam_enable_merge_features(&mpam_classes);
+
+	err = mpam_register_irqs();
+
 	mutex_unlock(&mpam_list_lock);
+	cpus_read_unlock();
+
+	if (err) {
+		pr_warn("Failed to register irqs: %d\n", err);
+		mpam_disable_reason = "Failed to enable.";
+		schedule_work(&mpam_broken_work);
+		return;
+	}
 
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
@@ -1397,6 +1675,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	mpam_unregister_irqs();
+
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
 				 srcu_read_lock_held(&mpam_srcu))
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1ad62f13bfb3..3002c8e6cabc 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -46,6 +46,11 @@ struct mpam_msc {
 	enum mpam_msc_iface	iface;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	bool			has_extd_esr;
+
+	int				reenable_error_ppi;
+	struct mpam_msc * __percpu	*error_dev_id;
+
 	atomic_t		online_refs;
 
 	/*
@@ -59,6 +64,14 @@ struct mpam_msc {
 	unsigned long		ris_idxs;
 	u32			ris_max;
 
+	/*
+	 * error_irq_lock is taken when registering/unregistering the error
+	 * interrupt and manipulating the below flags.
+	 */
+	struct mutex		error_irq_lock;
+	bool			error_irq_req;
+	bool			error_irq_hw_enabled;
+
 	/* mpam_msc_ris of this component */
 	struct list_head	ris;
 
-- 
2.43.0



* Re: [PATCH 21/33] arm_mpam: Register and enable IRQs
  2025-11-07 12:34 ` [PATCH 21/33] arm_mpam: Register and enable IRQs Ben Horgan
@ 2025-11-09 22:18   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:18 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the MPAMF_ESR register, so no locking is
> needed. The work to disable MPAM after an error needs to happen at process
> context as it takes mutex. It also unregisters the interrupts, meaning
> it can't be done from the threaded part of a threaded interrupt.
> Instead, mpam_disable() gets scheduled.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Add tag, thanks!
> Whitespace changes
> Use devm_mutex_init()
> ---
>   drivers/resctrl/mpam_devices.c  | 280 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  13 ++
>   2 files changed, 293 insertions(+)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>



* [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (20 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 21/33] arm_mpam: Register and enable IRQs Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:20   ` Gavin Shan
  2025-11-14  4:37   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online Ben Horgan
                   ` (17 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.

After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.

Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 12 ++++++++++++
 drivers/resctrl/mpam_internal.h |  8 ++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a543a363443c..3a0ad8d93fff 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,6 +29,8 @@
 
 #include "mpam_internal.h"
 
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
+
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
@@ -937,6 +939,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 	struct mpam_msc *msc;
 	bool new_device_probed = false;
 
+	if (mpam_is_enabled())
+		return 0;
+
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
 				 srcu_read_lock_held(&mpam_srcu)) {
@@ -1475,6 +1480,10 @@ static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
 	/* Disable this interrupt. */
 	mpam_disable_msc_ecr(msc);
 
+	/* Are we racing with the thread disabling MPAM? */
+	if (!mpam_is_enabled())
+		return IRQ_HANDLED;
+
 	/*
 	 * Schedule the teardown work. Don't use a threaded IRQ as we can't
 	 * unregister the interrupt from the threaded part of the handler.
@@ -1609,6 +1618,7 @@ static void mpam_enable_once(void)
 		return;
 	}
 
+	static_branch_enable(&mpam_enabled);
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
 				      "mpam:online");
 
@@ -1675,6 +1685,8 @@ void mpam_disable(struct work_struct *ignored)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	static_branch_disable(&mpam_enabled);
+
 	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3002c8e6cabc..c6937161877a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -9,6 +9,7 @@
 #include <linux/bitmap.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/jump_label.h>
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/srcu.h>
@@ -20,6 +21,13 @@
 
 struct platform_device;
 
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+	return static_branch_likely(&mpam_enabled);
+}
+
 /*
  * Structures protected by SRCU may not be freed for a surprising amount of
  * time (especially if perf is running). To ensure the MPAM error interrupt can
-- 
2.43.0
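
As a rough illustration of the gating described in this patch, here is a
userspace C model. It is only a sketch under stated assumptions: a C11
atomic_bool stands in for the mpam_enabled static key, and model_enable(),
model_disable(), model_irq_handler() and teardown_scheduled are hypothetical
names, not driver symbols. It shows why an error interrupt that arrives
after the disable path has started should not schedule the teardown again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the mpam_enabled static key (assumption: a plain atomic flag). */
static atomic_bool model_enabled;

/* Counts how many times the teardown work would have been scheduled. */
static int teardown_scheduled;

static bool model_is_enabled(void)
{
	return atomic_load(&model_enabled);
}

/* Models static_branch_enable(&mpam_enabled) in mpam_enable_once(). */
static void model_enable(void)
{
	atomic_store(&model_enabled, true);
}

/* Models static_branch_disable(&mpam_enabled) in mpam_disable(). */
static void model_disable(void)
{
	atomic_store(&model_enabled, false);
}

/*
 * Models the check added to __mpam_irq_handler(). Returns true when the
 * teardown work was scheduled, false when the error raced with disable
 * and was only acknowledged.
 */
static bool model_irq_handler(void)
{
	if (!model_is_enabled())
		return false;

	teardown_scheduled++;
	return true;
}
```

In the real driver the flag is a jump label, so the check costs a patched
branch rather than a memory load; the model only captures the ordering.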



* Re: [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-11-07 12:34 ` [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled Ben Horgan
@ 2025-11-09 22:20   ` Gavin Shan
  2025-11-14  4:37   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:20 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
> 
> After this point, checking all the MSC have been probed is pointless,
> and the cpuhp callbacks should restore the configuration, instead of
> just resetting the MSC.
> 
> Add a static key to enable this behaviour. This will also allow MPAM
> to be disabled in response to an error, and the architecture code to
> enable/disable the context switch of the MPAM system registers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c  | 12 ++++++++++++
>   drivers/resctrl/mpam_internal.h |  8 ++++++++
>   2 files changed, 20 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* RE: [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-11-07 12:34 ` [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled Ben Horgan
  2025-11-09 22:20   ` Gavin Shan
@ 2025-11-14  4:37   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-14  4:37 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> Once all the MSC have been probed, the system wide usable number of
> PARTID is known and the configuration arrays can be allocated.
> 
> After this point, checking all the MSC have been probed is pointless, and the
> cpuhp callbacks should restore the configuration, instead of just resetting the
> MSC.
> 
> Add a static key to enable this behaviour. This will also allow MPAM to be
> disabled in response to an error, and the architecture code to enable/disable
> the context switch of the MPAM system registers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>



* [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (21 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 22:59   ` Gavin Shan
  2025-11-10 17:27   ` Jonathan Cameron
  2025-11-07 12:34 ` [PATCH 24/33] arm_mpam: Probe and reset the rest of the features Ben Horgan
                   ` (16 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Dave Martin, Ben Horgan

From: James Morse <james.morse@arm.com>

When CPUs come online the MSC's original configuration should be restored.

Add struct mpam_config to hold the configuration. This has a bitmap of
features that were modified. Once the maximum partid is known, allocate
a configuration array for each component, and reprogram each RIS
configuration from this.

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Cc: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Drop tags
Fix component reset, otherwise cpbm wrong and controls not set.
Add a cfg_lock to guard configuration of an msc
---
 drivers/resctrl/mpam_devices.c  | 268 ++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  27 ++++
 2 files changed, 280 insertions(+), 15 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3a0ad8d93fff..8b0944bdaf28 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -144,6 +144,16 @@ static void mpam_free_garbage(void)
 	}
 }
 
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+	WARN_ON_ONCE(!partid_max_published);
+}
+
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
 	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
@@ -343,6 +353,7 @@ mpam_component_alloc(struct mpam_class *class, int id)
 	return comp;
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp);
 static void mpam_class_destroy(struct mpam_class *class);
 
 static void mpam_component_destroy(struct mpam_component *comp)
@@ -351,6 +362,8 @@ static void mpam_component_destroy(struct mpam_component *comp)
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	__destroy_component_cfg(comp);
+
 	list_del_rcu(&comp->class_list);
 	add_to_garbage(comp);
 
@@ -820,31 +833,59 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 	__mpam_write_reg(msc, reg, bm);
 }
 
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+				      struct mpam_config *cfg)
 {
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
 
-	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
-
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
-	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
+	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
+		if (cfg->reset_cpbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+					      rprops->cpbm_wd);
+		else
+			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
+		if (cfg->reset_mbw_pbm)
+			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+					      rprops->mbw_pbm_bits);
+		else
+			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_min, cfg))
 		mpam_write_partsel_reg(msc, MBW_MIN, 0);
 
-	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
-		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_max, cfg)) {
+		if (cfg->reset_mbw_max)
+			mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
+		else
+			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
+	}
 
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
+{
+	*reset_cfg = (struct mpam_config) {
+		.reset_cpbm = true,
+		.reset_mbw_pbm = true,
+		.reset_mbw_max = true,
+	};
+	bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible. Caller must hold mpam_srcu.
@@ -852,16 +893,19 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_config reset_cfg;
 	struct mpam_msc_ris *ris = arg;
 
 	if (ris->in_reset_state)
 		return 0;
 
+	mpam_init_reset_cfg(&reset_cfg);
+
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid <= partid_max; partid++)
-		mpam_reset_ris_partid(ris, partid);
+		mpam_reprogram_ris_partid(ris, partid, &reset_cfg);
 
 	return 0;
 }
@@ -894,6 +938,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 {
 	struct mpam_msc_ris *ris;
 
+	mutex_lock(&msc->cfg_lock);
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
 		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
@@ -903,6 +948,61 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 */
 		ris->in_reset_state = online;
 	}
+	mutex_unlock(&msc->cfg_lock);
+}
+
+struct mpam_write_config_arg {
+	struct mpam_msc_ris *ris;
+	struct mpam_component *comp;
+	u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+	struct mpam_write_config_arg *c = arg;
+
+	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+	return 0;
+}
+
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+	struct mpam_write_config_arg arg;
+
+	/*
+	 * No lock for mpam_partid_max as partid_max_published has been
+	 * set by mpam_enable(), so the values can no longer change.
+	 */
+	mpam_assert_partid_sizes_fixed();
+
+	mutex_lock(&msc->cfg_lock);
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		arg.comp = ris->vmsc->comp;
+		arg.ris = ris;
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (!bitmap_empty(cfg->features, MPAM_FEATURE_LAST))
+				reset = false;
+
+			arg.partid = partid;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+		ris->in_reset_state = reset;
+	}
+	mutex_unlock(&msc->cfg_lock);
 }
 
 static void _enable_percpu_irq(void *_irq)
@@ -926,7 +1026,7 @@ static int mpam_cpu_online(unsigned int cpu)
 			_enable_percpu_irq(&msc->reenable_error_ppi);
 
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
-			mpam_reset_msc(msc, true);
+			mpam_reprogram_msc(msc);
 	}
 
 	return 0;
@@ -1125,6 +1225,9 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
 	if (err)
 		return ERR_PTR(err);
 	err = devm_mutex_init(dev, &msc->error_irq_lock);
+	if (err)
+		return ERR_PTR(err);
+	err = devm_mutex_init(dev, &msc->cfg_lock);
 	if (err)
 		return ERR_PTR(err);
 	mpam_mon_sel_lock_init(msc);
@@ -1585,6 +1688,70 @@ static void mpam_unregister_irqs(void)
 	}
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+	add_to_garbage(comp->cfg);
+}
+
+static void mpam_reset_component_cfg(struct mpam_component *comp)
+{
+	int i;
+	struct mpam_props *cprops = &comp->class->props;
+
+	mpam_assert_partid_sizes_fixed();
+
+	if (!comp->cfg)
+		return;
+
+	for (i = 0; i <= mpam_partid_max; i++) {
+		comp->cfg[i] = (struct mpam_config) {};
+		bitmap_fill(comp->cfg[i].features, MPAM_FEATURE_LAST);
+		bitmap_set((unsigned long *)&comp->cfg[i].cpbm, 0, cprops->cpbm_wd);
+		bitmap_set((unsigned long *)&comp->cfg[i].mbw_pbm, 0, cprops->mbw_pbm_bits);
+		bitmap_set((unsigned long *)&comp->cfg[i].mbw_max, 16 - cprops->bwa_wd, cprops->bwa_wd);
+	}
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	mpam_assert_partid_sizes_fixed();
+
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+	if (!comp->cfg)
+		return -ENOMEM;
+
+	/*
+	 * The array is free()d in one go, so only cfg[0]'s structure needs
+	 * to be initialised.
+	 */
+	init_garbage(&comp->cfg[0].garbage);
+
+	mpam_reset_component_cfg(comp);
+
+	return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list) {
+			int err = __allocate_component_cfg(comp);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void mpam_enable_once(void)
 {
 	int err;
@@ -1604,15 +1771,25 @@ static void mpam_enable_once(void)
 	 */
 	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
+	do {
+		mpam_enable_merge_features(&mpam_classes);
 
-	err = mpam_register_irqs();
+		err = mpam_register_irqs();
+		if (err) {
+			pr_warn("Failed to register irqs: %d\n", err);
+			break;
+		}
 
+		err = mpam_allocate_config();
+		if (err) {
+			pr_err("Failed to allocate configuration arrays.\n");
+			break;
+		}
+	} while (0);
 	mutex_unlock(&mpam_list_lock);
 	cpus_read_unlock();
 
 	if (err) {
-		pr_warn("Failed to register irqs: %d\n", err);
 		mpam_disable_reason = "Failed to enable.";
 		schedule_work(&mpam_broken_work);
 		return;
@@ -1632,6 +1809,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 	struct mpam_vmsc *vmsc;
 
 	lockdep_assert_cpus_held();
+	mpam_assert_partid_sizes_fixed();
+
+	mpam_reset_component_cfg(comp);
 
 	guard(srcu)(&mpam_srcu);
 	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
@@ -1732,6 +1912,64 @@ void mpam_enable(struct work_struct *work)
 		mpam_enable_once();
 }
 
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+	if (mpam_has_feature(feature, newcfg) &&			\
+	    (newcfg)->member != (cfg)->member) {			\
+		(cfg)->member = (newcfg)->member;			\
+		mpam_set_feature(feature, cfg);				\
+									\
+		(changes) = true;					\
+	}								\
+} while (0)
+
+static bool mpam_update_config(struct mpam_config *cfg,
+			       const struct mpam_config *newcfg)
+{
+	bool has_changes = false;
+
+	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
+	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
+	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
+
+	return has_changes;
+}
+
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg)
+{
+	struct mpam_write_config_arg arg;
+	struct mpam_msc_ris *ris;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	/* Don't pass in the current config! */
+	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+	if (!mpam_update_config(&comp->cfg[partid], cfg))
+		return 0;
+
+	arg.comp = comp;
+	arg.partid = partid;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		msc = vmsc->msc;
+
+		mutex_lock(&msc->cfg_lock);
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			arg.ris = ris;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+		mutex_unlock(&msc->cfg_lock);
+	}
+
+	return 0;
+}
+
 static int __init mpam_msc_driver_init(void)
 {
 	if (!system_supports_mpam())
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c6937161877a..842d32f148b5 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -91,6 +91,9 @@ struct mpam_msc {
 	 */
 	struct mutex		part_sel_lock;
 
+	/* cfg_lock protects the msc configuration. */
+	struct mutex		cfg_lock;
+
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
 	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
@@ -180,6 +183,21 @@ struct mpam_class {
 	struct mpam_garbage	garbage;
 };
 
+struct mpam_config {
+	/* Which configuration values are valid. */
+	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
+
+	u32	cpbm;
+	u32	mbw_pbm;
+	u16	mbw_max;
+
+	bool	reset_cpbm;
+	bool	reset_mbw_pbm;
+	bool	reset_mbw_max;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_component {
 	u32			comp_id;
 
@@ -188,6 +206,12 @@ struct mpam_component {
 
 	cpumask_t		affinity;
 
+	/*
+	 * Array of configuration values, indexed by partid.
+	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+	 */
+	struct mpam_config	*cfg;
+
 	/* member of mpam_class:components */
 	struct list_head	class_list;
 
@@ -247,6 +271,9 @@ extern u8 mpam_pmg_max;
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
 
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.43.0
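
To make the "only touch what changed" behaviour of mpam_update_config()
easier to follow, here is a userspace C sketch of the same pattern. It is
not the driver code: a plain unsigned int replaces the DECLARE_BITMAP()
feature bitmap, and every model_* / MODEL_* name is hypothetical. A member
is copied only when the new config marks it valid and the value differs,
and the caller learns whether the hardware needs reprogramming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the driver uses a DECLARE_BITMAP() instead. */
enum model_feat {
	MODEL_FEAT_CPOR_PART = 1u << 0,
	MODEL_FEAT_MBW_PART  = 1u << 1,
	MODEL_FEAT_MBW_MAX   = 1u << 2,
};

struct model_config {
	unsigned int features;	/* which of the values below are valid */
	uint32_t cpbm;
	uint32_t mbw_pbm;
	uint16_t mbw_max;
};

/* Copy one member if newcfg marks it valid and its value differs. */
#define model_maybe_update(cfg, feat, newcfg, member, changes) do {	\
	if (((newcfg)->features & (feat)) &&				\
	    (newcfg)->member != (cfg)->member) {			\
		(cfg)->member = (newcfg)->member;			\
		(cfg)->features |= (feat);				\
		(changes) = true;					\
	}								\
} while (0)

/* Returns true if anything changed, i.e. the MSC must be reprogrammed. */
static bool model_update_config(struct model_config *cfg,
				const struct model_config *newcfg)
{
	bool has_changes = false;

	model_maybe_update(cfg, MODEL_FEAT_CPOR_PART, newcfg, cpbm, has_changes);
	model_maybe_update(cfg, MODEL_FEAT_MBW_PART, newcfg, mbw_pbm, has_changes);
	model_maybe_update(cfg, MODEL_FEAT_MBW_MAX, newcfg, mbw_max, has_changes);

	return has_changes;
}
```

Applying the same request twice reports a change only the first time, which
is what lets mpam_apply_config() return early without cross-calling.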

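The bitmap_set() calls in mpam_reset_component_cfg() are compact, so a
worked example of the values they produce may help. The userspace C sketch
below assumes the fields behave as in the patch (a 32-bit cpbm whose low
cpbm_wd bits are set, and a 16-bit mbw_max whose top bwa_wd bits are set).
The model_* helpers are hypothetical, and the clamping of out-of-range
widths is an assumption made only to keep the sketch well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Low cpbm_wd bits set, as bitmap_set(&cpbm, 0, cpbm_wd) would leave them. */
static uint32_t model_cpbm_reset(unsigned int cpbm_wd)
{
	if (cpbm_wd >= 32)	/* assumption: clamp to the field width */
		return UINT32_MAX;

	return (1u << cpbm_wd) - 1u;
}

/*
 * Top bwa_wd bits of a 16-bit field set, as
 * bitmap_set(&mbw_max, 16 - bwa_wd, bwa_wd) would leave them.
 */
static uint16_t model_mbw_max_reset(unsigned int bwa_wd)
{
	if (bwa_wd == 0)
		return 0;
	if (bwa_wd > 16)	/* assumption: clamp to the field width */
		bwa_wd = 16;

	return (uint16_t)(((1u << bwa_wd) - 1u) << (16 - bwa_wd));
}
```

For example, bwa_wd = 8 gives 0xFF00 and cpbm_wd = 4 gives 0xF, i.e. the
reset configuration grants every portion and the maximum bandwidth that the
implemented field width can express.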


* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-07 12:34 ` [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online Ben Horgan
@ 2025-11-09 22:59   ` Gavin Shan
  2025-11-13 17:14     ` Ben Horgan
  2025-11-10 17:27   ` Jonathan Cameron
  1 sibling, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 22:59 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Ben

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate
> a configuration array for each component, and reprogram each RIS
> configuration from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
> Cc: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Drop tags
> Fix component reset, otherwise cpbm wrong and controls not set.
> Add a cfg_lock to guard configuration of an msc
> ---
>   drivers/resctrl/mpam_devices.c  | 268 ++++++++++++++++++++++++++++++--
>   drivers/resctrl/mpam_internal.h |  27 ++++
>   2 files changed, 280 insertions(+), 15 deletions(-)
> 

With the following comments addressed:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3a0ad8d93fff..8b0944bdaf28 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -144,6 +144,16 @@ static void mpam_free_garbage(void)
>   	}
>   }
>   
> +/*
> + * Once mpam is enabled, new requestors cannot further reduce the available
> + * partid. Assert that the size is fixed, and new requestors will be turned
> + * away.
> + */
> +static void mpam_assert_partid_sizes_fixed(void)
> +{
> +	WARN_ON_ONCE(!partid_max_published);
> +}
> +

This would be worth making an inline function.

>   static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>   {
>   	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> @@ -343,6 +353,7 @@ mpam_component_alloc(struct mpam_class *class, int id)
>   	return comp;
>   }
>   
> +static void __destroy_component_cfg(struct mpam_component *comp);
>   static void mpam_class_destroy(struct mpam_class *class);
>   
>   static void mpam_component_destroy(struct mpam_component *comp)
> @@ -351,6 +362,8 @@ static void mpam_component_destroy(struct mpam_component *comp)
>   
>   	lockdep_assert_held(&mpam_list_lock);
>   
> +	__destroy_component_cfg(comp);
> +
>   	list_del_rcu(&comp->class_list);
>   	add_to_garbage(comp);
>   
> @@ -820,31 +833,59 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>   	__mpam_write_reg(msc, reg, bm);
>   }
>   
> -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +/* Called via IPI. Call while holding an SRCU reference */
> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> +				      struct mpam_config *cfg)
>   {
>   	struct mpam_msc *msc = ris->vmsc->msc;
>   	struct mpam_props *rprops = &ris->props;
>   
> -	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> -
>   	mutex_lock(&msc->part_sel_lock);
>   	__mpam_part_sel(ris->ris_idx, partid, msc);
>   
> -	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
> +		if (cfg->reset_cpbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
> +					      rprops->cpbm_wd);
> +		else
> +			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> +	}

{} is needed for 'if (cfg->reset_cpbm)'.

>   
> -	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_part, cfg)) {
> +		if (cfg->reset_mbw_pbm)
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
> +					      rprops->mbw_pbm_bits);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
> +	}
>   

{} is needed for 'if (cfg->reset_mbw_pbm)'.

> -	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_min, cfg))
>   		mpam_write_partsel_reg(msc, MBW_MIN, 0);
>   
> -	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> -		mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
> +	    mpam_has_feature(mpam_feat_mbw_max, cfg)) {
> +		if (cfg->reset_mbw_max)
> +			mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
> +	}
>   
>   	mutex_unlock(&msc->part_sel_lock);
>   }
>   
> +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
> +{
> +	*reset_cfg = (struct mpam_config) {
> +		.reset_cpbm = true,
> +		.reset_mbw_pbm = true,
> +		.reset_mbw_max = true,
> +	};
> +	bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
> +}
> +
>   /*
>    * Called via smp_call_on_cpu() to prevent migration, while still being
>    * pre-emptible. Caller must hold mpam_srcu.
> @@ -852,16 +893,19 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
>   static int mpam_reset_ris(void *arg)
>   {
>   	u16 partid, partid_max;
> +	struct mpam_config reset_cfg;
>   	struct mpam_msc_ris *ris = arg;
>   
>   	if (ris->in_reset_state)
>   		return 0;
>   
> +	mpam_init_reset_cfg(&reset_cfg);
> +
>   	spin_lock(&partid_max_lock);
>   	partid_max = mpam_partid_max;
>   	spin_unlock(&partid_max_lock);
>   	for (partid = 0; partid <= partid_max; partid++)
> -		mpam_reset_ris_partid(ris, partid);
> +		mpam_reprogram_ris_partid(ris, partid, &reset_cfg);
>   
>   	return 0;
>   }
> @@ -894,6 +938,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>   {
>   	struct mpam_msc_ris *ris;
>   
> +	mutex_lock(&msc->cfg_lock);
>   	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
>   		mpam_touch_msc(msc, &mpam_reset_ris, ris);
>   
> @@ -903,6 +948,61 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>   		 */
>   		ris->in_reset_state = online;
>   	}
> +	mutex_unlock(&msc->cfg_lock);
> +}
> +
> +struct mpam_write_config_arg {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_component *comp;
> +	u16 partid;
> +};
> +
> +static int __write_config(void *arg)
> +{
> +	struct mpam_write_config_arg *c = arg;
> +
> +	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
> +
> +	return 0;
> +}
> +
> +static void mpam_reprogram_msc(struct mpam_msc *msc)
> +{
> +	u16 partid;
> +	bool reset;
> +	struct mpam_config *cfg;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_write_config_arg arg;
> +
> +	/*
> +	 * No lock for mpam_partid_max as partid_max_published has been
> +	 * set by mpam_enabled(), so the values can no longer change.
> +	 */
> +	mpam_assert_partid_sizes_fixed();
> +
> +	mutex_lock(&msc->cfg_lock);
> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		if (!mpam_is_enabled() && !ris->in_reset_state) {
> +			mpam_touch_msc(msc, &mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +			continue;
> +		}
> +
> +		arg.comp = ris->vmsc->comp;
> +		arg.ris = ris;
> +		reset = true;
> +		for (partid = 0; partid <= mpam_partid_max; partid++) {
> +			cfg = &ris->vmsc->comp->cfg[partid];
> +			if (!bitmap_empty(cfg->features, MPAM_FEATURE_LAST))
> +				reset = false;
> +

s/!bitmap_empty()/!bitmap_full (?)

> +			arg.partid = partid;
> +			mpam_touch_msc(msc, __write_config, &arg);
> +		}
> +		ris->in_reset_state = reset;
> +	}
> +	mutex_unlock(&msc->cfg_lock);
>   }
>   
>   static void _enable_percpu_irq(void *_irq)
> @@ -926,7 +1026,7 @@ static int mpam_cpu_online(unsigned int cpu)
>   			_enable_percpu_irq(&msc->reenable_error_ppi);
>   
>   		if (atomic_fetch_inc(&msc->online_refs) == 0)
> -			mpam_reset_msc(msc, true);
> +			mpam_reprogram_msc(msc);
>   	}
>   
>   	return 0;
> @@ -1125,6 +1225,9 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>   	if (err)
>   		return ERR_PTR(err);
>   	err = devm_mutex_init(dev, &msc->error_irq_lock);
> +	if (err)
> +		return ERR_PTR(err);
> +	err = devm_mutex_init(dev, &msc->cfg_lock);
>   	if (err)
>   		return ERR_PTR(err);
>   	mpam_mon_sel_lock_init(msc);
> @@ -1585,6 +1688,70 @@ static void mpam_unregister_irqs(void)
>   	}
>   }
>   
> +static void __destroy_component_cfg(struct mpam_component *comp)
> +{
> +	add_to_garbage(comp->cfg);
> +}
> +
> +static void mpam_reset_component_cfg(struct mpam_component *comp)
> +{
> +	int i;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (!comp->cfg)
> +		return;
> +
> +	for (i = 0; i <= mpam_partid_max; i++) {
> +		comp->cfg[i] = (struct mpam_config) {};
> +		bitmap_fill(comp->cfg[i].features, MPAM_FEATURE_LAST);
> +		bitmap_set((unsigned long *)&comp->cfg[i].cpbm, 0, cprops->cpbm_wd);
> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_pbm, 0, cprops->mbw_pbm_bits);
> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_max, 16 - cprops->bwa_wd, cprops->bwa_wd);
> +	}
> +}
> +
> +static int __allocate_component_cfg(struct mpam_component *comp)
> +{
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (comp->cfg)
> +		return 0;
> +
> +	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
> +	if (!comp->cfg)
> +		return -ENOMEM;
> +
> +	/*
> +	 * The array is free()d in one go, so only cfg[0]'s structure needs
> +	 * to be initialised.
> +	 */
> +	init_garbage(&comp->cfg[0].garbage);
> +
> +	mpam_reset_component_cfg(comp);
> +
> +	return 0;
> +}
> +
> +static int mpam_allocate_config(void)
> +{
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		list_for_each_entry(comp, &class->components, class_list) {
> +			int err = __allocate_component_cfg(comp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   static void mpam_enable_once(void)
>   {
>   	int err;
> @@ -1604,15 +1771,25 @@ static void mpam_enable_once(void)
>   	 */
>   	cpus_read_lock();
>   	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> +	do {
> +		mpam_enable_merge_features(&mpam_classes);
>   
> -	err = mpam_register_irqs();
> +		err = mpam_register_irqs();
> +		if (err) {
> +			pr_warn("Failed to register irqs: %d\n", err);
> +			break;
> +		}
>   
> +		err = mpam_allocate_config();
> +		if (err) {
> +			pr_err("Failed to allocate configuration arrays.\n");
> +			break;
> +		}
> +	} while (0);
>   	mutex_unlock(&mpam_list_lock);
>   	cpus_read_unlock();
>   
>   	if (err) {
> -		pr_warn("Failed to register irqs: %d\n", err);
>   		mpam_disable_reason = "Failed to enable.";
>   		schedule_work(&mpam_broken_work);
>   		return;
> @@ -1632,6 +1809,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
>   	struct mpam_vmsc *vmsc;
>   
>   	lockdep_assert_cpus_held();
> +	mpam_assert_partid_sizes_fixed();
> +
> +	mpam_reset_component_cfg(comp);
>   
>   	guard(srcu)(&mpam_srcu);
>   	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> @@ -1732,6 +1912,64 @@ void mpam_enable(struct work_struct *work)
>   		mpam_enable_once();
>   }
>   
> +#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
> +	if (mpam_has_feature(feature, newcfg) &&			\
> +	    (newcfg)->member != (cfg)->member) {			\
> +		(cfg)->member = (newcfg)->member;			\
> +		mpam_set_feature(feature, cfg);				\
> +									\
> +		(changes) = true;					\
> +	}								\
> +} while (0)
> +
> +static bool mpam_update_config(struct mpam_config *cfg,
> +			       const struct mpam_config *newcfg)
> +{
> +	bool has_changes = false;
> +
> +	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
> +
> +	return has_changes;
> +}
> +
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg)
> +{
> +	struct mpam_write_config_arg arg;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc *msc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	/* Don't pass in the current config! */
> +	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
> +
> +	if (!mpam_update_config(&comp->cfg[partid], cfg))
> +		return 0;
> +
> +	arg.comp = comp;
> +	arg.partid = partid;
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		msc = vmsc->msc;
> +
> +		mutex_lock(&msc->cfg_lock);
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			arg.ris = ris;
> +			mpam_touch_msc(msc, __write_config, &arg);
> +		}
> +		mutex_unlock(&msc->cfg_lock);
> +	}
> +
> +	return 0;
> +}
> +
>   static int __init mpam_msc_driver_init(void)
>   {
>   	if (!system_supports_mpam())
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c6937161877a..842d32f148b5 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -91,6 +91,9 @@ struct mpam_msc {
>   	 */
>   	struct mutex		part_sel_lock;
>   
> +	/* cfg_lock protects the msc configuration. */
> +	struct mutex		cfg_lock;
> +
>   	/*
>   	 * mon_sel_lock protects access to the MSC hardware registers that are
>   	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> @@ -180,6 +183,21 @@ struct mpam_class {
>   	struct mpam_garbage	garbage;
>   };
>   
> +struct mpam_config {
> +	/* Which configuration values are valid. */
> +	DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
> +
> +	u32	cpbm;
> +	u32	mbw_pbm;
> +	u16	mbw_max;
> +
> +	bool	reset_cpbm;
> +	bool	reset_mbw_pbm;
> +	bool	reset_mbw_max;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
>   struct mpam_component {
>   	u32			comp_id;
>   
> @@ -188,6 +206,12 @@ struct mpam_component {
>   
>   	cpumask_t		affinity;
>   
> +	/*
> +	 * Array of configuration values, indexed by partid.
> +	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
> +	 */
> +	struct mpam_config	*cfg;
> +
>   	/* member of mpam_class:components */
>   	struct list_head	class_list;
>   
> @@ -247,6 +271,9 @@ extern u8 mpam_pmg_max;
>   void mpam_enable(struct work_struct *work);
>   void mpam_disable(struct work_struct *work);
>   
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg);
> +
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   

Thanks,
Gavin



* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-09 22:59   ` Gavin Shan
@ 2025-11-13 17:14     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-13 17:14 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Gavin,

On 11/9/25 22:59, Gavin Shan wrote:
> Hi Ben
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> When CPUs come online the MSC's original configuration should be
>> restored.
>>
>> Add struct mpam_config to hold the configuration. This has a bitmap of
>> features that were modified. Once the maximum partid is known, allocate
>> a configuration array for each component, and reprogram each RIS
>> configuration from this.
>>
>> CC: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Cc: Shaopeng Tan (Fujitsu) tan.shaopeng@fujitsu.com
>> Cc: Peter Newman peternewman@google.com
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Drop tags
>> Fix component reset, otherwise cpbm wrong and controls not set.
>> Add a cfg_lock to guard configuration of an msc
>> ---
>>   drivers/resctrl/mpam_devices.c  | 268 ++++++++++++++++++++++++++++++--
>>   drivers/resctrl/mpam_internal.h |  27 ++++
>>   2 files changed, 280 insertions(+), 15 deletions(-)
>>
> 
> With the following comments addressed:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>> mpam_devices.c
>> index 3a0ad8d93fff..8b0944bdaf28 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -144,6 +144,16 @@ static void mpam_free_garbage(void)
>>       }
>>   }
>>   +/*
>> + * Once mpam is enabled, new requestors cannot further reduce the available
>> + * partid. Assert that the size is fixed, and new requestors will be turned
>> + * away.
>> + */
>> +static void mpam_assert_partid_sizes_fixed(void)
>> +{
>> +    WARN_ON_ONCE(!partid_max_published);
>> +}
>> +
> 
> Would be worthy to be a online function.

Assuming you mean 'inline'. I don't think it really matters but I'll
leave it as is so that WARN_ON_ONCE() only ever gives one warning.

> 
>>   static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>>   {
>>       WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
>> @@ -343,6 +353,7 @@ mpam_component_alloc(struct mpam_class *class, int
>> id)
>>       return comp;
>>   }
>>   +static void __destroy_component_cfg(struct mpam_component *comp);
>>   static void mpam_class_destroy(struct mpam_class *class);
>>     static void mpam_component_destroy(struct mpam_component *comp)
>> @@ -351,6 +362,8 @@ static void mpam_component_destroy(struct
>> mpam_component *comp)
>>         lockdep_assert_held(&mpam_list_lock);
>>   +    __destroy_component_cfg(comp);
>> +
>>       list_del_rcu(&comp->class_list);
>>       add_to_garbage(comp);
>>   @@ -820,31 +833,59 @@ static void mpam_reset_msc_bitmap(struct
>> mpam_msc *msc, u16 reg, u16 wd)
>>       __mpam_write_reg(msc, reg, bm);
>>   }
>>   -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16
>> partid)
>> +/* Called via IPI. Call while holding an SRCU reference */
>> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16
>> partid,
>> +                      struct mpam_config *cfg)
>>   {
>>       struct mpam_msc *msc = ris->vmsc->msc;
>>       struct mpam_props *rprops = &ris->props;
>>   -    WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
>> -
>>       mutex_lock(&msc->part_sel_lock);
>>       __mpam_part_sel(ris->ris_idx, partid, msc);
>>   -    if (mpam_has_feature(mpam_feat_cpor_part, rprops))
>> -        mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
>> +    if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
>> +        mpam_has_feature(mpam_feat_cpor_part, cfg)) {
>> +        if (cfg->reset_cpbm)
>> +            mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
>> +                          rprops->cpbm_wd);
>> +        else
>> +            mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
>> +    }
> 
> {} is needed by 'if (cfg->reset_cpbm)'

Changed to be one line.

> 
>>   -    if (mpam_has_feature(mpam_feat_mbw_part, rprops))
>> -        mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
>> +    if (mpam_has_feature(mpam_feat_mbw_part, rprops) &&
>> +        mpam_has_feature(mpam_feat_mbw_part, cfg)) {
>> +        if (cfg->reset_mbw_pbm)
>> +            mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
>> +                          rprops->mbw_pbm_bits);
>> +        else
>> +            mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
>> +    }
>>   
> 
> { } is needed by 'if (cfg->reset_mbw_pbm)'

Changed to be one line.

> 
>> -    if (mpam_has_feature(mpam_feat_mbw_min, rprops))
>> +    if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
>> +        mpam_has_feature(mpam_feat_mbw_min, cfg))
>>           mpam_write_partsel_reg(msc, MBW_MIN, 0);
>>   -    if (mpam_has_feature(mpam_feat_mbw_max, rprops))
>> -        mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
>> +    if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
>> +        mpam_has_feature(mpam_feat_mbw_max, cfg)) {
>> +        if (cfg->reset_mbw_max)
>> +            mpam_write_partsel_reg(msc, MBW_MAX, MPAMCFG_MBW_MAX_MAX);
>> +        else
>> +            mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
>> +    }
>>         mutex_unlock(&msc->part_sel_lock);
>>   }
>>   +static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
>> +{
>> +    *reset_cfg = (struct mpam_config) {
>> +        .reset_cpbm = true,
>> +        .reset_mbw_pbm = true,
>> +        .reset_mbw_max = true,
>> +    };
>> +    bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
>> +}
>> +
>>   /*
>>    * Called via smp_call_on_cpu() to prevent migration, while still being
>>    * pre-emptible. Caller must hold mpam_srcu.
>> @@ -852,16 +893,19 @@ static void mpam_reset_ris_partid(struct
>> mpam_msc_ris *ris, u16 partid)
>>   static int mpam_reset_ris(void *arg)
>>   {
>>       u16 partid, partid_max;
>> +    struct mpam_config reset_cfg;
>>       struct mpam_msc_ris *ris = arg;
>>         if (ris->in_reset_state)
>>           return 0;
>>   +    mpam_init_reset_cfg(&reset_cfg);
>> +
>>       spin_lock(&partid_max_lock);
>>       partid_max = mpam_partid_max;
>>       spin_unlock(&partid_max_lock);
>>       for (partid = 0; partid <= partid_max; partid++)
>> -        mpam_reset_ris_partid(ris, partid);
>> +        mpam_reprogram_ris_partid(ris, partid, &reset_cfg);
>>         return 0;
>>   }
>> @@ -894,6 +938,7 @@ static void mpam_reset_msc(struct mpam_msc *msc,
>> bool online)
>>   {
>>       struct mpam_msc_ris *ris;
>>   +    mutex_lock(&msc->cfg_lock);
>>       list_for_each_entry_srcu(ris, &msc->ris, msc_list,
>> srcu_read_lock_held(&mpam_srcu)) {
>>           mpam_touch_msc(msc, &mpam_reset_ris, ris);
>>   @@ -903,6 +948,61 @@ static void mpam_reset_msc(struct mpam_msc
>> *msc, bool online)
>>            */
>>           ris->in_reset_state = online;
>>       }
>> +    mutex_unlock(&msc->cfg_lock);
>> +}
>> +
>> +struct mpam_write_config_arg {
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_component *comp;
>> +    u16 partid;
>> +};
>> +
>> +static int __write_config(void *arg)
>> +{
>> +    struct mpam_write_config_arg *c = arg;
>> +
>> +    mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
>> +
>> +    return 0;
>> +}
>> +
>> +static void mpam_reprogram_msc(struct mpam_msc *msc)
>> +{
>> +    u16 partid;
>> +    bool reset;
>> +    struct mpam_config *cfg;
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_write_config_arg arg;
>> +
>> +    /*
>> +     * No lock for mpam_partid_max as partid_max_published has been
>> +     * set by mpam_enabled(), so the values can no longer change.
>> +     */
>> +    mpam_assert_partid_sizes_fixed();
>> +
>> +    mutex_lock(&msc->cfg_lock);
>> +    list_for_each_entry_srcu(ris, &msc->ris, msc_list,
>> +                 srcu_read_lock_held(&mpam_srcu)) {
>> +        if (!mpam_is_enabled() && !ris->in_reset_state) {
>> +            mpam_touch_msc(msc, &mpam_reset_ris, ris);
>> +            ris->in_reset_state = true;
>> +            continue;
>> +        }
>> +
>> +        arg.comp = ris->vmsc->comp;
>> +        arg.ris = ris;
>> +        reset = true;
>> +        for (partid = 0; partid <= mpam_partid_max; partid++) {
>> +            cfg = &ris->vmsc->comp->cfg[partid];
>> +            if (!bitmap_empty(cfg->features, MPAM_FEATURE_LAST))
>> +                reset = false;
>> +
> 
> s/!bitmap_empty()/!bitmap_full (?)

This checks whether the configuration has any work to do, so testing
whether any feature bits are set, !bitmap_empty(), is the correct thing
to do.
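
To illustrate the distinction, here is a minimal user-space sketch, not the
driver code. NR_FEATURES, features_empty(), features_full() and
ris_in_reset_state() are hypothetical stand-ins, and a single u32 replaces
the kernel bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPAM_FEATURE_LAST; the real value may differ. */
#define NR_FEATURES 16u
#define ALL_FEATURES ((1u << NR_FEATURES) - 1)

/* Analogue of bitmap_empty(): true when no feature bit is set. */
static bool features_empty(uint32_t features)
{
	return (features & ALL_FEATURES) == 0;
}

/* Analogue of bitmap_full(): true when every feature bit is set. */
static bool features_full(uint32_t features)
{
	return (features & ALL_FEATURES) == ALL_FEATURES;
}

/*
 * Mirrors the shape of the loop in mpam_reprogram_msc(): the RIS is only
 * considered to be in its reset state if no partid has any feature set.
 */
static bool ris_in_reset_state(const uint32_t *features, unsigned int nr_partids)
{
	bool reset = true;
	unsigned int i;

	assert(features != NULL);

	for (i = 0; i < nr_partids; i++) {
		if (!features_empty(features[i]))
			reset = false;
	}

	return reset;
}
```

A single feature bit in any partid is enough to mean there is something to
write, which is what the empty test detects. A !bitmap_full() test would
instead be true for every config that leaves at least one feature unset,
including a config with nothing set at all.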

> 
>> +            arg.partid = partid;
>> +            mpam_touch_msc(msc, __write_config, &arg);
>> +        }
>> +        ris->in_reset_state = reset;
>> +    }
>> +    mutex_unlock(&msc->cfg_lock);
>>   }
>>     static void _enable_percpu_irq(void *_irq)
>> @@ -926,7 +1026,7 @@ static int mpam_cpu_online(unsigned int cpu)
>>               _enable_percpu_irq(&msc->reenable_error_ppi);
>>             if (atomic_fetch_inc(&msc->online_refs) == 0)
>> -            mpam_reset_msc(msc, true);
>> +            mpam_reprogram_msc(msc);
>>       }
>>         return 0;
>> @@ -1125,6 +1225,9 @@ static struct mpam_msc
>> *do_mpam_msc_drv_probe(struct platform_device *pdev)
>>       if (err)
>>           return ERR_PTR(err);
>>       err = devm_mutex_init(dev, &msc->error_irq_lock);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +    err = devm_mutex_init(dev, &msc->cfg_lock);
>>       if (err)
>>           return ERR_PTR(err);
>>       mpam_mon_sel_lock_init(msc);
>> @@ -1585,6 +1688,70 @@ static void mpam_unregister_irqs(void)
>>       }
>>   }
>>   +static void __destroy_component_cfg(struct mpam_component *comp)
>> +{
>> +    add_to_garbage(comp->cfg);
>> +}
>> +
>> +static void mpam_reset_component_cfg(struct mpam_component *comp)
>> +{
>> +    int i;
>> +    struct mpam_props *cprops = &comp->class->props;
>> +
>> +    mpam_assert_partid_sizes_fixed();
>> +
>> +    if (!comp->cfg)
>> +        return;
>> +
>> +    for (i = 0; i <= mpam_partid_max; i++) {
>> +        comp->cfg[i] = (struct mpam_config) {};
>> +        bitmap_fill(comp->cfg[i].features, MPAM_FEATURE_LAST);
>> +        bitmap_set((unsigned long *)&comp->cfg[i].cpbm, 0, cprops->cpbm_wd);
>> +        bitmap_set((unsigned long *)&comp->cfg[i].mbw_pbm, 0, cprops->mbw_pbm_bits);
>> +        bitmap_set((unsigned long *)&comp->cfg[i].mbw_max, 16 - cprops->bwa_wd, cprops->bwa_wd);
>> +    }
>> +}
>> +
>> +static int __allocate_component_cfg(struct mpam_component *comp)
>> +{
>> +    mpam_assert_partid_sizes_fixed();
>> +
>> +    if (comp->cfg)
>> +        return 0;
>> +
>> +    comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
>> +    if (!comp->cfg)
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * The array is free()d in one go, so only cfg[0]'s structure needs
>> +     * to be initialised.
>> +     */
>> +    init_garbage(&comp->cfg[0].garbage);
>> +
>> +    mpam_reset_component_cfg(comp);
>> +
>> +    return 0;
>> +}
>> +
>> +static int mpam_allocate_config(void)
>> +{
>> +    struct mpam_class *class;
>> +    struct mpam_component *comp;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    list_for_each_entry(class, &mpam_classes, classes_list) {
>> +        list_for_each_entry(comp, &class->components, class_list) {
>> +            int err = __allocate_component_cfg(comp);
>> +            if (err)
>> +                return err;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static void mpam_enable_once(void)
>>   {
>>       int err;
>> @@ -1604,15 +1771,25 @@ static void mpam_enable_once(void)
>>        */
>>       cpus_read_lock();
>>       mutex_lock(&mpam_list_lock);
>> -    mpam_enable_merge_features(&mpam_classes);
>> +    do {
>> +        mpam_enable_merge_features(&mpam_classes);
>>   -    err = mpam_register_irqs();
>> +        err = mpam_register_irqs();
>> +        if (err) {
>> +            pr_warn("Failed to register irqs: %d\n", err);
>> +            break;
>> +        }
>>   +        err = mpam_allocate_config();
>> +        if (err) {
>> +            pr_err("Failed to allocate configuration arrays.\n");
>> +            break;
>> +        }
>> +    } while (0);
>>       mutex_unlock(&mpam_list_lock);
>>       cpus_read_unlock();
>>         if (err) {
>> -        pr_warn("Failed to register irqs: %d\n", err);
>>           mpam_disable_reason = "Failed to enable.";
>>           schedule_work(&mpam_broken_work);
>>           return;
>> @@ -1632,6 +1809,9 @@ static void mpam_reset_component_locked(struct
>> mpam_component *comp)
>>       struct mpam_vmsc *vmsc;
>>         lockdep_assert_cpus_held();
>> +    mpam_assert_partid_sizes_fixed();
>> +
>> +    mpam_reset_component_cfg(comp);
>>         guard(srcu)(&mpam_srcu);
>>       list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
>> @@ -1732,6 +1912,64 @@ void mpam_enable(struct work_struct *work)
>>           mpam_enable_once();
>>   }
>>   +#define maybe_update_config(cfg, feature, newcfg, member, changes)
>> do { \
>> +    if (mpam_has_feature(feature, newcfg) &&            \
>> +        (newcfg)->member != (cfg)->member) {            \
>> +        (cfg)->member = (newcfg)->member;            \
>> +        mpam_set_feature(feature, cfg);                \
>> +                                    \
>> +        (changes) = true;                    \
>> +    }                                \
>> +} while (0)
>> +
>> +static bool mpam_update_config(struct mpam_config *cfg,
>> +                   const struct mpam_config *newcfg)
>> +{
>> +    bool has_changes = false;
>> +
>> +    maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
>> +    maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
>> +    maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
>> +
>> +    return has_changes;
>> +}
>> +
>> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
>> +              struct mpam_config *cfg)
>> +{
>> +    struct mpam_write_config_arg arg;
>> +    struct mpam_msc_ris *ris;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc *msc;
>> +
>> +    lockdep_assert_cpus_held();
>> +
>> +    /* Don't pass in the current config! */
>> +    WARN_ON_ONCE(&comp->cfg[partid] == cfg);
>> +
>> +    if (!mpam_update_config(&comp->cfg[partid], cfg))
>> +        return 0;
>> +
>> +    arg.comp = comp;
>> +    arg.partid = partid;
>> +
>> +    guard(srcu)(&mpam_srcu);
>> +    list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
>> +                 srcu_read_lock_held(&mpam_srcu)) {
>> +        msc = vmsc->msc;
>> +
>> +        mutex_lock(&msc->cfg_lock);
>> +        list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
>> +                     srcu_read_lock_held(&mpam_srcu)) {
>> +            arg.ris = ris;
>> +            mpam_touch_msc(msc, __write_config, &arg);
>> +        }
>> +        mutex_unlock(&msc->cfg_lock);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int __init mpam_msc_driver_init(void)
>>   {
>>       if (!system_supports_mpam())
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/
>> mpam_internal.h
>> index c6937161877a..842d32f148b5 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -91,6 +91,9 @@ struct mpam_msc {
>>        */
>>       struct mutex        part_sel_lock;
>>   +    /* cfg_lock protects the msc configuration. */
>> +    struct mutex        cfg_lock;
>> +
>>       /*
>>        * mon_sel_lock protects access to the MSC hardware registers
>> that are
>>        * affected by MPAMCFG_MON_SEL, and the mbwu_state.
>> @@ -180,6 +183,21 @@ struct mpam_class {
>>       struct mpam_garbage    garbage;
>>   };
>>   +struct mpam_config {
>> +    /* Which configuration values are valid. */
>> +    DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
>> +
>> +    u32    cpbm;
>> +    u32    mbw_pbm;
>> +    u16    mbw_max;
>> +
>> +    bool    reset_cpbm;
>> +    bool    reset_mbw_pbm;
>> +    bool    reset_mbw_max;
>> +
>> +    struct mpam_garbage    garbage;
>> +};
>> +
>>   struct mpam_component {
>>       u32            comp_id;
>>   @@ -188,6 +206,12 @@ struct mpam_component {
>>         cpumask_t        affinity;
>>   +    /*
>> +     * Array of configuration values, indexed by partid.
>> +     * Read from cpuhp callbacks, hold the cpuhp lock when writing.
>> +     */
>> +    struct mpam_config    *cfg;
>> +
>>       /* member of mpam_class:components */
>>       struct list_head    class_list;
>>   @@ -247,6 +271,9 @@ extern u8 mpam_pmg_max;
>>   void mpam_enable(struct work_struct *work);
>>   void mpam_disable(struct work_struct *work);
>>   +int mpam_apply_config(struct mpam_component *comp, u16 partid,
>> +              struct mpam_config *cfg);
>> +
>>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32
>> cache_level,
>>                      cpumask_t *affinity);
>>   
> 
> Thanks,
> Gavin
> 

Thanks,

Ben



* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-07 12:34 ` [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online Ben Horgan
  2025-11-09 22:59   ` Gavin Shan
@ 2025-11-10 17:27   ` Jonathan Cameron
  2025-11-11 17:45     ` Ben Horgan
  2025-11-14 10:33     ` Ben Horgan
  1 sibling, 2 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 17:27 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Fri, 7 Nov 2025 12:34:40 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> When CPUs come online the MSC's original configuration should be restored.
> 
> Add struct mpam_config to hold the configuration. This has a bitmap of
> features that were modified. Once the maximum partid is known, allocate

I'm not following 'were modified'.  When?  Sometime in the past?
Perhaps "features that have been modified when XXX happens" or

Having read the code I think this is something like "are modified as configuration
is read".
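
For reference, the way the feature bits track updated values can be sketched
in user space as below. This is a simplified model of the patch's
maybe_update_config() and mpam_update_config(), not the driver code: struct
config, enum feat, has_feature(), maybe_update() and update_config() are
hypothetical names, and a u32 replaces the kernel bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum feat { FEAT_CPOR_PART, FEAT_MBW_PART, FEAT_MBW_MAX };

struct config {
	uint32_t features;	/* one bit per enum feat: which values are valid */
	uint32_t cpbm;
	uint32_t mbw_pbm;
	uint16_t mbw_max;
};

static bool has_feature(enum feat f, const struct config *c)
{
	assert(c != NULL);
	return (c->features & (1u << f)) != 0;
}

/*
 * Copy one member from newcfg only if newcfg marks it valid and the value
 * differs, then mark it valid in cfg and record that something changed.
 */
#define maybe_update(cfg, feature, newcfg, member, changes) do {	\
	if (has_feature(feature, newcfg) &&				\
	    (newcfg)->member != (cfg)->member) {			\
		(cfg)->member = (newcfg)->member;			\
		(cfg)->features |= 1u << (feature);			\
		(changes) = true;					\
	}								\
} while (0)

/* Returns true if any stored value was changed by the request. */
static bool update_config(struct config *cfg, const struct config *newcfg)
{
	bool has_changes = false;

	maybe_update(cfg, FEAT_CPOR_PART, newcfg, cpbm, has_changes);
	maybe_update(cfg, FEAT_MBW_PART, newcfg, mbw_pbm, has_changes);
	maybe_update(cfg, FEAT_MBW_MAX, newcfg, mbw_max, has_changes);

	return has_changes;
}
```

In this model a bit in the stored config is set at the moment a requested
value is copied in, and values the request does not mark as valid are left
untouched.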

> a configuration array for each component, and reprogram each RIS
> configuration from this.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Cc: Shaopeng Tan (Fujitsu) tan.shaopeng@fujitsu.com
> Cc: Peter Newman peternewman@google.com
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Drop tags
> Fix component reset, otherwise cpbm wrong and controls not set.
> Add a cfg_lock to guard configuration of an msc

The use of bitmap_set() for things that aren't unsigned long (arrays) is a bad
idea. Much better to use GENMASK() to fill those.

> ---
>  drivers/resctrl/mpam_devices.c  | 268 ++++++++++++++++++++++++++++++--
>  drivers/resctrl/mpam_internal.h |  27 ++++
>  2 files changed, 280 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3a0ad8d93fff..8b0944bdaf28 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c



> @@ -1125,6 +1225,9 @@ static struct mpam_msc *do_mpam_msc_drv_probe(struct platform_device *pdev)
>  	if (err)
>  		return ERR_PTR(err);
>  	err = devm_mutex_init(dev, &msc->error_irq_lock);
> +	if (err)
> +		return ERR_PTR(err);
Trivial: As in earlier patches. I'd put a blank line here for readability.
> +	err = devm_mutex_init(dev, &msc->cfg_lock);
>  	if (err)
>  		return ERR_PTR(err);
>  	mpam_mon_sel_lock_init(msc);
> @@ -1585,6 +1688,70 @@ static void mpam_unregister_irqs(void)
>  	}
>  }
>  
> +static void __destroy_component_cfg(struct mpam_component *comp)
> +{
> +	add_to_garbage(comp->cfg);
> +}
> +
> +static void mpam_reset_component_cfg(struct mpam_component *comp)
> +{
> +	int i;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (!comp->cfg)
> +		return;
> +
> +	for (i = 0; i <= mpam_partid_max; i++) {
> +		comp->cfg[i] = (struct mpam_config) {};
> +		bitmap_fill(comp->cfg[i].features, MPAM_FEATURE_LAST);
> +		bitmap_set((unsigned long *)&comp->cfg[i].cpbm, 0, cprops->cpbm_wd);

Why manipulate a u32 with bitmap_set() through a horrible cast that pretends it's an unsigned long?
Instead just do:
		comp->cfg[i].cpbm = GENMASK(cprops->cpbm_wd, 0);
Which is indeed what bitmap_set() will do internally due to an optimization for small bitmaps,
but let's avoid making one integer pretend to be another of a different length.
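
One detail to keep in mind when converting: GENMASK(h, l) includes both end
bits, so a mask of the low n bits is GENMASK(n - 1, 0), and n == 0 needs to
be handled separately. A minimal user-space sketch, assuming 32-bit fields;
GENMASK_U32(), low_bits_mask() and top_bits_mask16() are hypothetical names
used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for GENMASK(h, l): bits l..h inclusive, 0 <= l <= h < 32. */
#define GENMASK_U32(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/*
 * Low n bits set, matching what bitmap_set(p, 0, n) produces on a zeroed
 * word, including the n == 0 case where nothing is set.
 */
static uint32_t low_bits_mask(unsigned int n)
{
	assert(n <= 32);
	return n ? GENMASK_U32(n - 1, 0) : 0;
}

/*
 * Top wd bits of a 16-bit field, i.e. bits (16 - wd)..15, matching
 * bitmap_set(p, 16 - wd, wd) on a zeroed word.
 */
static uint16_t top_bits_mask16(unsigned int wd)
{
	assert(wd <= 16);
	return wd ? (uint16_t)GENMASK_U32(15, 16 - wd) : 0;
}
```

Writing the zero-width case out explicitly keeps the "set n bits" meaning of
the original calls without the cast.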


> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_pbm, 0, cprops->mbw_pbm_bits);
> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_max, 16 - cprops->bwa_wd, cprops->bwa_wd);
> +	}
> +}



* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-10 17:27   ` Jonathan Cameron
@ 2025-11-11 17:45     ` Ben Horgan
  2025-11-14 10:33     ` Ben Horgan
  1 sibling, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-11 17:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jonathan,

On 11/10/25 17:27, Jonathan Cameron wrote:
> On Fri, 7 Nov 2025 12:34:40 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> From: James Morse <james.morse@arm.com>
>>
>> When CPUs come online the MSC's original configuration should be restored.
>>
>> Add struct mpam_config to hold the configuration. This has a bitmap of
>> features that were modified. Once the maximum partid is known, allocate
> 
> I'm not following 'were modified'.  When?  Sometime in the past?
> Perhaps "features that have been modified when XXX happens" or
> 
> Having read the code I think this is something like "are modified as configuration
> is read".
> 
>> a configuration array for each component, and reprogram each RIS
>> configuration from this.
>>
>> CC: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Cc: Shaopeng Tan (Fujitsu) tan.shaopeng@fujitsu.com
>> Cc: Peter Newman peternewman@google.com
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Drop tags
>> Fix component reset, otherwise cpbm wrong and controls not set.
>> Add a cfg_lock to guard configuration of an msc
> 
> The use of bitmap_set() for things that aren't unsigned long (arrays) is a bad
> idea. Much better to use GENMASK() to fill those.
> 
>> ---
>>  drivers/resctrl/mpam_devices.c  | 268 ++++++++++++++++++++++++++++++--
>>  drivers/resctrl/mpam_internal.h |  27 ++++
>>  2 files changed, 280 insertions(+), 15 deletions(-)

> Why manipulate a u32 with bitmap_set() via a horrible pretend-it's-an-unsigned-long cast?
> Instead just do:
> 		comp->cfg[i].cpbm = GENMASK(cprops->cpbm_wd, 0);
> Which is indeed what bitmap_set() will do internally due to an optimization for small bitmaps,
> but let's avoid making one integer pretend to be another of a different length.
> 
> 
>> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_pbm, 0, cprops->mbw_pbm_bits);
>> +		bitmap_set((unsigned long *)&comp->cfg[i].mbw_max, 16 - cprops->bwa_wd, cprops->bwa_wd);
>> +	}
>> +}
> 

It is a bit nasty. I was trying to keep the concept of setting n bits and avoid 
considering explicitly what happens when n is zero.

These bitmap_set() can become:

if (cprops->cpbm_wd)
	comp->cfg[i].cpbm = GENMASK(cprops->cpbm_wd - 1, 0);
if (cprops->mbw_pbm_bits)
	comp->cfg[i].mbw_pbm = GENMASK(cprops->mbw_pbm_bits - 1, 0);
if (cprops->bwa_wd)
	comp->cfg[i].mbw_max = GENMASK(15, 16 - cprops->bwa_wd);
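
As a rough userspace illustration of why the zero checks and the "- 1" are needed, here is a minimal sketch. GENMASK32(), low_bits() and top_bits16() are hypothetical stand-ins written for this example, not the kernel's definitions, and only model the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), limited to 32 bits. */
#define GENMASK32(h, l) \
	((uint32_t)((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h)))))

/* Hypothetical helper: the low n bits set, valid for n in 0..32. */
static inline uint32_t low_bits(unsigned int n)
{
	/* GENMASK32(n - 1, 0) would underflow for n == 0, hence the guard. */
	return n ? GENMASK32(n - 1, 0) : 0;
}

/* Hypothetical helper: the top n bits of a 16-bit field, valid for n in 0..16. */
static inline uint16_t top_bits16(unsigned int n)
{
	/* Same guard: n == 0 would ask for a mask starting at bit 16. */
	return n ? (uint16_t)GENMASK32(15, 16 - n) : 0;
}
```

Note that GENMASK32(5, 0) has six bits set, so a field that is n bits wide needs h = n - 1.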

 
Thanks,

Ben



* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-10 17:27   ` Jonathan Cameron
  2025-11-11 17:45     ` Ben Horgan
@ 2025-11-14 10:33     ` Ben Horgan
  2025-11-14 14:34       ` Ben Horgan
  1 sibling, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-14 10:33 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jonathan,

On 11/10/25 17:27, Jonathan Cameron wrote:
> On Fri, 7 Nov 2025 12:34:40 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
> 
>> From: James Morse <james.morse@arm.com>
>>
>> When CPUs come online the MSC's original configuration should be restored.
>>
>> Add struct mpam_config to hold the configuration. This has a bitmap of
>> features that were modified. Once the maximum partid is known, allocate
> 
> I'm not following 'were modified'.  When?  Sometime in the past?
> Perhaps "features that have been modified when XXX happens" or

The intent of the features bitmap is to only update the configuration in
hardware for the features that require it. On reset, this is all
features, but for a user configuration change this is just the
difference from what was previously set.

However, I don't think the difference part is currently working
correctly; the bitmap always has all the bits set and so any update
configures everything. I'll look into this.

> 
> Having read the code I think this is something like "are modified as configuration
> is read".
> 
>> a configuration array for each component, and reprogram each RIS
>> configuration from this.
Thanks,

Ben



* Re: [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-11-14 10:33     ` Ben Horgan
@ 2025-11-14 14:34       ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-14 14:34 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Jonathan,

On 11/14/25 10:33, Ben Horgan wrote:
> Hi Jonathan,
> 
> On 11/10/25 17:27, Jonathan Cameron wrote:
>> On Fri, 7 Nov 2025 12:34:40 +0000
>> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>>> From: James Morse <james.morse@arm.com>
>>>
>>> When CPUs come online the MSC's original configuration should be restored.
>>>
>>> Add struct mpam_config to hold the configuration. This has a bitmap of
>>> features that were modified. Once the maximum partid is known, allocate
>>
>> I'm not following 'were modified'.  When?  Sometime in the past?
>> Perhaps "features that have been modified when XXX happens" or
> 
> The intent of the features bitmap is to only update the configuration in
> hardware for the features that require it. On reset, this is all
> features, but for a user configuration change this is just the
> difference from what was previously set.

I wasn't quite correct here. The feature bitmap for each component
indicates which features have been changed (by user configuration) to a
value different from their default. These bits aren't unset when the
setting is changed back to the reset value. It can thus be used on power
restoration to restore the user config. I think this is what James meant
by "were modified".
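
To make that reading concrete, here is a minimal userspace sketch of a "sticky" feature bitmap. All names (sketch_cfg, sketch_hw, sketch_set_cpbm(), sketch_restore()) are hypothetical and only model the behaviour described above; this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature indices; the driver's enum is much larger. */
enum sketch_feat { SKETCH_FEAT_CPOR, SKETCH_FEAT_MBW_MAX };

/* Hypothetical stand-in for the per-PARTID configuration. */
struct sketch_cfg {
	uint32_t features;	/* bit set: the user has configured this control */
	uint32_t cpbm;
	uint16_t mbw_max;
};

/* Hypothetical stand-in for the hardware state of one PARTID. */
struct sketch_hw {
	uint32_t cpbm;
	uint16_t mbw_max;
	unsigned int writes;	/* counts register writes, for illustration */
};

static void sketch_set_cpbm(struct sketch_cfg *cfg, uint32_t cpbm)
{
	cfg->cpbm = cpbm;
	/* Sticky: the bit stays set even if cpbm returns to its default. */
	cfg->features |= UINT32_C(1) << SKETCH_FEAT_CPOR;
}

/* On power restoration, rewrite only the controls flagged in the bitmap. */
static void sketch_restore(const struct sketch_cfg *cfg, struct sketch_hw *hw)
{
	if (cfg->features & (UINT32_C(1) << SKETCH_FEAT_CPOR)) {
		hw->cpbm = cfg->cpbm;
		hw->writes++;
	}
	if (cfg->features & (UINT32_C(1) << SKETCH_FEAT_MBW_MAX)) {
		hw->mbw_max = cfg->mbw_max;
		hw->writes++;
	}
}
```

In this model a control that was changed and then changed back is still rewritten on restore, while a control the user never touched is left alone.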

> 
> However, I don't think the difference part is currently working
> correctly; the bitmap always has all the bits set and so any update
> configures everything. I'll look into this.
> 
>>
>> Having read the code I think this is something like "are modified as configuration
>> is read".
>>
>>> a configuration array for each component, and reprogram each RIS
>>> configuration from this.
> Thanks,
> 
> Ben
> 
> 

-- 
Thanks,

Ben



* [PATCH 24/33] arm_mpam: Probe and reset the rest of the features
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (22 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:01   ` Gavin Shan
  2025-11-14  7:04   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 25/33] arm_mpam: Add helpers to allocate monitors Ben Horgan
                   ` (15 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Rohit Mathew, Zeng Heng, Dave Martin, Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

MPAM supports more features than are going to be exposed to resctrl.
For partid other than 0, the reset values of these controls aren't
known.

Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.

PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
enum order and commas
---
 drivers/resctrl/mpam_devices.c  | 188 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  18 +++
 2 files changed, 206 insertions(+)
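
For readers skimming the PARTID narrowing paragraph of the commit message, the clamping it describes can be sketched in userspace roughly as follows. The names sketch_msc, sketch_apply_narrowing() and sketch_partid_to_intpartid() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-MSC state touched during probing. */
struct sketch_msc {
	uint16_t partid_max;
};

/*
 * If a RIS implements PARTID narrowing and belongs to a class that may be
 * used, limit partid_max to the number of internal PARTIDs it can store.
 */
static void sketch_apply_narrowing(struct sketch_msc *msc, bool has_partid_nrw,
				   bool class_known, uint16_t intpartid_max)
{
	if (!has_partid_nrw || !class_known)
		return;
	if (intpartid_max < msc->partid_max)
		msc->partid_max = intpartid_max;
}

/* After clamping, every usable PARTID can be mapped to itself. */
static uint16_t sketch_partid_to_intpartid(const struct sketch_msc *msc,
					   uint16_t partid)
{
	assert(partid <= msc->partid_max);
	return partid;
}
```

The identity mapping only works because partid_max never exceeds the smallest intPARTID range seen so far.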

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8b0944bdaf28..22b8e41a346b 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -258,6 +258,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
 	__mpam_part_sel_raw(partsel, msc);
 }
 
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+		      MPAMCFG_PART_SEL_INTERNAL;
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
 int mpam_register_requestor(u16 partid_max, u8 pmg_max)
 {
 	guard(spinlock)(&partid_max_lock);
@@ -656,10 +665,34 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct device *dev = &msc->pdev->dev;
 	struct mpam_props *props = &ris->props;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	lockdep_assert_held(&msc->probe_lock);
 	lockdep_assert_held(&msc->part_sel_lock);
 
+	/* Cache Capacity Partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+		if (props->cmax_wd &&
+		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+		if (props->cassoc_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cassoc, props);
+	}
+
 	/* Cache Portion partitioning */
 	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
 		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -682,6 +715,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
 		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
 			mpam_set_feature(mpam_feat_mbw_max, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_min, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_prop, props);
+	}
+
+	/* Priority partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_intpri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+		}
+
+		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_dspri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+		}
 	}
 
 	/* Performance Monitoring */
@@ -706,6 +764,9 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 
 				mpam_set_feature(mpam_feat_msmon_csu, props);
 
+				if (FIELD_GET(MPAMF_CSUMON_IDR_HAS_XCL, csumonidr))
+					mpam_set_feature(mpam_feat_msmon_csu_xcl, props);
+
 				/* Is NRDY hardware managed? */
 				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
 				if (hw_managed)
@@ -727,6 +788,9 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			if (props->num_mbwu_mon)
 				mpam_set_feature(mpam_feat_msmon_mbwu, props);
 
+			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
+				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
 			/* Is NRDY hardware managed? */
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
 			if (hw_managed)
@@ -738,6 +802,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			 */
 		}
 	}
+
+	/*
+	 * RIS with PARTID narrowing don't have enough storage for one
+	 * configuration per PARTID. If these are in a class we could use,
+	 * reduce the supported partid_max to match the number of intpartid.
+	 * If the class is unknown, just ignore it.
+	 */
+	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+	    class->type != MPAM_CLASS_UNKNOWN) {
+		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+		mpam_set_feature(mpam_feat_partid_nrw, props);
+		msc->partid_max = min(msc->partid_max, partid_max);
+	}
 }
 
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -837,12 +916,28 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
 {
+	u32 pri_val = 0;
+	u16 cmax = MPAMCFG_CMAX_CMAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
+	u16 dspri = GENMASK(rprops->dspri_wd, 0);
+	u16 intpri = GENMASK(rprops->intpri_wd, 0);
 
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
+	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+		/* Update the intpartid mapping */
+		mpam_write_partsel_reg(msc, INTPARTID,
+				       MPAMCFG_INTPARTID_INTERNAL | partid);
+
+		/*
+		 * Then switch to the 'internal' partid to update the
+		 * configuration.
+		 */
+		__mpam_intpart_sel(ris->ris_idx, partid, msc);
+	}
+
 	if (mpam_has_feature(mpam_feat_cpor_part, rprops) &&
 	    mpam_has_feature(mpam_feat_cpor_part, cfg)) {
 		if (cfg->reset_cpbm)
@@ -873,6 +968,35 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
 	}
 
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops) &&
+	    mpam_has_feature(mpam_feat_mbw_prop, cfg))
+		mpam_write_partsel_reg(msc, MBW_PROP, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+		mpam_write_partsel_reg(msc, CMAX, cmax);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+		mpam_write_partsel_reg(msc, CMIN, 0);
+
+	if (mpam_has_feature(mpam_feat_cmax_cassoc, rprops))
+		mpam_write_partsel_reg(msc, CASSOC, MPAMCFG_CASSOC_CASSOC);
+
+	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+		/* aces high? */
+		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+			intpri = 0;
+		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+			dspri = 0;
+
+		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+		mpam_write_partsel_reg(msc, PRI, pri_val);
+	}
+
 	mutex_unlock(&msc->part_sel_lock);
 }
 
@@ -1314,6 +1438,18 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
 		return true;
 	if (mpam_has_feature(mpam_feat_mbw_max, props))
 		return true;
+	if (mpam_has_feature(mpam_feat_mbw_prop, props))
+		return true;
+	return false;
+}
+
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+		return true;
 	return false;
 }
 
@@ -1372,6 +1508,23 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
 	}
 
+	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+		parent->cmax_wd = child->cmax_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+				     cmax_wd, alias)) {
+		pr_debug("%s took the min cmax_wd\n", __func__);
+		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+		parent->cassoc_wd = child->cassoc_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+				   cassoc_wd, alias)) {
+		pr_debug("%s cleared cassoc_wd\n", __func__);
+		mpam_clear_feature(mpam_feat_cmax_cassoc, parent);
+		parent->cassoc_wd = 0;
+	}
+
 	/* For num properties, take the minimum */
 	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
 		parent->num_csu_mon = child->num_csu_mon;
@@ -1391,6 +1544,41 @@ static void __props_mismatch(struct mpam_props *parent,
 					   child->num_mbwu_mon);
 	}
 
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+		parent->intpri_wd = child->intpri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+				   intpri_wd, alias)) {
+		pr_debug("%s took the min intpri_wd\n", __func__);
+		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+		parent->dspri_wd = child->dspri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+				   dspri_wd, alias)) {
+		pr_debug("%s took the min dspri_wd\n", __func__);
+		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+	}
+
+	/* TODO: alias support for these two */
+	/* {int,ds}pri may not have differing 0-low behaviour */
+	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+		pr_debug("%s cleared intpri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_intpri_part, parent);
+		mpam_clear_feature(mpam_feat_intpri_part_0_low, parent);
+	}
+	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+		pr_debug("%s cleared dspri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_dspri_part, parent);
+		mpam_clear_feature(mpam_feat_dspri_part_0_low, parent);
+	}
+
 	if (alias) {
 		/* Merge features for aliased resources */
 		bitmap_or(parent->features, parent->features, child->features, MPAM_FEATURE_LAST);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 842d32f148b5..b947f30fbf14 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -141,14 +141,28 @@ static inline void mpam_mon_sel_lock_init(struct mpam_msc *msc)
 /* Bits for mpam features bitmaps */
 enum mpam_device_features {
 	mpam_feat_cpor_part,
+	mpam_feat_cmax_softlim,
+	mpam_feat_cmax_cmax,
+	mpam_feat_cmax_cmin,
+	mpam_feat_cmax_cassoc,
 	mpam_feat_mbw_part,
 	mpam_feat_mbw_min,
 	mpam_feat_mbw_max,
+	mpam_feat_mbw_prop,
+	mpam_feat_intpri_part,
+	mpam_feat_intpri_part_0_low,
+	mpam_feat_dspri_part,
+	mpam_feat_dspri_part_0_low,
 	mpam_feat_msmon,
 	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_capture,
+	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
+	mpam_feat_partid_nrw,
 	MPAM_FEATURE_LAST
 };
 
@@ -158,6 +172,10 @@ struct mpam_props {
 	u16			cpbm_wd;
 	u16			mbw_pbm_bits;
 	u16			bwa_wd;
+	u16			cmax_wd;
+	u16			cassoc_wd;
+	u16			intpri_wd;
+	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
 };
-- 
2.43.0
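
The priority reset in mpam_reprogram_ris_partid() above appears to pick the highest priority for every PARTID, whichever way the hardware encodes it. A minimal userspace sketch of that choice follows; sketch_low_bits16() and sketch_reset_priority() are hypothetical names, and the mask here covers exactly wd bits, which is an assumption of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the low n bits set, valid for n in 0..16. */
static uint16_t sketch_low_bits16(unsigned int n)
{
	return n ? (uint16_t)(0xffffu >> (16 - n)) : 0;
}

/*
 * Reset value for a priority field that is wd bits wide.
 * If 0 is the lowest priority, all-ones is the highest; otherwise 0 is.
 */
static uint16_t sketch_reset_priority(unsigned int wd, bool zero_is_low)
{
	return zero_is_low ? sketch_low_bits16(wd) : 0;
}
```

Both branches give the same effective result: no partition starts out at a disadvantage relative to another.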



* Re: [PATCH 24/33] arm_mpam: Probe and reset the rest of the features
  2025-11-07 12:34 ` [PATCH 24/33] arm_mpam: Probe and reset the rest of the features Ben Horgan
@ 2025-11-09 23:01   ` Gavin Shan
  2025-11-14  7:04   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:01 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng, Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls aren't
> known.
> 
> Discover the rest of the features so they can be reset to avoid any
> side effects when resctrl is in use.
> 
> PARTID narrowing allows MSC/RIS to support less configuration space than
> is usable. If this feature is found on a class of device we are likely
> to use, then reduce the partid_max to make it usable. This allows us
> to map a PARTID to itself.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> enum order and commas
> ---
>   drivers/resctrl/mpam_devices.c  | 188 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  18 +++
>   2 files changed, 206 insertions(+)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>



* RE: [PATCH 24/33] arm_mpam: Probe and reset the rest of the features
  2025-11-07 12:34 ` [PATCH 24/33] arm_mpam: Probe and reset the rest of the features Ben Horgan
  2025-11-09 23:01   ` Gavin Shan
@ 2025-11-14  7:04   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-14  7:04 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com,
	Rohit Mathew, Zeng Heng, Dave Martin

> From: James Morse <james.morse@arm.com>
> 
> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls aren't known.
> 
> Discover the rest of the features so they can be reset to avoid any side effects
> when resctrl is in use.
> 
> PARTID narrowing allows MSC/RIS to support less configuration space than is
> usable. If this feature is found on a class of device we are likely to use, then
> reduce the partid_max to make it usable. This allows us to map a PARTID to
> itself.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 25/33] arm_mpam: Add helpers to allocate monitors
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (23 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 24/33] arm_mpam: Probe and reset the rest of the features Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:02   ` Gavin Shan
  2025-11-14  7:14   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value Ben Horgan
                   ` (14 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: James Morse <james.morse@arm.com>

MPAM's MSCs support a number of monitors, each of which supports either
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
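
As an illustration of the allocation behaviour the helpers in this patch rely on, here is a small userspace model. sketch_ida is a hypothetical 64-entry stand-in for the kernel's IDA and sketch_class is a hypothetical stand-in for the class; the extra num_csu_mon check is this sketch's own guard and is not in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an IDA, limited to 64 IDs. */
struct sketch_ida {
	uint64_t used;
};

/* Return the lowest free ID in [0, max], or -ENOSPC if none is free. */
static int sketch_ida_alloc_max(struct sketch_ida *ida, unsigned int max)
{
	for (unsigned int id = 0; id <= max && id < 64; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return (int)id;
		}
	}
	return -ENOSPC;
}

static void sketch_ida_free(struct sketch_ida *ida, unsigned int id)
{
	ida->used &= ~(UINT64_C(1) << id);
}

/* Hypothetical stand-in for the class: one feature flag, one monitor count. */
struct sketch_class {
	bool has_csu;
	unsigned int num_csu_mon;
	struct sketch_ida ida_csu_mon;
};

static int sketch_alloc_csu_mon(struct sketch_class *cls)
{
	/* The num_csu_mon check is an extra guard added by this sketch. */
	if (!cls->has_csu || !cls->num_csu_mon)
		return -EOPNOTSUPP;

	return sketch_ida_alloc_max(&cls->ida_csu_mon, cls->num_csu_mon - 1);
}

static void sketch_free_csu_mon(struct sketch_class *cls, int csu_mon)
{
	sketch_ida_free(&cls->ida_csu_mon, (unsigned int)csu_mon);
}
```

A caller would treat any negative return as "no monitor available" and must free the ID once the counter is no longer needed.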

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 22b8e41a346b..f51ffb1400db 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -412,6 +412,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type)
 	class->level = level_idx;
 	class->type = type;
 	INIT_LIST_HEAD_RCU(&class->classes_list);
+	ida_init(&class->ida_csu_mon);
+	ida_init(&class->ida_mbwu_mon);
 
 	list_add_rcu(&class->classes_list, &mpam_classes);
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b947f30fbf14..e59c0f4d9ada 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -198,6 +198,9 @@ struct mpam_class {
 	/* member of mpam_classes */
 	struct list_head	classes_list;
 
+	struct ida		ida_csu_mon;
+	struct ida		ida_mbwu_mon;
+
 	struct mpam_garbage	garbage;
 };
 
@@ -277,6 +280,38 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_max(&class->ida_csu_mon, cprops->num_csu_mon - 1,
+			     GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+	ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_max(&class->ida_mbwu_mon, cprops->num_mbwu_mon - 1,
+			     GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+	ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
 /* List of all classes - protected by srcu*/
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
-- 
2.43.0



* Re: [PATCH 25/33] arm_mpam: Add helpers to allocate monitors
  2025-11-07 12:34 ` [PATCH 25/33] arm_mpam: Add helpers to allocate monitors Ben Horgan
@ 2025-11-09 23:02   ` Gavin Shan
  2025-11-14  7:14   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:02 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> MPAM's MSCs support a number of monitors, each of which supports either
> bandwidth counters or cache-storage-utilisation counters. To use
> a counter, a monitor needs to be configured. Add helpers to allocate
> and free CSU or MBWU monitors.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c  |  2 ++
>   drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
>   2 files changed, 37 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* RE: [PATCH 25/33] arm_mpam: Add helpers to allocate monitors
  2025-11-07 12:34 ` [PATCH 25/33] arm_mpam: Add helpers to allocate monitors Ben Horgan
  2025-11-09 23:02   ` Gavin Shan
@ 2025-11-14  7:14   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-14  7:14 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

> From: James Morse <james.morse@arm.com>
> 
> MPAM's MSCs support a number of monitors, each of which supports
> bandwidth counters, or cache-storage-utilisation counters. To use
> a counter, a monitor needs to be configured. Add helpers to allocate
> and free CSU or MBWU monitors.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (24 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 25/33] arm_mpam: Add helpers to allocate monitors Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:13   ` Gavin Shan
  2025-11-12  5:33   ` Shaopeng Tan (Fujitsu)
  2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
                   ` (13 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan, Ben Horgan

From: James Morse <james.morse@arm.com>

Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSCs may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.

Add mpam_msmon_read() to hide all this. If any of the MSCs returns
not ready, then wait the full timeout value before trying again.
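
As a rough userspace illustration of the flow described above (this is
not the driver code; struct fake_msc, read_all() and wait_full_timeout()
are hypothetical stand-ins invented for this sketch), the "read every
MSC, wait once, retry once" behaviour could look like:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MSC's monitor; not a kernel structure. */
struct fake_msc {
	uint64_t value;		/* counter value once the monitor is ready */
	int not_ready_reads;	/* reads that still report 'not ready' */
};

/* Read one monitor: -EBUSY while it is settling, else add its value. */
static int fake_msc_read(struct fake_msc *msc, uint64_t *sum)
{
	if (msc->not_ready_reads > 0) {
		msc->not_ready_reads--;
		return -EBUSY;
	}
	*sum += msc->value;
	return 0;
}

/*
 * Visit every MSC even after an error, so each one gets the chance to
 * start settling. Remember one error to report to the caller.
 */
static int read_all(struct fake_msc *mscs, size_t n, uint64_t *sum)
{
	int any_err = 0;

	for (size_t i = 0; i < n; i++) {
		int err = fake_msc_read(&mscs[i], sum);

		if (err)
			any_err = err;
	}
	return any_err;
}

/* Hypothetical wait hook; the patch sleeps for the firmware-provided time. */
static void wait_full_timeout(unsigned int usec)
{
	(void)usec;
}

/* One attempt, then at most one retry after waiting the full timeout. */
static int read_with_retry(struct fake_msc *mscs, size_t n,
			   unsigned int nrdy_usec, uint64_t *sum)
{
	int err;

	*sum = 0;
	err = read_all(mscs, n, sum);
	if (err != -EBUSY)
		return err;

	if (nrdy_usec)
		wait_full_timeout(nrdy_usec);

	*sum = 0;	/* discard the partial sum from the first pass */
	return read_all(mscs, n, sum);
}
```

Because the first pass touches every MSC before waiting, a component
with several slow MSCs pays for a single timeout rather than one per
MSC. An MSC that is still not ready after the retry surfaces as -EBUSY
to the caller.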

CC: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Add tag - thanks
Bring config_mismatch into this commit (Jonathan)
whitespace
---
 drivers/resctrl/mpam_devices.c  | 237 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  19 +++
 2 files changed, 256 insertions(+)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f51ffb1400db..86abbac5e1ad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -886,6 +886,243 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+struct mon_read {
+	struct mpam_msc_ris		*ris;
+	struct mon_cfg			*ctx;
+	enum mpam_device_features	type;
+	u64				*val;
+	int				err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				   u32 *flt_val)
+{
+	struct mon_cfg *ctx = m->ctx;
+
+	/*
+	 * For CSU counters it's implementation-defined what happens when not
+	 * filtering by partid.
+	 */
+	*ctl_val = MSMON_CFG_x_CTL_MATCH_PARTID;
+
+	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
+
+	if (m->ctx->match_pmg) {
+		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
+	}
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val |= MSMON_CFG_CSU_CTL_TYPE_CSU;
+
+		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
+					       ctx->csu_exclude_clean);
+
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+
+		break;
+	default:
+		return;
+	}
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				    u32 *flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+		return;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		return;
+	default:
+		return;
+	}
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+				     u32 flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	/*
+	 * Write the ctl_val with the enable bit cleared, reset the counter,
+	 * then enable counter.
+	 */
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CSU, 0);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	case mpam_feat_msmon_mbwu:
+		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		/* Counting monitors require NRDY to be reset by software */
+		mpam_write_monsel_reg(msc, MBWU, 0);
+		break;
+	default:
+		return;
+	}
+}
+
+static void __ris_msmon_read(void *arg)
+{
+	u64 now;
+	bool nrdy = false;
+	bool config_mismatch;
+	struct mon_read *m = arg;
+	struct mon_cfg *ctx = m->ctx;
+	struct mpam_msc_ris *ris = m->ris;
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+	if (!mpam_mon_sel_lock(msc)) {
+		m->err = -EIO;
+		return;
+	}
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+	/*
+	 * Read the existing configuration to avoid re-writing the same values.
+	 * This saves waiting for 'nrdy' on subsequent reads.
+	 */
+	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	clean_msmon_ctl_val(&cur_ctl);
+	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+	config_mismatch = cur_flt != flt_val ||
+			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+	if (config_mismatch)
+		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		now = mpam_read_monsel_reg(msc, CSU);
+		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	case mpam_feat_msmon_mbwu:
+		now = mpam_read_monsel_reg(msc, MBWU);
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	default:
+		m->err = -EINVAL;
+		break;
+	}
+	mpam_mon_sel_unlock(msc);
+
+	if (nrdy) {
+		m->err = -EBUSY;
+		return;
+	}
+
+	now = FIELD_GET(MSMON___VALUE, now);
+	*m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+	int err, any_err = 0;
+	struct mpam_vmsc *vmsc;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		struct mpam_msc *msc = vmsc->msc;
+		struct mpam_msc_ris *ris;
+
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			arg->ris = ris;
+
+			err = smp_call_function_any(&msc->accessibility,
+						    __ris_msmon_read, arg,
+						    true);
+			if (!err && arg->err)
+				err = arg->err;
+
+			/*
+			 * Save one error to be returned to the caller, but
+			 * keep reading counters so that they get reprogrammed. On
+			 * platforms with NRDY this lets us wait once.
+			 */
+			if (err)
+				any_err = err;
+		}
+	}
+
+	return any_err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features type, u64 *val)
+{
+	int err;
+	struct mon_read arg;
+	u64 wait_jiffies = 0;
+	struct mpam_props *cprops = &comp->class->props;
+
+	might_sleep();
+
+	if (!mpam_is_enabled())
+		return -EIO;
+
+	if (!mpam_has_feature(type, cprops))
+		return -EOPNOTSUPP;
+
+	arg = (struct mon_read) {
+		.ctx = ctx,
+		.type = type,
+		.val = val,
+	};
+	*val = 0;
+
+	err = _msmon_read(comp, &arg);
+	if (err == -EBUSY && comp->class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+	while (wait_jiffies)
+		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+	if (err == -EBUSY) {
+		arg = (struct mon_read) {
+			.ctx = ctx,
+			.type = type,
+			.val = val,
+		};
+		*val = 0;
+
+		err = _msmon_read(comp, &arg);
+	}
+
+	return err;
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e59c0f4d9ada..d8f8e29987e0 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -184,6 +184,22 @@ struct mpam_props {
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
 #define mpam_clear_feature(_feat, x)	clear_bit(_feat, (x)->features)
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	u16			mon;
+	u8			pmg;
+	bool			match_pmg;
+	bool			csu_exclude_clean;
+	u32			partid;
+	enum mon_filter_options opts;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -327,6 +343,9 @@ void mpam_disable(struct work_struct *work);
 int mpam_apply_config(struct mpam_component *comp, u16 partid,
 		      struct mpam_config *cfg);
 
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features, u64 *val);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.43.0
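
For readers following the register handling in __ris_msmon_read(), here
is a minimal userspace sketch of how a filter value is packed and how
the mismatch check ignores the hardware-set overflow bit. The bit
positions and the field_prep()/field_get() helpers below are assumptions
made for illustration only; the patch itself uses the kernel's
FIELD_PREP()/FIELD_GET() with the real MSMON_CFG_* masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layouts, chosen for illustration only. */
#define CTL_EN			(UINT32_C(1) << 31)
#define CTL_OFLOW_STATUS	(UINT32_C(1) << 26)
#define CTL_MATCH_PARTID	(UINT32_C(1) << 16)
#define FLT_PARTID		UINT32_C(0x0000ffff)
#define FLT_PMG			UINT32_C(0x00ff0000)

/* Shift of the lowest set bit of a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1))
		shift++;
	return shift;
}

/* Place val into the field described by a contiguous mask. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* Extract the field described by a contiguous mask from reg. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* True when the monitor has to be reprogrammed before it is read. */
static bool config_mismatch(uint32_t cur_ctl, uint32_t cur_flt,
			    uint32_t ctl_val, uint32_t flt_val)
{
	/* The overflow bit is set by hardware, so ignore it here. */
	cur_ctl &= ~CTL_OFLOW_STATUS;

	/* A correctly programmed monitor also has its enable bit set. */
	return cur_flt != flt_val || cur_ctl != (ctl_val | CTL_EN);
}
```

Skipping the rewrite when nothing has changed matters because
reprogramming resets the counter and, on hardware that reports it, can
restart the 'not ready' period.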



* Re: [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-11-07 12:34 ` [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value Ben Horgan
@ 2025-11-09 23:13   ` Gavin Shan
  2025-11-14 10:07     ` Ben Horgan
  2025-11-12  5:33   ` Shaopeng Tan (Fujitsu)
  1 sibling, 1 reply; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:13 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Ben,

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSCs may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If any of the MSCs returns
> not ready, then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Add tag - thanks
> Bring config_mismatch into this commit (Jonathan)
> whitespace
> ---
>   drivers/resctrl/mpam_devices.c  | 237 ++++++++++++++++++++++++++++++++
>   drivers/resctrl/mpam_internal.h |  19 +++
>   2 files changed, 256 insertions(+)
> 

With below minor comments addressed:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index f51ffb1400db..86abbac5e1ad 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -886,6 +886,243 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>   	return 0;
>   }
>   
> +struct mon_read {
> +	struct mpam_msc_ris		*ris;
> +	struct mon_cfg			*ctx;
> +	enum mpam_device_features	type;
> +	u64				*val;
> +	int				err;
> +};
> +
> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				   u32 *flt_val)
> +{
> +	struct mon_cfg *ctx = m->ctx;
> +
> +	/*
> +	 * For CSU counters it's implementation-defined what happens when not
> +	 * filtering by partid.
> +	 */
> +	*ctl_val = MSMON_CFG_x_CTL_MATCH_PARTID;
> +
> +	*flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
> +
> +	if (m->ctx->match_pmg) {
> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> +		*flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
> +	}
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val |= MSMON_CFG_CSU_CTL_TYPE_CSU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
> +					       ctx->csu_exclude_clean);

The {} braces are missing around this multi-line statement.

> +
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
> +			*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
> +
> +		break;
> +	default:
> +		return;

s/return/break

We could also add a warning message here:

		pr_warn("Unsupported feature %d\n", m->type);
		

> +	}
> +}
> +
> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				    u32 *flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
> +		return;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		return;
> +	default:
> +		return;
> +	}
> +}

s/return/break for all cases, and maybe a warning message for the
default case.

> +
> +/* Remove values set by the hardware to prevent apparent mismatches. */
> +static void clean_msmon_ctl_val(u32 *cur_ctl)
> +{
> +	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +}
> +

It may be worth making this an inline function.

> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> +				     u32 flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	/*
> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
> +	 * then enable counter.
> +	 */
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CSU, 0);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		/* Counting monitors require NRDY to be reset by software */
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +		break;
> +	default:
> +		return;

s/return/break, and maybe a warning message for the default case.

> +	}
> +}
> +
> +static void __ris_msmon_read(void *arg)
> +{
> +	u64 now;
> +	bool nrdy = false;
> +	bool config_mismatch;
> +	struct mon_read *m = arg;
> +	struct mon_cfg *ctx = m->ctx;
> +	struct mpam_msc_ris *ris = m->ris;
> +	struct mpam_props *rprops = &ris->props;
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> +
> +	if (!mpam_mon_sel_lock(msc)) {
> +		m->err = -EIO;
> +		return;
> +	}
> +	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
> +		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +	/*
> +	 * Read the existing configuration to avoid re-writing the same values.
> +	 * This saves waiting for 'nrdy' on subsequent reads.
> +	 */
> +	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> +	clean_msmon_ctl_val(&cur_ctl);
> +	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> +	config_mismatch = cur_flt != flt_val ||
> +			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
> +
> +	if (config_mismatch)
> +		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		now = mpam_read_monsel_reg(msc, CSU);
> +		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		now = mpam_read_monsel_reg(msc, MBWU);
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	default:
> +		m->err = -EINVAL;
> +		break;

This 'break' can be dropped.

> +	}
> +	mpam_mon_sel_unlock(msc);
> +
> +	if (nrdy) {
> +		m->err = -EBUSY;
> +		return;
> +	}
> +
> +	now = FIELD_GET(MSMON___VALUE, now);
> +	*m->val += now;
> +}
> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err, any_err = 0;
> +	struct mpam_vmsc *vmsc;
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
> +				 srcu_read_lock_held(&mpam_srcu)) {
> +		struct mpam_msc *msc = vmsc->msc;
> +		struct mpam_msc_ris *ris;
> +
> +		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
> +					 srcu_read_lock_held(&mpam_srcu)) {
> +			arg->ris = ris;
> +
> +			err = smp_call_function_any(&msc->accessibility,
> +						    __ris_msmon_read, arg,
> +						    true);
> +			if (!err && arg->err)
> +				err = arg->err;
> +
> +			/*
> +			 * Save one error to be returned to the caller, but
> +			 * keep reading counters so that they get reprogrammed. On
> +			 * platforms with NRDY this lets us wait once.
> +			 */
> +			if (err)
> +				any_err = err;
> +		}
> +	}
> +
> +	return any_err;
> +}
> +
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features type, u64 *val)
> +{
> +	int err;
> +	struct mon_read arg;
> +	u64 wait_jiffies = 0;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	might_sleep();
> +
> +	if (!mpam_is_enabled())
> +		return -EIO;
> +
> +	if (!mpam_has_feature(type, cprops))
> +		return -EOPNOTSUPP;
> +
> +	arg = (struct mon_read) {
> +		.ctx = ctx,
> +		.type = type,
> +		.val = val,
> +	};
> +	*val = 0;
> +
> +	err = _msmon_read(comp, &arg);
> +	if (err == -EBUSY && comp->class->nrdy_usec)
> +		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
> +
> +	while (wait_jiffies)
> +		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
> +
> +	if (err == -EBUSY) {
> +		arg = (struct mon_read) {
> +			.ctx = ctx,
> +			.type = type,
> +			.val = val,
> +		};
> +		*val = 0;
> +
> +		err = _msmon_read(comp, &arg);
> +	}
> +
> +	return err;
> +}
> +
>   static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>   {
>   	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index e59c0f4d9ada..d8f8e29987e0 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -184,6 +184,22 @@ struct mpam_props {
>   #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
>   #define mpam_clear_feature(_feat, x)	clear_bit(_feat, (x)->features)
>   
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> +	COUNT_BOTH	= 0,
> +	COUNT_WRITE	= 1,
> +	COUNT_READ	= 2,
> +};
> +
> +struct mon_cfg {
> +	u16			mon;
> +	u8			pmg;
> +	bool			match_pmg;
> +	bool			csu_exclude_clean;
> +	u32			partid;
> +	enum mon_filter_options opts;
> +};
> +
>   struct mpam_class {
>   	/* mpam_components in this class */
>   	struct list_head	components;
> @@ -327,6 +343,9 @@ void mpam_disable(struct work_struct *work);
>   int mpam_apply_config(struct mpam_component *comp, u16 partid,
>   		      struct mpam_config *cfg);
>   
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features, u64 *val);
> +
>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>   				   cpumask_t *affinity);
>   
Thanks,
Gavin



* Re: [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-11-09 23:13   ` Gavin Shan
@ 2025-11-14 10:07     ` Ben Horgan
  0 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-14 10:07 UTC (permalink / raw)
  To: Gavin Shan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

Hi Gavin,

On 11/9/25 23:13, Gavin Shan wrote:
> Hi Ben,
> 
> On 11/7/25 10:34 PM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSCs may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If any of the MSCs returns
>> not ready, then wait the full timeout value before trying again.
>>
>> CC: Shanker Donthineni <sdonthineni@nvidia.com>
>> Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
>> Tested-by: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v3:
>> Add tag - thanks
>> Bring config_mismatch into this commit (Jonathan)
>> whitespace
>> ---
>>   drivers/resctrl/mpam_devices.c  | 237 ++++++++++++++++++++++++++++++++
>>   drivers/resctrl/mpam_internal.h |  19 +++
>>   2 files changed, 256 insertions(+)
>>
> 
> With below minor comments addressed:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>

Thanks!

> 
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index f51ffb1400db..86abbac5e1ad 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -886,6 +886,243 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>       return 0;
>>   }
>>
>> +struct mon_read {
>> +    struct mpam_msc_ris        *ris;
>> +    struct mon_cfg            *ctx;
>> +    enum mpam_device_features    type;
>> +    u64                *val;
>> +    int                err;
>> +};
>> +
>> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>> +                   u32 *flt_val)
>> +{
>> +    struct mon_cfg *ctx = m->ctx;
>> +
>> +    /*
>> +     * For CSU counters it's implementation-defined what happens when not
>> +     * filtering by partid.
>> +     */
>> +    *ctl_val = MSMON_CFG_x_CTL_MATCH_PARTID;
>> +
>> +    *flt_val = FIELD_PREP(MSMON_CFG_x_FLT_PARTID, ctx->partid);
>> +
>> +    if (m->ctx->match_pmg) {
>> +        *ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>> +        *flt_val |= FIELD_PREP(MSMON_CFG_x_FLT_PMG, ctx->pmg);
>> +    }
>> +
>> +    switch (m->type) {
>> +    case mpam_feat_msmon_csu:
>> +        *ctl_val |= MSMON_CFG_CSU_CTL_TYPE_CSU;
>> +
>> +        if (mpam_has_feature(mpam_feat_msmon_csu_xcl, &m->ris->props))
>> +            *flt_val |= FIELD_PREP(MSMON_CFG_CSU_FLT_XCL,
>> +                           ctx->csu_exclude_clean);
> 
> The {} braces are missing around this multi-line statement.

Changed to one line.

> 
>> +
>> +        break;
>> +    case mpam_feat_msmon_mbwu:
>> +        *ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
>> +
>> +        if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
>> +            *flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
>> +
>> +        break;
>> +    default:
>> +        return;
> 
> s/return/break
> 
> We could also add a warning message here:
> 
>         pr_warn("Unsupported feature %d\n", m->type);
> 

Using pr_warn("Unexpected monitor type %d\n", m->type);       
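
For illustration, a minimal userspace sketch of the agreed shape: every
case ends in 'break', and the default case warns instead of returning
silently. The enum, the type bits and set_ctl_type() are hypothetical
names for this sketch, and fprintf() stands in for pr_warn().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical monitor types and type bits, for illustration only. */
enum mon_type { MON_CSU, MON_MBWU, MON_OTHER };

#define CTL_TYPE_CSU	UINT32_C(0x1)
#define CTL_TYPE_MBWU	UINT32_C(0x2)

/* Returns the number of warnings emitted (0 or 1) so it can be tested. */
static int set_ctl_type(enum mon_type type, uint32_t *ctl_val)
{
	int warned = 0;

	switch (type) {
	case MON_CSU:
		*ctl_val |= CTL_TYPE_CSU;
		break;
	case MON_MBWU:
		*ctl_val |= CTL_TYPE_MBWU;
		break;
	default:
		/* Userspace stand-in for pr_warn(). */
		fprintf(stderr, "Unexpected monitor type %d\n", (int)type);
		warned = 1;
		break;
	}
	return warned;
}
```

Ending every case with 'break' keeps a single exit point at the bottom
of the function, which helps if common code is ever added after the
switch.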
> 
>> +    }
>> +}
>> +
>> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>> +                    u32 *flt_val)
>> +{
>> +    struct mpam_msc *msc = m->ris->vmsc->msc;
>> +
>> +    switch (m->type) {
>> +    case mpam_feat_msmon_csu:
>> +        *ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
>> +        *flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
>> +        return;
>> +    case mpam_feat_msmon_mbwu:
>> +        *ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
>> +        *flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
>> +        return;
>> +    default:
>> +        return;
>> +    }
>> +}
> 
> s/return/break for all cases, and maybe a warning message for the
> default case.

Done

> 
>> +
>> +/* Remove values set by the hardware to prevent apparent mismatches. */
>> +static void clean_msmon_ctl_val(u32 *cur_ctl)
>> +{
>> +    *cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
>> +}
>> +
> 
> It may be worth making this an inline function.

Done

> 
>> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>> +                     u32 flt_val)
>> +{
>> +    struct mpam_msc *msc = m->ris->vmsc->msc;
>> +
>> +    /*
>> +     * Write the ctl_val with the enable bit cleared, reset the counter,
>> +     * then enable counter.
>> +     */
>> +    switch (m->type) {
>> +    case mpam_feat_msmon_csu:
>> +        mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
>> +        mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
>> +        mpam_write_monsel_reg(msc, CSU, 0);
>> +        mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +        break;
>> +    case mpam_feat_msmon_mbwu:
>> +        mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>> +        mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>> +        mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +        /* Counting monitors require NRDY to be reset by software */
>> +        mpam_write_monsel_reg(msc, MBWU, 0);
>> +        break;
>> +    default:
>> +        return;
> 
> s/return/break, and maybe a warning message for the default case.

Done. Using the same warning as above.

> 
>> +    }
>> +}
>> +
>> +static void __ris_msmon_read(void *arg)
>> +{
>> +    u64 now;
>> +    bool nrdy = false;
>> +    bool config_mismatch;
>> +    struct mon_read *m = arg;
>> +    struct mon_cfg *ctx = m->ctx;
>> +    struct mpam_msc_ris *ris = m->ris;
>> +    struct mpam_props *rprops = &ris->props;
>> +    struct mpam_msc *msc = m->ris->vmsc->msc;
>> +    u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
>> +
>> +    if (!mpam_mon_sel_lock(msc)) {
>> +        m->err = -EIO;
>> +        return;
>> +    }
>> +    mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
>> +          FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
>> +    mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
>> +
>> +    /*
>> +     * Read the existing configuration to avoid re-writing the same values.
>> +     * This saves waiting for 'nrdy' on subsequent reads.
>> +     */
>> +    read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
>> +    clean_msmon_ctl_val(&cur_ctl);
>> +    gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>> +    config_mismatch = cur_flt != flt_val ||
>> +              cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>> +
>> +    if (config_mismatch)
>> +        write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
>> +
>> +    switch (m->type) {
>> +    case mpam_feat_msmon_csu:
>> +        now = mpam_read_monsel_reg(msc, CSU);
>> +        if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
>> +            nrdy = now & MSMON___NRDY;
>> +        break;
>> +    case mpam_feat_msmon_mbwu:
>> +        now = mpam_read_monsel_reg(msc, MBWU);
>> +        if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
>> +            nrdy = now & MSMON___NRDY;
>> +        break;
>> +    default:
>> +        m->err = -EINVAL;
>> +        break;
> 
> This 'break' can be dropped.

Done

> 
>> +    }
>> +    mpam_mon_sel_unlock(msc);
>> +
>> +    if (nrdy) {
>> +        m->err = -EBUSY;
>> +        return;
>> +    }
>> +
>> +    now = FIELD_GET(MSMON___VALUE, now);
>> +    *m->val += now;
>> +}
>> +
>> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
>> +{
>> +    int err, any_err = 0;
>> +    struct mpam_vmsc *vmsc;
>> +
>> +    guard(srcu)(&mpam_srcu);
>> +    list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
>> +                 srcu_read_lock_held(&mpam_srcu)) {
>> +        struct mpam_msc *msc = vmsc->msc;
>> +        struct mpam_msc_ris *ris;
>> +
>> +        list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
>> +                     srcu_read_lock_held(&mpam_srcu)) {
>> +            arg->ris = ris;
>> +
>> +            err = smp_call_function_any(&msc->accessibility,
>> +                            __ris_msmon_read, arg,
>> +                            true);
>> +            if (!err && arg->err)
>> +                err = arg->err;
>> +
>> +            /*
>> +             * Save one error to be returned to the caller, but
>> +             * keep reading counters so that they get reprogrammed. On
>> +             * platforms with NRDY this lets us wait once.
>> +             */
>> +            if (err)
>> +                any_err = err;
>> +        }
>> +    }
>> +
>> +    return any_err;
>> +}
>> +
>> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>> +            enum mpam_device_features type, u64 *val)
>> +{
>> +    int err;
>> +    struct mon_read arg;
>> +    u64 wait_jiffies = 0;
>> +    struct mpam_props *cprops = &comp->class->props;
>> +
>> +    might_sleep();
>> +
>> +    if (!mpam_is_enabled())
>> +        return -EIO;
>> +
>> +    if (!mpam_has_feature(type, cprops))
>> +        return -EOPNOTSUPP;
>> +
>> +    arg = (struct mon_read) {
>> +        .ctx = ctx,
>> +        .type = type,
>> +        .val = val,
>> +    };
>> +    *val = 0;
>> +
>> +    err = _msmon_read(comp, &arg);
>> +    if (err == -EBUSY && comp->class->nrdy_usec)
>> +        wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
>> +
>> +    while (wait_jiffies)
>> +        wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
>> +
>> +    if (err == -EBUSY) {
>> +        arg = (struct mon_read) {
>> +            .ctx = ctx,
>> +            .type = type,
>> +            .val = val,
>> +        };
>> +        *val = 0;
>> +
>> +        err = _msmon_read(comp, &arg);
>> +    }
>> +
>> +    return err;
>> +}
>> +
>>   static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>>   {
>>       u32 num_words, msb;
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index e59c0f4d9ada..d8f8e29987e0 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -184,6 +184,22 @@ struct mpam_props {
>>   #define mpam_set_feature(_feat, x)    set_bit(_feat, (x)->features)
>>   #define mpam_clear_feature(_feat, x)    clear_bit(_feat, (x)->features)
>>   
>> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
>> +enum mon_filter_options {
>> +    COUNT_BOTH    = 0,
>> +    COUNT_WRITE    = 1,
>> +    COUNT_READ    = 2,
>> +};
>> +
>> +struct mon_cfg {
>> +    u16            mon;
>> +    u8            pmg;
>> +    bool            match_pmg;
>> +    bool            csu_exclude_clean;
>> +    u32            partid;
>> +    enum mon_filter_options opts;
>> +};
>> +
>>   struct mpam_class {
>>       /* mpam_components in this class */
>>       struct list_head    components;
>> @@ -327,6 +343,9 @@ void mpam_disable(struct work_struct *work);
>>   int mpam_apply_config(struct mpam_component *comp, u16 partid,
>>                 struct mpam_config *cfg);
>>   
>> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
>> +            enum mpam_device_features, u64 *val);
>> +
>>   int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>>                      cpumask_t *affinity);
>>   
> Thanks,
> Gavin
> 

Thanks,

Ben



* RE: [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-11-07 12:34 ` [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value Ben Horgan
  2025-11-09 23:13   ` Gavin Shan
@ 2025-11-12  5:33   ` Shaopeng Tan (Fujitsu)
  1 sibling, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-12  5:33 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com


> From: James Morse <james.morse@arm.com>
> 
> Reading a monitor involves configuring what you want to monitor, and reading
> the value. Components made up of multiple MSC may need values from each
> MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns not ready,
> then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>


* [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (25 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:15   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
                   ` (12 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng, Ben Horgan

From: James Morse <james.morse@arm.com>

Bandwidth counters need to run continuously to correctly reflect the
bandwidth.

Save the counter state when the hardware is reset due to CPU hotplug.
Add struct msmon_mbwu_state to track the bandwidth counter. Support for
tracking overflow with the same structure will be added in a
subsequent commit.

Cc: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Drop tags
Fix correction accounting
Split out overflow checking
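
Note for reviewers: a minimal user-space sketch of the accounting this
patch is aiming for, not the driver code. Every model_* name below is
hypothetical. The only assumption taken from the commit message is that the
hardware count is lost when the MSC is reset, so it has to be folded into a
software correction before the reset and added back on each later read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware bandwidth monitor. */
struct model_hw_mon {
	uint64_t count;		/* bytes counted since the last reset */
	bool enabled;
};

/* Hypothetical stand-in for the per-monitor software state. */
struct model_mbwu_state {
	bool enabled;
	uint64_t correction;	/* bytes counted before earlier resets */
};

/* Traffic only accumulates while the monitor is enabled. */
static void model_traffic(struct model_hw_mon *hw, uint64_t bytes)
{
	if (hw->enabled)
		hw->count += bytes;
}

/* Offline path: fold the live count into the correction, then reset. */
static void model_save(struct model_hw_mon *hw, struct model_mbwu_state *st)
{
	st->enabled = hw->enabled;
	st->correction += hw->count;
	hw->count = 0;
	hw->enabled = false;
}

/* Online path: re-enable the monitor if it was running before the reset. */
static void model_restore(struct model_hw_mon *hw,
			  const struct model_mbwu_state *st)
{
	hw->enabled = st->enabled;
}

/* Read path: the live count plus everything saved before earlier resets. */
static uint64_t model_read(const struct model_hw_mon *hw,
			   const struct model_mbwu_state *st)
{
	return hw->count + st->correction;
}

/* Runs one offline/online cycle and returns the final reading. */
static uint64_t model_hotplug_cycle(void)
{
	struct model_hw_mon hw = { .count = 0, .enabled = true };
	struct model_mbwu_state st = { 0 };

	model_traffic(&hw, 100);
	model_save(&hw, &st);
	assert(st.correction == 100);

	model_traffic(&hw, 999);	/* ignored: monitor is off while offline */
	model_restore(&hw, &st);
	model_traffic(&hw, 50);

	return model_read(&hw, &st);
}
```

In this model, 100 bytes counted before the offline/online cycle and 50
bytes counted after it read back as 150, which is the behaviour the
correction field is meant to provide. Traffic that occurs while the monitor
is off is not recoverable in this sketch.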
---
 drivers/resctrl/mpam_devices.c  | 126 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  21 +++++-
 2 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 86abbac5e1ad..2d1cef824b8e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -994,6 +994,7 @@ static void __ris_msmon_read(void *arg)
 	struct mon_read *m = arg;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1024,11 +1025,21 @@ static void __ris_msmon_read(void *arg)
 		now = mpam_read_monsel_reg(msc, CSU);
 		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
 		now = mpam_read_monsel_reg(msc, MBWU);
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
+
+		if (nrdy)
+			break;
+
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+
+		/* Include bandwidth consumed before the last hardware reset */
+		now += mbwu_state->correction;
 		break;
 	default:
 		m->err = -EINVAL;
@@ -1041,7 +1052,6 @@ static void __ris_msmon_read(void *arg)
 		return;
 	}
 
-	now = FIELD_GET(MSMON___VALUE, now);
 	*m->val += now;
 }
 
@@ -1239,6 +1249,67 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+/* Call with msc cfg_lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+	int i;
+	struct mon_read mwbu_arg;
+	struct mpam_msc_ris *ris = _ris;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		if (ris->mbwu_state[i].enabled) {
+			mwbu_arg.ris = ris;
+			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+			mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+			__ris_msmon_read(&mwbu_arg);
+		}
+	}
+
+	return 0;
+}
+
+/* Call with MSC cfg_lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+	int i;
+	u64 val;
+	struct mon_cfg *cfg;
+	u32 cur_flt, cur_ctl, mon_sel;
+	struct mpam_msc_ris *ris = arg;
+	struct msmon_mbwu_state *mbwu_state;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		mbwu_state = &ris->mbwu_state[i];
+		cfg = &mbwu_state->cfg;
+
+		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+			return -EIO;
+
+		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+		val = mpam_read_monsel_reg(msc, MBWU);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+
+		cfg->mon = i;
+		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
+		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
+		mbwu_state->correction += val;
+		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+		mpam_mon_sel_unlock(msc);
+	}
+
+	return 0;
+}
+
 static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
 {
 	*reset_cfg = (struct mpam_config) {
@@ -1310,6 +1381,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 * for non-zero partid may be lost while the CPUs are offline.
 		 */
 		ris->in_reset_state = online;
+
+		if (mpam_is_enabled() && !online)
+			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
 	}
 	mutex_unlock(&msc->cfg_lock);
 }
@@ -1364,6 +1438,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
 			mpam_touch_msc(msc, __write_config, &arg);
 		}
 		ris->in_reset_state = reset;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
 	}
 	mutex_unlock(&msc->cfg_lock);
 }
@@ -2117,7 +2194,22 @@ static void mpam_unregister_irqs(void)
 
 static void __destroy_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
 	add_to_garbage(comp->cfg);
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		if (mpam_mon_sel_lock(msc)) {
+			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+				add_to_garbage(ris->mbwu_state);
+			mpam_mon_sel_unlock(msc);
+		}
+	}
 }
 
 static void mpam_reset_component_cfg(struct mpam_component *comp)
@@ -2141,6 +2233,8 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
 
 static int __allocate_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_vmsc *vmsc;
+
 	mpam_assert_partid_sizes_fixed();
 
 	if (comp->cfg)
@@ -2158,6 +2252,36 @@ static int __allocate_component_cfg(struct mpam_component *comp)
 
 	mpam_reset_component_cfg(comp);
 
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		struct mpam_msc *msc;
+		struct mpam_msc_ris *ris;
+		struct msmon_mbwu_state *mbwu_state;
+
+		if (!vmsc->props.num_mbwu_mon)
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->props.num_mbwu_mon)
+				continue;
+
+			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+					     sizeof(*ris->mbwu_state),
+					     GFP_KERNEL);
+			if (!mbwu_state) {
+				__destroy_component_cfg(comp);
+				return -ENOMEM;
+			}
+
+			init_garbage(&mbwu_state[0].garbage);
+
+			if (mpam_mon_sel_lock(msc)) {
+				ris->mbwu_state = mbwu_state;
+				mpam_mon_sel_unlock(msc);
+			}
+		}
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d8f8e29987e0..1f2b04b7703e 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -91,7 +91,10 @@ struct mpam_msc {
 	 */
 	struct mutex		part_sel_lock;
 
-	/* cfg_lock protects the msc configuration. */
+	/*
+	 * cfg_lock protects the msc configuration and guards against mbwu_state
+	 * and save and restore racing.
+	 */
 	struct mutex		cfg_lock;
 
 	/*
@@ -200,6 +203,19 @@ struct mon_cfg {
 	enum mon_filter_options opts;
 };
 
+/* Changes to msmon_mbwu_state are protected by the msc's mon_sel_lock. */
+struct msmon_mbwu_state {
+	bool		enabled;
+	struct mon_cfg	cfg;
+
+	/*
+	 * The value to add to the new reading to account for power management.
+	 */
+	u64		correction;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -293,6 +309,9 @@ struct mpam_msc_ris {
 	/* parent: */
 	struct mpam_vmsc	*vmsc;
 
+	/* msmon mbwu configuration is preserved over reset */
+	struct msmon_mbwu_state	*mbwu_state;
+
 	struct mpam_garbage	garbage;
 };
 
-- 
2.43.0



* Re: [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management
  2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
@ 2025-11-09 23:15   ` Gavin Shan
  2025-11-10 13:49   ` Zeng Heng
  2025-11-10 17:31   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:15 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> Save the counter state when the hardware is reset due to CPU hotplug.
> Add struct msmon_mbwu_state to track the bandwidth counter. Support for
> tracking overflow with the same structure will be added in a
> subsequent commit.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Drop tags
> Fix correction accounting
> Split out overflow checking
> ---
>   drivers/resctrl/mpam_devices.c  | 126 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  21 +++++-
>   2 files changed, 145 insertions(+), 2 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management
  2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
  2025-11-09 23:15   ` Gavin Shan
@ 2025-11-10 13:49   ` Zeng Heng
  2025-11-10 17:31   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Zeng Heng @ 2025-11-10 13:49 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 2025/11/7 20:34, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> Save the counter state when the hardware is reset due to CPU hotplug.
> Add struct msmon_mbwu_state to track the bandwidth counter. Support for
> tracking overflow with the same structure will be added in a
> subsequent commit.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Drop tags
> Fix correction accounting
> Split out overflow checking
> ---
>   drivers/resctrl/mpam_devices.c  | 126 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  21 +++++-
>   2 files changed, 145 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 86abbac5e1ad..2d1cef824b8e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -994,6 +994,7 @@ static void __ris_msmon_read(void *arg)
>   	struct mon_read *m = arg;
>   	struct mon_cfg *ctx = m->ctx;
>   	struct mpam_msc_ris *ris = m->ris;
> +	struct msmon_mbwu_state *mbwu_state;
>   	struct mpam_props *rprops = &ris->props;
>   	struct mpam_msc *msc = m->ris->vmsc->msc;
>   	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> @@ -1024,11 +1025,21 @@ static void __ris_msmon_read(void *arg)
>   		now = mpam_read_monsel_reg(msc, CSU);
>   		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
>   			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
>   		break;
>   	case mpam_feat_msmon_mbwu:
>   		now = mpam_read_monsel_reg(msc, MBWU);
>   		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
>   			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
> +
> +		if (nrdy)
> +			break;
> +
> +		mbwu_state = &ris->mbwu_state[ctx->mon];
> +
> +		/* Include bandwidth consumed before the last hardware reset */
> +		now += mbwu_state->correction;
>   		break;
>   	default:
>   		m->err = -EINVAL;
> @@ -1041,7 +1052,6 @@ static void __ris_msmon_read(void *arg)
>   		return;
>   	}
>   
> -	now = FIELD_GET(MSMON___VALUE, now);
>   	*m->val += now;
>   }
>   
> @@ -1239,6 +1249,67 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>   	mutex_unlock(&msc->part_sel_lock);
>   }
>   
> +/* Call with msc cfg_lock held */
> +static int mpam_restore_mbwu_state(void *_ris)
> +{
> +	int i;
> +	struct mon_read mwbu_arg;
> +	struct mpam_msc_ris *ris = _ris;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		if (ris->mbwu_state[i].enabled) {
> +			mwbu_arg.ris = ris;
> +			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
> +			mwbu_arg.type = mpam_feat_msmon_mbwu;
> +
> +			__ris_msmon_read(&mwbu_arg);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/* Call with MSC cfg_lock held */
> +static int mpam_save_mbwu_state(void *arg)
> +{
> +	int i;
> +	u64 val;
> +	struct mon_cfg *cfg;
> +	u32 cur_flt, cur_ctl, mon_sel;
> +	struct mpam_msc_ris *ris = arg;
> +	struct msmon_mbwu_state *mbwu_state;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		mbwu_state = &ris->mbwu_state[i];
> +		cfg = &mbwu_state->cfg;
> +
> +		if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
> +			return -EIO;
> +
> +		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
> +			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
> +
> +		val = mpam_read_monsel_reg(msc, MBWU);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +
> +		cfg->mon = i;
> +		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
> +		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
> +		cfg->partid = FIELD_GET(MSMON_CFG_x_FLT_PARTID, cur_flt);
> +		mbwu_state->correction += val;
> +		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
> +		mpam_mon_sel_unlock(msc);
> +	}
> +
> +	return 0;
> +}
> +
>   static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
>   {
>   	*reset_cfg = (struct mpam_config) {
> @@ -1310,6 +1381,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>   		 * for non-zero partid may be lost while the CPUs are offline.
>   		 */
>   		ris->in_reset_state = online;
> +
> +		if (mpam_is_enabled() && !online)
> +			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
>   	}
>   	mutex_unlock(&msc->cfg_lock);
>   }
> @@ -1364,6 +1438,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
>   			mpam_touch_msc(msc, __write_config, &arg);
>   		}
>   		ris->in_reset_state = reset;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
>   	}
>   	mutex_unlock(&msc->cfg_lock);
>   }
> @@ -2117,7 +2194,22 @@ static void mpam_unregister_irqs(void)
>   
>   static void __destroy_component_cfg(struct mpam_component *comp)
>   {
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
>   	add_to_garbage(comp->cfg);
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		if (mpam_mon_sel_lock(msc)) {
> +			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> +				add_to_garbage(ris->mbwu_state);
> +			mpam_mon_sel_unlock(msc);
> +		}
> +	}
>   }
>   
>   static void mpam_reset_component_cfg(struct mpam_component *comp)
> @@ -2141,6 +2233,8 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
>   
>   static int __allocate_component_cfg(struct mpam_component *comp)
>   {
> +	struct mpam_vmsc *vmsc;
> +
>   	mpam_assert_partid_sizes_fixed();
>   
>   	if (comp->cfg)
> @@ -2158,6 +2252,36 @@ static int __allocate_component_cfg(struct mpam_component *comp)
>   
>   	mpam_reset_component_cfg(comp);
>   
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		struct mpam_msc *msc;
> +		struct mpam_msc_ris *ris;
> +		struct msmon_mbwu_state *mbwu_state;
> +
> +		if (!vmsc->props.num_mbwu_mon)
> +			continue;
> +
> +		msc = vmsc->msc;
> +		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
> +			if (!ris->props.num_mbwu_mon)
> +				continue;
> +
> +			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
> +					     sizeof(*ris->mbwu_state),
> +					     GFP_KERNEL);
> +			if (!mbwu_state) {
> +				__destroy_component_cfg(comp);
> +				return -ENOMEM;
> +			}
> +
> +			init_garbage(&mbwu_state[0].garbage);
> +
> +			if (mpam_mon_sel_lock(msc)) {
> +				ris->mbwu_state = mbwu_state;
> +				mpam_mon_sel_unlock(msc);
> +			}
> +		}
> +	}
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d8f8e29987e0..1f2b04b7703e 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -91,7 +91,10 @@ struct mpam_msc {
>   	 */
>   	struct mutex		part_sel_lock;
>   
> -	/* cfg_lock protects the msc configuration. */
> +	/*
> +	 * cfg_lock protects the msc configuration and guards against mbwu_state
> +	 * and save and restore racing.
> +	 */
>   	struct mutex		cfg_lock;
>   
>   	/*
> @@ -200,6 +203,19 @@ struct mon_cfg {
>   	enum mon_filter_options opts;
>   };
>   
> +/* Changes to msmon_mbwu_state are protected by the msc's mon_sel_lock. */
> +struct msmon_mbwu_state {
> +	bool		enabled;
> +	struct mon_cfg	cfg;
> +
> +	/*
> +	 * The value to add to the new reading to account for power management.
> +	 */
> +	u64		correction;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
>   struct mpam_class {
>   	/* mpam_components in this class */
>   	struct list_head	components;
> @@ -293,6 +309,9 @@ struct mpam_msc_ris {
>   	/* parent: */
>   	struct mpam_vmsc	*vmsc;
>   
> +	/* msmon mbwu configuration is preserved over reset */
> +	struct msmon_mbwu_state	*mbwu_state;
> +
>   	struct mpam_garbage	garbage;
>   };
>   

Reviewed-by: Zeng Heng <zengheng4@huawei.com>


* Re: [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management
  2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
  2025-11-09 23:15   ` Gavin Shan
  2025-11-10 13:49   ` Zeng Heng
@ 2025-11-10 17:31   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 17:31 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng

On Fri, 7 Nov 2025 12:34:44 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> Save the counter state when the hardware is reset due to CPU hotplug.
> Add struct msmon_mbwu_state to track the bandwidth counter. Support for
> tracking overflow with the same structure will be added in a
> subsequent commit.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

One trivial below.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d8f8e29987e0..1f2b04b7703e 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -91,7 +91,10 @@ struct mpam_msc {
>  	 */
>  	struct mutex		part_sel_lock;
>  
> -	/* cfg_lock protects the msc configuration. */
> +	/*
> +	 * cfg_lock protects the msc configuration and guards against mbwu_state
> +	 * and save and restore racing.

Stray "and" that isn't needed on this second line.

> +	 */
>  	struct mutex		cfg_lock;
>  




* [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (26 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:16   ` Gavin Shan
                     ` (2 more replies)
  2025-11-07 12:34 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters Ben Horgan
                   ` (11 subsequent siblings)
  39 siblings, 3 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Zeng Heng

Use the overflow status bit to track overflow on each bandwidth counter
read and add the counter size to the correction when overflow is detected.

This assumes that only a single overflow has occurred since the last read
of the counter. Overflow interrupts, on hardware that supports them, could
be used to remove this limitation.

Cc: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
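
Note for reviewers: a small user-space sketch of the overflow correction
described above, including the single-overflow limitation. It is an
illustration under stated assumptions, not the driver code: the oflow_*
names are hypothetical, the value field is assumed to be 31 bits wide
(bits [30:0]), and the overflow status bit is modelled as sticky until
software clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the counter value field occupies bits [30:0]. */
#define OFLOW_VALUE_MASK	((uint64_t)0x7fffffff)

/* Count the set bits in a mask (portable stand-in for a hweight helper). */
static unsigned int oflow_hweight64(uint64_t mask)
{
	unsigned int bits = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		bits++;
	}
	return bits;
}

/* One counter wrap is worth 2^(field width). */
static uint64_t oflow_overflow_val(void)
{
	return (uint64_t)1 << oflow_hweight64(OFLOW_VALUE_MASK);
}

struct oflow_counter {
	uint64_t value;		/* hardware value, wraps at the field width */
	bool oflow_status;	/* sticky: set on any wrap, cleared on read */
	uint64_t correction;	/* accumulated counter-size increments */
};

static void oflow_count(struct oflow_counter *c, uint64_t bytes)
{
	uint64_t sum = c->value + bytes;

	if (sum > OFLOW_VALUE_MASK)
		c->oflow_status = true;
	c->value = sum & OFLOW_VALUE_MASK;
}

/* Read path: add one counter size per observed overflow status. */
static uint64_t oflow_read(struct oflow_counter *c)
{
	if (c->oflow_status) {
		c->correction += oflow_overflow_val();
		c->oflow_status = false;
	}
	return c->value + c->correction;
}

/* One wrap between reads: the reading matches the true byte count. */
static uint64_t oflow_single_wrap(void)
{
	struct oflow_counter c = { 0 };
	uint64_t first;

	oflow_count(&c, 0x7ffffff0);
	first = oflow_read(&c);
	assert(first == 0x7ffffff0);
	(void)first;

	oflow_count(&c, 0x20);
	return oflow_read(&c);
}

/* Two wraps between reads: returns how far short the reading falls. */
static uint64_t oflow_double_wrap_shortfall(void)
{
	struct oflow_counter c = { 0 };
	uint64_t total = 0;
	int i;

	for (i = 0; i < 2; i++) {
		oflow_count(&c, 0x80000000ULL);
		total += 0x80000000ULL;
	}
	oflow_count(&c, 5);
	total += 5;

	return total - oflow_read(&c);
}
```

With a single wrap between reads the model returns the true total. With
two wraps between reads the sticky status bit only records one of them, so
the reading comes up short by one counter size. That is the limitation the
commit message describes.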
 drivers/resctrl/mpam_devices.c  | 24 ++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  3 ++-
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2d1cef824b8e..eea082dfcddc 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -986,11 +986,18 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 	}
 }
 
+static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
+{
+	/* TODO: scaling, and long counters */
+	return BIT_ULL(hweight_long(MSMON___VALUE));
+}
+
 static void __ris_msmon_read(void *arg)
 {
 	u64 now;
 	bool nrdy = false;
 	bool config_mismatch;
+	bool overflow;
 	struct mon_read *m = arg;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
@@ -1012,13 +1019,20 @@ static void __ris_msmon_read(void *arg)
 	 * This saves waiting for 'nrdy' on subsequent reads.
 	 */
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	overflow = cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS;
+
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
 	config_mismatch = cur_flt != flt_val ||
 			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
 
-	if (config_mismatch)
+	if (config_mismatch) {
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+		overflow = false;
+	} else if (overflow) {
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL,
+				      cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS);
+	}
 
 	switch (m->type) {
 	case mpam_feat_msmon_csu:
@@ -1038,7 +1052,13 @@ static void __ris_msmon_read(void *arg)
 
 		mbwu_state = &ris->mbwu_state[ctx->mon];
 
-		/* Include bandwidth consumed before the last hardware reset */
+		if (overflow)
+			mbwu_state->correction += mpam_msmon_overflow_val(m->type);
+
+		/*
+		 * Include bandwidth consumed before the last hardware reset and
+		 * a counter size increment for each overflow.
+		 */
 		now += mbwu_state->correction;
 		break;
 	default:
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1f2b04b7703e..7c99d4f3dc9c 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -209,7 +209,8 @@ struct msmon_mbwu_state {
 	struct mon_cfg	cfg;
 
 	/*
-	 * The value to add to the new reading to account for power management.
+	 * The value to add to the new reading to account for power management,
+	 * and overflow.
 	 */
 	u64		correction;
 
-- 
2.43.0



* Re: [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state
  2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
@ 2025-11-09 23:16   ` Gavin Shan
  2025-11-10 13:50   ` Zeng Heng
  2025-11-10 17:32   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:16 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng

On 11/7/25 10:34 PM, Ben Horgan wrote:
> Use the overflow status bit to track overflow on each bandwidth counter
> read and add the counter size to the correction when overflow is detected.
> 
> This assumes that only a single overflow has occurred since the last read
> of the counter. Overflow interrupts, on hardware that supports them, could
> be used to remove this limitation.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c  | 24 ++++++++++++++++++++++--
>   drivers/resctrl/mpam_internal.h |  3 ++-
>   2 files changed, 24 insertions(+), 3 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state
  2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
  2025-11-09 23:16   ` Gavin Shan
@ 2025-11-10 13:50   ` Zeng Heng
  2025-11-10 17:32   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Zeng Heng @ 2025-11-10 13:50 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 2025/11/7 20:34, Ben Horgan wrote:
> Use the overflow status bit to track overflow on each bandwidth counter
> read and add the counter size to the correction when overflow is detected.
> 
> This assumes that only a single overflow has occurred since the last read
> of the counter. Overflow interrupts, on hardware that supports them, could
> be used to remove this limitation.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_devices.c  | 24 ++++++++++++++++++++++--
>   drivers/resctrl/mpam_internal.h |  3 ++-
>   2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 2d1cef824b8e..eea082dfcddc 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -986,11 +986,18 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>   	}
>   }
>   
> +static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
> +{
> +	/* TODO: scaling, and long counters */
> +	return BIT_ULL(hweight_long(MSMON___VALUE));
> +}
> +
>   static void __ris_msmon_read(void *arg)
>   {
>   	u64 now;
>   	bool nrdy = false;
>   	bool config_mismatch;
> +	bool overflow;
>   	struct mon_read *m = arg;
>   	struct mon_cfg *ctx = m->ctx;
>   	struct mpam_msc_ris *ris = m->ris;
> @@ -1012,13 +1019,20 @@ static void __ris_msmon_read(void *arg)
>   	 * This saves waiting for 'nrdy' on subsequent reads.
>   	 */
>   	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> +	overflow = cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS;
> +
>   	clean_msmon_ctl_val(&cur_ctl);
>   	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
>   	config_mismatch = cur_flt != flt_val ||
>   			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
>   
> -	if (config_mismatch)
> +	if (config_mismatch) {
>   		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +		overflow = false;
> +	} else if (overflow) {
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL,
> +				      cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS);
> +	}
>   
>   	switch (m->type) {
>   	case mpam_feat_msmon_csu:
> @@ -1038,7 +1052,13 @@ static void __ris_msmon_read(void *arg)
>   
>   		mbwu_state = &ris->mbwu_state[ctx->mon];
>   
> -		/* Include bandwidth consumed before the last hardware reset */
> +		if (overflow)
> +			mbwu_state->correction += mpam_msmon_overflow_val(m->type);
> +
> +		/*
> +		 * Include bandwidth consumed before the last hardware reset and
> +		 * a counter size increment for each overflow.
> +		 */
>   		now += mbwu_state->correction;
>   		break;
>   	default:
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1f2b04b7703e..7c99d4f3dc9c 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -209,7 +209,8 @@ struct msmon_mbwu_state {
>   	struct mon_cfg	cfg;
>   
>   	/*
> -	 * The value to add to the new reading to account for power management.
> +	 * The value to add to the new reading to account for power management,
> +	 * and overflow.
>   	 */
>   	u64		correction;
>   

Reviewed-by: Zeng Heng <zengheng4@huawei.com>


* Re: [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state
  2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
  2025-11-09 23:16   ` Gavin Shan
  2025-11-10 13:50   ` Zeng Heng
@ 2025-11-10 17:32   ` Jonathan Cameron
  2 siblings, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 17:32 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Zeng Heng

On Fri, 7 Nov 2025 12:34:45 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> Use the overflow status bit to track overflow on each bandwidth counter
> read and add the counter size to the correction when overflow is detected.
> 
> This assumes that only a single overflow has occurred since the last read
> of the counter. Overflow interrupts, on hardware that supports them, could
> be used to remove this limitation.
> 
> Cc: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
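
For readers following the thread, the correction scheme in the quoted
commit message can be sketched in user-space C roughly as below. This is
only an illustration under stated assumptions: struct mbwu_model,
overflow_val() and mbwu_model_read() are hypothetical names, not the
driver's, and the model assumes at most one wrap between reads, as the
commit message does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bandwidth monitor's software state. */
struct mbwu_model {
	unsigned int width;	/* number of value bits in the hardware counter */
	uint64_t correction;	/* accumulated adjustment added to each reading */
};

/* Amount represented by one wrap of a 'width'-bit counter: 2^width. */
static uint64_t overflow_val(unsigned int width)
{
	assert(width > 0 && width < 64);
	return UINT64_C(1) << width;
}

/*
 * Fold one hardware reading into a running total.
 * 'raw' is the counter value and 'overflowed' the overflow status bit
 * sampled with it. Assumes at most one wrap since the previous read.
 */
static uint64_t mbwu_model_read(struct mbwu_model *s, uint64_t raw,
				bool overflowed)
{
	if (overflowed)
		s->correction += overflow_val(s->width);

	return raw + s->correction;
}
```

With a 31 bit counter, a reading of 5 taken with the overflow bit set is
reported as 2^31 + 5, and later readings keep that offset until the
correction is cleared.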



* [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (27 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:16   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported Ben Horgan
                   ` (10 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: Rohit Mathew <rohit.mathew@arm.com>

MPAM v0.1 and versions above v1.0 support optional long counters for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters.

Probe these feature bits.

The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
monitors are supported. Instead of muddling this with the size of the
bandwidth monitors, add an explicit 31 bit counter feature.
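
As a rough user-space sketch of the probe decision described above,
exactly one counter size is reported for a given ID register value. The
enum, the function name and the bit positions below are placeholders
chosen for illustration (assumptions, not taken from the specification
or from mpam_internal.h).

```c
#include <stdint.h>

/* Hypothetical result type: exactly one size is reported. */
enum mbwu_counter_kind {
	MBWU_COUNTER_NONE,	/* no bandwidth monitors implemented */
	MBWU_COUNTER_31,
	MBWU_COUNTER_44,
	MBWU_COUNTER_63,
};

/* Placeholder field layout, for illustration only. */
#define IDR_NUM_MON_MASK	UINT32_C(0x0000ffff)
#define IDR_LWD			(UINT32_C(1) << 29)
#define IDR_HAS_LONG		(UINT32_C(1) << 30)

/*
 * Classify the monitor from a raw ID register value. LWD is only
 * meaningful when HAS_LONG is set, so it is ignored otherwise.
 */
static enum mbwu_counter_kind classify_mbwu_counter(uint32_t idr)
{
	if (!(idr & IDR_NUM_MON_MASK))
		return MBWU_COUNTER_NONE;
	if (!(idr & IDR_HAS_LONG))
		return MBWU_COUNTER_31;

	return (idr & IDR_LWD) ? MBWU_COUNTER_63 : MBWU_COUNTER_44;
}
```

Reporting a single size, rather than every size up to the largest,
matches the "Changes since v3" note below and keeps later code from
having to rank overlapping feature bits.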

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[ morse: Added 31bit counter feature to simplify later logic ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Only set the exact counter length that is supported
---
 drivers/resctrl/mpam_devices.c  | 34 +++++++++++++++++++++------------
 drivers/resctrl/mpam_internal.h |  3 +++
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index eea082dfcddc..e93f2e919a85 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -783,25 +783,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 				dev_err_once(dev, "Counters are not usable because not-ready timeout was not provided by firmware.");
 		}
 		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
-			bool hw_managed;
+			bool has_long, hw_managed;
 			u32 mbwumon_idr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
 
 			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumon_idr);
-			if (props->num_mbwu_mon)
+			if (props->num_mbwu_mon) {
 				mpam_set_feature(mpam_feat_msmon_mbwu, props);
 
-			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
-				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+				if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumon_idr))
+					mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
 
-			/* Is NRDY hardware managed? */
-			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
-			if (hw_managed)
-				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+				has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumon_idr);
+				if (has_long) {
+					if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumon_idr))
+						mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+					else
+						mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+				} else
+					mpam_set_feature(mpam_feat_msmon_mbwu_31counter, props);
 
-			/*
-			 * Don't warn about any missing firmware property for
-			 * MBWU NRDY - it doesn't make any sense!
-			 */
+				/* Is NRDY hardware managed? */
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+				/*
+				 * Don't warn about any missing firmware property for
+				 * MBWU NRDY - it doesn't make any sense!
+				 */
+			}
 		}
 	}
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 7c99d4f3dc9c..f3bf26b7fdaf 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -162,6 +162,9 @@ enum mpam_device_features {
 	mpam_feat_msmon_csu_xcl,
 	mpam_feat_msmon_csu_hw_nrdy,
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_31counter,
+	mpam_feat_msmon_mbwu_44counter,
+	mpam_feat_msmon_mbwu_63counter,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-- 
2.43.0



* Re: [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-11-07 12:34 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters Ben Horgan
@ 2025-11-09 23:16   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:16 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> MPAM v0.1 and versions above v1.0 support optional long counters for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
> indicating support for long counters.
> 
> Probe these feature bits.
> 
> The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
> monitors are supported. Instead of muddling this with the size of the
> bandwidth monitors, add an explicit 31 bit counter feature.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [ morse: Added 31bit counter feature to simplify later logic ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Only set the exact counter length that is supported
> ---
>   drivers/resctrl/mpam_devices.c  | 34 +++++++++++++++++++++------------
>   drivers/resctrl/mpam_internal.h |  3 +++
>   2 files changed, 25 insertions(+), 12 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (28 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:17   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state Ben Horgan
                   ` (9 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan, Shaopeng Tan

From: Rohit Mathew <rohit.mathew@arm.com>

Now that the larger counter sizes are probed, make use of them.

Callers of mpam_msmon_read() may not know (or care!) about the different
counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
driver pick the counter to use.
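
A minimal sketch of that selection, written as stand-alone C rather than
kernel code. The enum values, struct props and the helper names are
hypothetical stand-ins for the driver's feature bits; only the
preference order (63, then 44, then 31) comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical feature identifiers mirroring the names in the patch. */
enum feat {
	FEAT_MBWU,	/* generic "bandwidth monitoring is supported" */
	FEAT_MBWU_31,
	FEAT_MBWU_44,
	FEAT_MBWU_63,
};

struct props {
	unsigned int features;	/* one bit per enum feat value */
};

static bool has_feature(enum feat f, const struct props *p)
{
	return p->features & (1u << f);
}

/* Prefer the widest counter that is present; fall back to 31 bits. */
static enum feat choose_counter(const struct props *p)
{
	if (has_feature(FEAT_MBWU_63, p))
		return FEAT_MBWU_63;
	if (has_feature(FEAT_MBWU_44, p))
		return FEAT_MBWU_44;

	return FEAT_MBWU_31;
}

/* Only the generic request is rewritten; explicit sizes pass through. */
static enum feat resolve_type(enum feat requested, const struct props *p)
{
	return requested == FEAT_MBWU ? choose_counter(p) : requested;
}
```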

Only 32bit accesses to the MSC are required to be supported by the
spec, but these registers are 64 bits wide. The lower half may overflow
into the higher half between two 32bit reads. To detect this, use a
helper that reads the top half before and after the bottom half and
retries if the two values differ.
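
The high/low/high read pattern can be illustrated outside the kernel as
follows. This is a sketch under assumptions: read32_fn, read64_split()
and the fake_counter toy device are invented for the example, and the
retry count is a parameter rather than the fixed value in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit accessor: returns the word at byte 'offset'. */
typedef uint32_t (*read32_fn)(void *ctx, unsigned int offset);

/*
 * Read a 64-bit counter using only 32-bit accesses. The high word is
 * sampled before and after the low word; if it changed, the low word
 * may belong to either sample, so the read is retried.
 * Returns false if no stable value was seen within 'max_tries'.
 */
static bool read64_split(read32_fn rd, void *ctx, unsigned int offset,
			 int max_tries, uint64_t *out)
{
	uint32_t hi1, hi2, lo;

	hi2 = rd(ctx, offset + 4);
	while (max_tries-- > 0) {
		hi1 = hi2;
		lo = rd(ctx, offset);
		hi2 = rd(ctx, offset + 4);
		if (hi1 == hi2) {
			*out = ((uint64_t)hi1 << 32) | lo;
			return true;
		}
	}

	return false;
}

/* Toy device: a counter that advances by 'step' on every access. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read32(void *ctx, unsigned int offset)
{
	struct fake_counter *c = ctx;
	uint32_t word = offset == 0 ? (uint32_t)c->value
				    : (uint32_t)(c->value >> 32);

	c->value += c->step;
	return word;
}
```

Starting the toy counter just below a 32 bit boundary makes the first
attempt see a changed high word and the second attempt succeed, which is
the case the retry loop exists for.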

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit, added explicit counter selection ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Cc: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Peter:
Fix type checking, use mpam_feat_msmon_mbwu_<n>counter
Reset/configuration order of long counters
---
 drivers/resctrl/mpam_devices.c | 145 ++++++++++++++++++++++++++++-----
 1 file changed, 126 insertions(+), 19 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e93f2e919a85..59d12ee1195d 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -904,6 +904,50 @@ struct mon_read {
 	int				err;
 };
 
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+	int retry = 3;
+	u32 mbwu_l_low;
+	u64 mbwu_l_high1, mbwu_l_high2;
+
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+	do {
+		mbwu_l_high1 = mbwu_l_high2;
+		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+		retry--;
+	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+	if (mbwu_l_high1 == mbwu_l_high2)
+		return (mbwu_l_high1 << 32) | mbwu_l_low;
+
+	pr_warn("Failed to read a stable value\n");
+	return MSMON___L_NRDY;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
+	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
 static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 				   u32 *flt_val)
 {
@@ -931,7 +975,9 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 					       ctx->csu_exclude_clean);
 
 		break;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
 		*ctl_val |= MSMON_CFG_MBWU_CTL_TYPE_MBWU;
 
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
@@ -953,7 +999,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
 		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
 		return;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
 		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
 		return;
@@ -966,6 +1014,9 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 static void clean_msmon_ctl_val(u32 *cur_ctl)
 {
 	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+
+	if (FIELD_GET(MSMON_CFG_x_CTL_TYPE, *cur_ctl) == MSMON_CFG_MBWU_CTL_TYPE_MBWU)
+		*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 }
 
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -984,12 +1035,17 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CSU, 0);
 		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 		break;
-	case mpam_feat_msmon_mbwu:
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
 		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 		/* Counting monitors require NRDY to be reset by software */
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (m->type == mpam_feat_msmon_mbwu_31counter)
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		else
+			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
 		break;
 	default:
 		return;
@@ -998,8 +1054,17 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 
 static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
 {
-	/* TODO: scaling, and long counters */
-	return BIT_ULL(hweight_long(MSMON___VALUE));
+	/* TODO: implement scaling counters */
+	switch (type) {
+	case mpam_feat_msmon_mbwu_63counter:
+		return BIT_ULL(hweight_long(MSMON___LWD_VALUE));
+	case mpam_feat_msmon_mbwu_44counter:
+		return BIT_ULL(hweight_long(MSMON___L_VALUE));
+	case mpam_feat_msmon_mbwu_31counter:
+		return BIT_ULL(hweight_long(MSMON___VALUE));
+	default:
+		return 0;
+	}
 }
 
 static void __ris_msmon_read(void *arg)
@@ -1029,7 +1094,12 @@ static void __ris_msmon_read(void *arg)
 	 * This saves waiting for 'nrdy' on subsequent reads.
 	 */
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
-	overflow = cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS;
+
+	if (mpam_feat_msmon_mbwu_31counter == m->type)
+		overflow = cur_ctl & MSMON_CFG_x_CTL_OFLOW_STATUS;
+	else if (mpam_feat_msmon_mbwu_44counter == m->type ||
+		 mpam_feat_msmon_mbwu_63counter == m->type)
+		overflow = cur_ctl & MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
@@ -1041,7 +1111,9 @@ static void __ris_msmon_read(void *arg)
 		overflow = false;
 	} else if (overflow) {
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL,
-				      cur_ctl & ~MSMON_CFG_x_CTL_OFLOW_STATUS);
+				      cur_ctl &
+				      ~(MSMON_CFG_x_CTL_OFLOW_STATUS |
+					MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L));
 	}
 
 	switch (m->type) {
@@ -1051,11 +1123,24 @@ static void __ris_msmon_read(void *arg)
 			nrdy = now & MSMON___NRDY;
 		now = FIELD_GET(MSMON___VALUE, now);
 		break;
-	case mpam_feat_msmon_mbwu:
-		now = mpam_read_monsel_reg(msc, MBWU);
-		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
-			nrdy = now & MSMON___NRDY;
-		now = FIELD_GET(MSMON___VALUE, now);
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
+		if (m->type != mpam_feat_msmon_mbwu_31counter) {
+			now = mpam_msc_read_mbwu_l(msc);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___L_NRDY;
+
+			if (m->type == mpam_feat_msmon_mbwu_63counter)
+				now = FIELD_GET(MSMON___LWD_VALUE, now);
+			else
+				now = FIELD_GET(MSMON___L_VALUE, now);
+		} else {
+			now = mpam_read_monsel_reg(msc, MBWU);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY;
+			now = FIELD_GET(MSMON___VALUE, now);
+		}
 
 		if (nrdy)
 			break;
@@ -1119,13 +1204,26 @@ static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
 	return any_err;
 }
 
+static enum mpam_device_features mpam_msmon_choose_counter(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, cprops))
+		return mpam_feat_msmon_mbwu_63counter;
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, cprops))
+		return mpam_feat_msmon_mbwu_44counter;
+
+	return mpam_feat_msmon_mbwu_31counter;
+}
+
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features type, u64 *val)
 {
 	int err;
 	struct mon_read arg;
 	u64 wait_jiffies = 0;
-	struct mpam_props *cprops = &comp->class->props;
+	struct mpam_class *class = comp->class;
+	struct mpam_props *cprops = &class->props;
 
 	might_sleep();
 
@@ -1135,6 +1233,9 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	if (!mpam_has_feature(type, cprops))
 		return -EOPNOTSUPP;
 
+	if (type == mpam_feat_msmon_mbwu)
+		type = mpam_msmon_choose_counter(class);
+
 	arg = (struct mon_read) {
 		.ctx = ctx,
 		.type = type,
@@ -1143,8 +1244,8 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	*val = 0;
 
 	err = _msmon_read(comp, &arg);
-	if (err == -EBUSY && comp->class->nrdy_usec)
-		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+	if (err == -EBUSY && class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(class->nrdy_usec);
 
 	while (wait_jiffies)
 		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
@@ -1285,12 +1386,13 @@ static int mpam_restore_mbwu_state(void *_ris)
 	int i;
 	struct mon_read mwbu_arg;
 	struct mpam_msc_ris *ris = _ris;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
 		if (ris->mbwu_state[i].enabled) {
 			mwbu_arg.ris = ris;
 			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
-			mwbu_arg.type = mpam_feat_msmon_mbwu;
+			mwbu_arg.type = mpam_msmon_choose_counter(class);
 
 			__ris_msmon_read(&mwbu_arg);
 		}
@@ -1325,8 +1427,13 @@ static int mpam_save_mbwu_state(void *arg)
 		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
 
-		val = mpam_read_monsel_reg(msc, MBWU);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			val = mpam_msc_read_mbwu_l(msc);
+			mpam_msc_zero_mbwu_l(msc);
+		} else {
+			val = mpam_read_monsel_reg(msc, MBWU);
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		}
 
 		cfg->mon = i;
 		cfg->pmg = FIELD_GET(MSMON_CFG_x_FLT_PMG, cur_flt);
-- 
2.43.0



* Re: [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-11-07 12:34 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported Ben Horgan
@ 2025-11-09 23:17   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:17 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Shaopeng Tan

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> Now that the larger counter sizes are probed, make use of them.
> 
> Callers of mpam_msmon_read() may not know (or care!) about the different
> counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
> driver pick the counter to use.
> 
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64 bits wide. The lower half may overflow
> into the higher half between two 32bit reads. To detect this, use a
> helper that reads the top half before and after the bottom half and
> retries if the two values differ.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit, added explicit counter selection ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Cc: Peter Newman <peternewman@google.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Peter:
> Fix type checking, use mpam_feat_msmon_mbwu_<n>counter
> Reset/configuration order of long counters
> ---
>   drivers/resctrl/mpam_devices.c | 145 ++++++++++++++++++++++++++++-----
>   1 file changed, 126 insertions(+), 19 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (29 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:18   ` Gavin Shan
  2025-11-10 17:34   ` Jonathan Cameron
  2025-11-07 12:34 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset Ben Horgan
                   ` (8 subsequent siblings)
  39 siblings, 2 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Fenghua Yu, Ben Horgan

From: James Morse <james.morse@arm.com>

resctrl expects to reset the bandwidth counters when the filesystem
is mounted.

To allow this, add a helper that clears the saved mbwu state. Instead
of cross calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
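
The deferred reset can be modelled in a few lines of plain C. The sketch
below is illustrative only: struct mon_model and both functions are
hypothetical, locking is omitted (the driver holds the MSC's mon_sel
lock around the equivalent state), and zeroing hw_counter stands in for
the configuration rewrite that clears the real monitor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one monitor: hardware value plus software state. */
struct mon_model {
	uint64_t hw_counter;		/* stands in for the counter register */
	uint64_t correction;		/* software adjustment added to reads */
	bool reset_on_next_read;	/* set by the requester, consumed on read */
};

/* May run on any CPU: only software state is touched. */
static void mon_model_request_reset(struct mon_model *m)
{
	m->correction = 0;
	m->reset_on_next_read = true;
}

/* Runs where the hardware is reachable; performs the pending reset. */
static uint64_t mon_model_read(struct mon_model *m)
{
	if (m->reset_on_next_read) {
		m->reset_on_next_read = false;
		m->hw_counter = 0;	/* models the forced reconfiguration */
	}

	return m->hw_counter + m->correction;
}
```

The request itself never touches the counter, so no cross call is
needed; the cost is that the hardware keeps counting until the next
read, and that interval is discarded.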

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Cc: Peter Newman <peternewman@google.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v3:
Correct type checking, use mpam_feat_msmon_mbwu_<n>counter
---
 drivers/resctrl/mpam_devices.c  | 48 ++++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  2 ++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 59d12ee1195d..eb6417f57e23 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1075,6 +1075,7 @@ static void __ris_msmon_read(void *arg)
 	bool overflow;
 	struct mon_read *m = arg;
 	struct mon_cfg *ctx = m->ctx;
+	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
 	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
@@ -1089,6 +1090,20 @@ static void __ris_msmon_read(void *arg)
 		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
 	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
 
+	switch (m->type) {
+	case mpam_feat_msmon_mbwu_31counter:
+	case mpam_feat_msmon_mbwu_44counter:
+	case mpam_feat_msmon_mbwu_63counter:
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (mbwu_state) {
+			reset_on_next_read = mbwu_state->reset_on_next_read;
+			mbwu_state->reset_on_next_read = false;
+		}
+		break;
+	default:
+		break;
+	}
+
 	/*
 	 * Read the existing configuration to avoid re-writing the same values.
 	 * This saves waiting for 'nrdy' on subsequent reads.
@@ -1106,7 +1121,7 @@ static void __ris_msmon_read(void *arg)
 	config_mismatch = cur_flt != flt_val ||
 			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
 
-	if (config_mismatch) {
+	if (config_mismatch || reset_on_next_read) {
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
 		overflow = false;
 	} else if (overflow) {
@@ -1264,6 +1279,37 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	return err;
 }
 
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	if (!mpam_is_enabled())
+		return;
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(vmsc, &comp->vmsc, comp_list,
+				 srcu_read_lock_held(&mpam_srcu)) {
+		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+			continue;
+
+		msc = vmsc->msc;
+		list_for_each_entry_srcu(ris, &vmsc->ris, vmsc_list,
+					 srcu_read_lock_held(&mpam_srcu)) {
+			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+				continue;
+
+			if (WARN_ON_ONCE(!mpam_mon_sel_lock(msc)))
+				continue;
+
+			ris->mbwu_state[ctx->mon].correction = 0;
+			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+			mpam_mon_sel_unlock(msc);
+		}
+	}
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f3bf26b7fdaf..0061d355ad3e 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -209,6 +209,7 @@ struct mon_cfg {
 /* Changes to msmon_mbwu_state are protected by the msc's mon_sel_lock. */
 struct msmon_mbwu_state {
 	bool		enabled;
+	bool		reset_on_next_read;
 	struct mon_cfg	cfg;
 
 	/*
@@ -368,6 +369,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
 
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.43.0



* Re: [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
  2025-11-07 12:34 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state Ben Horgan
@ 2025-11-09 23:18   ` Gavin Shan
  2025-11-10 17:34   ` Jonathan Cameron
  1 sibling, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:18 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Fenghua Yu

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> resctrl expects to reset the bandwidth counters when the filesystem
> is mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead
> of cross calling to each CPU that can access the component MSC to
> write to the counter, set a flag that causes it to be zeroed on the
> next read. This is easily done by forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Cc: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v3:
> Correct type checking, use mpam_feat_msmon_mbwu_<n>counter
> ---
>   drivers/resctrl/mpam_devices.c  | 48 ++++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  2 ++
>   2 files changed, 49 insertions(+), 1 deletion(-)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
  2025-11-07 12:34 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state Ben Horgan
  2025-11-09 23:18   ` Gavin Shan
@ 2025-11-10 17:34   ` Jonathan Cameron
  1 sibling, 0 replies; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-10 17:34 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Fenghua Yu

On Fri, 7 Nov 2025 12:34:48 +0000
Ben Horgan <ben.horgan@arm.com> wrote:

> From: James Morse <james.morse@arm.com>
> 
> resctrl expects to reset the bandwidth counters when the filesystem
> is mounted.
> 
> To allow this, add a helper that clears the saved mbwu state. Instead
> of cross calling to each CPU that can access the component MSC to
> write to the counter, set a flag that causes it to be zero'd on the
> next read. This is easily done by forcing a configuration update.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Cc: Peter Newman <peternewman@google.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
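
For readers following the thread, the deferred reset described in the
commit message can be sketched in plain C roughly as below. This is only
an illustration under stated assumptions: struct mbwu_state_sketch, its
fields and both helpers are hypothetical names, not the driver's, and the
hardware counter is modelled by an ordinary variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical saved monitor state; names are illustrative only. */
struct mbwu_state_sketch {
	uint64_t correction;		/* software-accumulated part of the count */
	bool reset_on_next_read;	/* set instead of touching hardware now */
};

/* Request a reset without cross-calling any CPU: only the flag changes. */
static void mbwu_request_reset_sketch(struct mbwu_state_sketch *s)
{
	s->reset_on_next_read = true;
}

/*
 * Read path. 'hw_count' stands in for the hardware counter. If a reset
 * is pending, the counter and the saved state are zeroed before the
 * value is returned, and the pending flag is cleared.
 */
static uint64_t mbwu_read_sketch(struct mbwu_state_sketch *s, uint64_t *hw_count)
{
	if (s->reset_on_next_read) {
		*hw_count = 0;
		s->correction = 0;
		s->reset_on_next_read = false;
	}

	return s->correction + *hw_count;
}
```

The point of the design is that the reset request itself is cheap and
can be made from any CPU; the actual write happens later on a CPU that
is already reading the monitor.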





* [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (30 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:19   ` Gavin Shan
  2025-11-07 12:34 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() Ben Horgan
                   ` (7 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Jonathan Cameron, Ben Horgan

From: James Morse <james.morse@arm.com>

The bitmap reset code has been a source of bugs. Add a unit test.

This currently has to be built in, as the rest of the driver is
builtin.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/Kconfig             |  9 ++++
 drivers/resctrl/mpam_devices.c      |  4 ++
 drivers/resctrl/test_mpam_devices.c | 69 +++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 drivers/resctrl/test_mpam_devices.c

diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index ef2f3adf64a9..6368376fa51d 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -12,4 +12,13 @@ config ARM64_MPAM_DRIVER_DEBUG
 	help
 	  Say yes here to enable debug messages from the MPAM driver.
 
+config MPAM_KUNIT_TEST
+	bool "KUnit tests for MPAM driver" if !KUNIT_ALL_TESTS
+	depends on KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to run tests in the MPAM driver.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index eb6417f57e23..b04050ad351a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2723,3 +2723,7 @@ static int __init mpam_msc_driver_init(void)
 
 /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..0cfb41b665c4
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+	struct mpam_msc fake_msc = {};
+	u32 *test_result;
+
+	if (!buf)
+		return;
+
+	fake_msc.mapped_hwpage = buf;
+	fake_msc.mapped_hwpage_sz = SZ_16K;
+	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+	/* Satisfy lockdep checks */
+	mutex_init(&fake_msc.part_sel_lock);
+	mutex_lock(&fake_msc.part_sel_lock);
+
+	test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+	KUNIT_EXPECT_EQ(test, test_result[0], 1);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 1);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	{}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+	.name = "mpam_devices_test_suite",
+	.test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
-- 
2.43.0
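
The expectations in the test above (widths 0, 1, 16, 32 and 33) pin down
the behaviour at the 32-bit word boundary. A minimal userspace sketch of
a routine that satisfies them is shown below. It is an assumption-laden
illustration, not the body of mpam_reset_msc_bitmap(): the function name,
the 'nwords' bound and the use of a plain array instead of MMIO are all
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Set the low 'wd' bits of a bitmap stored as consecutive 32-bit words.
 * 'regs' models the register window and 'nwords' bounds the writes.
 * Words that are not needed for 'wd' bits are left untouched.
 */
static void reset_bitmap_sketch(uint32_t *regs, size_t nwords, unsigned int wd)
{
	size_t i = 0;

	/* Whole words first. */
	while (wd >= 32 && i < nwords) {
		regs[i++] = UINT32_MAX;
		wd -= 32;
	}

	/* Then the partial word, if any bits remain (wd is now below 32). */
	if (wd && i < nwords)
		regs[i] = (UINT32_C(1) << wd) - 1;
}
```

Handling whole words separately keeps the shift count below 32, which
avoids the undefined behaviour of shifting a 32-bit value by its full
width. That boundary is the kind of case the 32 and 33 checks exercise.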



* Re: [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-11-07 12:34 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset Ben Horgan
@ 2025-11-09 23:19   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:19 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> The bitmap reset code has been a source of bugs. Add a unit test.
> 
> This currently has to be built in, as the rest of the driver is
> builtin.
> 
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/Kconfig             |  9 ++++
>   drivers/resctrl/mpam_devices.c      |  4 ++
>   drivers/resctrl/test_mpam_devices.c | 69 +++++++++++++++++++++++++++++
>   3 files changed, 82 insertions(+)
>   create mode 100644 drivers/resctrl/test_mpam_devices.c
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (31 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset Ben Horgan
@ 2025-11-07 12:34 ` Ben Horgan
  2025-11-09 23:19   ` Gavin Shan
  2025-11-07 12:47 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (6 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:34 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

From: James Morse <james.morse@arm.com>

When features are mismatched between MSCs, the way those features are
combined into the class determines whether resctrl can support this SoC.

Add some tests to illustrate the sort of thing that is expected to
work, and the features that must be removed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
 drivers/resctrl/mpam_internal.h     |  14 +-
 drivers/resctrl/test_mpam_devices.c | 320 ++++++++++++++++++++++++++++
 2 files changed, 333 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 0061d355ad3e..622b120bad3b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -23,6 +23,12 @@ struct platform_device;
 
 DECLARE_STATIC_KEY_FALSE(mpam_enabled);
 
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
 static inline bool mpam_is_enabled(void)
 {
 	return static_branch_likely(&mpam_enabled);
@@ -184,7 +190,13 @@ struct mpam_props {
 	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
-};
+
+/*
+ * Kunit tests use memset() to set up feature combinations that should be
+ * removed, and will false-positive if the compiler introduces padding that
+ * isn't cleared during sanitisation.
+ */
+} PACKED_FOR_KUNIT;
 
 #define mpam_has_feature(_feat, x)	test_bit(_feat, (x)->features)
 #define mpam_set_feature(_feat, x)	set_bit(_feat, (x)->features)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 0cfb41b665c4..3e8d564a0c64 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,324 @@
 
 #include <kunit/test.h>
 
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+	struct mpam_props parent = { 0 };
+	struct mpam_props child;
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, false);
+
+	memset(&child, 0, sizeof(child));
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, true);
+
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static struct list_head fake_classes_list;
+static struct mpam_class fake_class = { 0 };
+static struct mpam_component fake_comp1 = { 0 };
+static struct mpam_component fake_comp2 = { 0 };
+static struct mpam_vmsc fake_vmsc1 = { 0 };
+static struct mpam_vmsc fake_vmsc2 = { 0 };
+static struct mpam_msc fake_msc1 = { 0 };
+static struct mpam_msc fake_msc2 = { 0 };
+static struct mpam_msc_ris fake_ris1 = { 0 };
+static struct mpam_msc_ris fake_ris2 = { 0 };
+static struct platform_device fake_pdev = { 0 };
+
+static inline void reset_fake_hierarchy(void)
+{
+	INIT_LIST_HEAD(&fake_classes_list);
+
+	memset(&fake_class, 0, sizeof(fake_class));
+	fake_class.level = 3;
+	fake_class.type = MPAM_CLASS_CACHE;
+	INIT_LIST_HEAD_RCU(&fake_class.components);
+	INIT_LIST_HEAD(&fake_class.classes_list);
+
+	memset(&fake_comp1, 0, sizeof(fake_comp1));
+	memset(&fake_comp2, 0, sizeof(fake_comp2));
+	fake_comp1.comp_id = 1;
+	fake_comp2.comp_id = 2;
+	INIT_LIST_HEAD(&fake_comp1.vmsc);
+	INIT_LIST_HEAD(&fake_comp1.class_list);
+	INIT_LIST_HEAD(&fake_comp2.vmsc);
+	INIT_LIST_HEAD(&fake_comp2.class_list);
+
+	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));
+	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));
+	INIT_LIST_HEAD(&fake_vmsc1.ris);
+	INIT_LIST_HEAD(&fake_vmsc1.comp_list);
+	fake_vmsc1.msc = &fake_msc1;
+	INIT_LIST_HEAD(&fake_vmsc2.ris);
+	INIT_LIST_HEAD(&fake_vmsc2.comp_list);
+	fake_vmsc2.msc = &fake_msc2;
+
+	memset(&fake_ris1, 0, sizeof(fake_ris1));
+	memset(&fake_ris2, 0, sizeof(fake_ris2));
+	fake_ris1.ris_idx = 1;
+	INIT_LIST_HEAD(&fake_ris1.msc_list);
+	fake_ris2.ris_idx = 2;
+	INIT_LIST_HEAD(&fake_ris2.msc_list);
+
+	fake_msc1.pdev = &fake_pdev;
+	fake_msc2.pdev = &fake_pdev;
+
+	list_add(&fake_class.classes_list, &fake_classes_list);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+	reset_fake_hierarchy();
+
+	mutex_lock(&mpam_list_lock);
+
+	/* One Class+Comp, two RIS in one vMSC with common features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with incompatible overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 5;
+	fake_ris2.props.cpbm_wd = 3;
+	fake_ris1.props.mbw_pbm_bits = 5;
+	fake_ris2.props.mbw_pbm_bits = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+	reset_fake_hierarchy();
+
+	/* One Class+Comp, two MSC with overlapping features that need tweaking */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+	fake_ris1.props.bwa_wd = 5;
+	fake_ris2.props.bwa_wd = 3;
+	fake_ris1.props.cmax_wd = 5;
+	fake_ris2.props.cmax_wd = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * RIS with different control properties need to be sanitised so the
+	 * class has the common set of properties.
+	 */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+	reset_fake_hierarchy();
+
+	/* One Class Two Comp with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	reset_fake_hierarchy();
+
+	/* One Class Two Comp with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple components can't control the same resource, mismatched features can
+	 * not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	mutex_unlock(&mpam_list_lock);
+}
+
 static void test_mpam_reset_msc_bitmap(struct kunit *test)
 {
 	char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -58,6 +376,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
 
 static struct kunit_case mpam_devices_test_cases[] = {
 	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	KUNIT_CASE(test_mpam_enable_merge_features),
+	KUNIT_CASE(test__props_mismatch),
 	{}
 };
 
-- 
2.43.0
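
The PACKED_FOR_KUNIT comment above explains why the struct is packed for
the memset()/memcmp() test. The effect can be sketched in userspace as
below. Everything here is hypothetical: the two-field layouts and helper
names are made up for illustration, and __attribute__((packed)) assumes
GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; packed so sizeof() covers only the real fields. */
struct props_packed_sketch {
	uint8_t  flag;
	uint32_t width;
} __attribute__((packed));

/* Same fields without packing; most ABIs insert padding after 'flag'. */
struct props_padded_sketch {
	uint8_t  flag;
	uint32_t width;
};

/* Bytes that field-by-field clearing could never reach when unpacked. */
static size_t padding_bytes_sketch(void)
{
	return sizeof(struct props_padded_sketch) -
	       sizeof(struct props_packed_sketch);
}

/* Field-by-field sanitisation, standing in for the code under test. */
static void sanitise_sketch(struct props_packed_sketch *p)
{
	p->flag = 0;
	p->width = 0;
}

/* Returns 1 when a poisoned, then sanitised, struct compares all-zero. */
static int packed_compares_clean_sketch(void)
{
	struct props_packed_sketch poisoned, zero;

	memset(&poisoned, 0xff, sizeof(poisoned));
	memset(&zero, 0, sizeof(zero));
	sanitise_sketch(&poisoned);

	return memcmp(&poisoned, &zero, sizeof(zero)) == 0;
}
```

With the padded layout, the same poison-then-compare sequence may see
0xff in the padding bytes even though every field was cleared, which is
the false positive the comment refers to.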



* Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-11-07 12:34 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() Ben Horgan
@ 2025-11-09 23:19   ` Gavin Shan
  0 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-09 23:19 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
> 
> When features are mismatched between MSCs, the way those features are
> combined into the class determines whether resctrl can support this SoC.
> 
> Add some tests to illustrate the sort of thing that is expected to
> work, and the features that must be removed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>   drivers/resctrl/mpam_internal.h     |  14 +-
>   drivers/resctrl/test_mpam_devices.c | 320 ++++++++++++++++++++++++++++
>   2 files changed, 333 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>
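
As a reading aid, the cross-MSC expectations in this patch's tests can
be summarised by the sketch below: a feature survives only if both MSCs
have it, a portion-bitmap width must match exactly, and a capacity width
is narrowed to the smaller value. This is inferred from the test
assertions only and is not the driver's __props_mismatch(). The struct,
the FEAT_* bits and the function name are hypothetical.

```c
#include <stdint.h>

#define FEAT_CPOR_SKETCH	(1u << 0)	/* cache portion bitmap control */
#define FEAT_CMAX_SKETCH	(1u << 1)	/* cache capacity control */

struct props_sketch {
	uint32_t features;
	uint16_t cpbm_wd;
	uint16_t cmax_wd;
};

/* Fold one MSC's properties into the running class-wide result. */
static void merge_across_msc_sketch(struct props_sketch *parent,
				    const struct props_sketch *child)
{
	/* Only features present on both sides can survive. */
	parent->features &= child->features;

	/* Bitmap widths must match exactly, otherwise drop the control. */
	if ((parent->features & FEAT_CPOR_SKETCH) &&
	    parent->cpbm_wd != child->cpbm_wd)
		parent->features &= ~FEAT_CPOR_SKETCH;
	if (!(parent->features & FEAT_CPOR_SKETCH))
		parent->cpbm_wd = 0;

	/* Capacity widths can be narrowed to the smaller of the two. */
	if (parent->features & FEAT_CMAX_SKETCH) {
		if (child->cmax_wd < parent->cmax_wd)
			parent->cmax_wd = child->cmax_wd;
	} else {
		parent->cmax_wd = 0;
	}
}
```

The tests also cover a case this sketch does not model: two RIS inside a
single vMSC, where non-overlapping features are both expected to remain
set on the class.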



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (32 preceding siblings ...)
  2025-11-07 12:34 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() Ben Horgan
@ 2025-11-07 12:47 ` Ben Horgan
  2025-11-07 21:22 ` Fenghua Yu
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 147+ messages in thread
From: Ben Horgan @ 2025-11-07 12:47 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

This is a v4. Apologies for forgetting the version number!

On 11/7/25 12:34, Ben Horgan wrote:
> Hi all,
> 
> This version of the series comes to you from me as James is otherwise
> engaged. I hope I have done his work justice. I've made quite a few
Thanks,

Ben



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (33 preceding siblings ...)
  2025-11-07 12:47 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
@ 2025-11-07 21:22 ` Fenghua Yu
  2025-11-07 23:22 ` Carl Worth
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 147+ messages in thread
From: Fenghua Yu @ 2025-11-07 21:22 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, gregkh,
	gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao



On 11/7/25 04:34, Ben Horgan wrote:
> Hi all,
> 
> This version of the series comes to you from me as James is otherwise
> engaged. I hope I have done his work justice. I've made quite a few
> changes, rework, bugs, typos, all the usual. In order to aid review,
> as Jonathan suggested, I've split out some patches and made an effort
> to minimise the amount of churn between patches.

[SNIP]

> This series is based on v6.18-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v4

Build, boot, ACPI parse, and Kunit tests pass.

> 
> The rest of the driver can be found here: (no updated version - based on v3)
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1
[SNIP]

Thanks.

-Fenghua



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (34 preceding siblings ...)
  2025-11-07 21:22 ` Fenghua Yu
@ 2025-11-07 23:22 ` Carl Worth
  2025-11-10 16:15   ` Ben Horgan
  2025-11-10  1:05 ` Gavin Shan
                   ` (3 subsequent siblings)
  39 siblings, 1 reply; 147+ messages in thread
From: Carl Worth @ 2025-11-07 23:22 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao,
	Ben Horgan

Ben Horgan <ben.horgan@arm.com> writes:
> This version of the series comes to you from me as James is otherwise
> engaged. I hope I have done his work justice. I've made quite a few
> changes, rework, bugs, typos, all the usual. In order to aid review,
> as Jonathan suggested, I've split out some patches and made an effort
> to minimise the amount of churn between patches.

I've built this and booted on an Ampere system. It ends up reporting a
successful message of:

	MPAM enabled with 1 PARTIDs and 1 PMGs

So the code seems happy enough as far as that goes.

But the expected number of PARTIDs on this system is much larger than 1,
(MPAM with a single PARTID would not be useful at all).

> See below for a public branch. No public updated version of the
> snapshot (the rest of the driver) I'm afraid.

Looking closer, it looks like the bogus value of 0 for mpam_partid_max
is because the following patch (which does appear in James' various
snapshots) isn't present yet in the code submitted to this point:

	commit 33c1f50970917ac9f2a8e224d850936374df6173
	Author: James Morse <james.morse@arm.com>
	Date:   Fri Jul 4 14:22:30 2025 +0100
	
	    arm64: mpam: Advertise the CPUs MPAM limits to the driver
	    
	    Requestors need to populate the MPAM fields on the interconnect. For
	    the CPUs these fields are taken from the corresponding MPAMy_ELx
	    register. Each requestor may have a limit on the largest PARTID or
	    PMG value that can be used. The MPAM driver has to determine the
	    system-wide minimum supported PARTID and PMG values.
	    
	    To do this, the driver needs to be told what each requestor's
	    limit is.

So, I guess I'm wondering what more I could do to test this code at this
point, prior to merging it.

I'm very interested in seeing this code land upstream as soon as
possible, and I've got access to an implementation to test whatever I
can.

So let me know what else I can do and I'll be glad to contribute my
Tested-by when I've done it.

-Carl
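
To make the "1 PARTIDs" symptom concrete, here is a small sketch of the
bookkeeping the quoted commit message describes: each requestor reports
its limits and the system-wide value is the minimum. All names are
hypothetical and this is not the driver's code. It only assumes, as the
boot message suggests, that the reported count is the maximum value plus
one.

```c
#include <stdint.h>

static uint16_t partid_max_sketch;	/* stays 0 until a requestor registers */
static uint8_t pmg_max_sketch;
static int have_requestor_sketch;

/* Record one requestor's limits, keeping the system-wide minimum. */
static void register_requestor_sketch(uint16_t partid_max, uint8_t pmg_max)
{
	if (!have_requestor_sketch) {
		partid_max_sketch = partid_max;
		pmg_max_sketch = pmg_max;
		have_requestor_sketch = 1;
		return;
	}

	if (partid_max < partid_max_sketch)
		partid_max_sketch = partid_max;
	if (pmg_max < pmg_max_sketch)
		pmg_max_sketch = pmg_max;
}

/* The number of usable PARTIDs is the largest PARTID plus one. */
static unsigned int partid_count_sketch(void)
{
	return partid_max_sketch + 1u;
}
```

With no requestor ever registered, the maximum stays at 0 and the count
comes out as 1, which matches the boot message quoted above.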


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 23:22 ` Carl Worth
@ 2025-11-10 16:15   ` Ben Horgan
  2025-11-11  0:45     ` Carl Worth
  0 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-10 16:15 UTC (permalink / raw)
  To: Carl Worth, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi Carl,

On 11/7/25 23:22, Carl Worth wrote:
> Ben Horgan <ben.horgan@arm.com> writes:
>> This version of the series comes to you from me as James is otherwise
>> engaged. I hope I have done his work justice. I've made quite a few
>> changes, rework, bugs, typos, all the usual. In order to aid review,
>> as Jonathan suggested, I've split out some patches and made an effort
>> to minimise the amount of churn between patches.
> 
> I've built this and booted on an Ampere system. It ends up reporting a
> successful message of:
> 
> 	MPAM enabled with 1 PARTIDs and 1 PMGs
> 
> So the code seems happy enough as far as that goes.
> 
> But the expected number of PARTIDs on this system is much larger than 1,
> (MPAM with a single PARTID would not be useful at all).
> 
>> See below for a public branch. No public updated version of the
>> snapshot (the rest of the driver) I'm afraid.
> 
> Looking closer, it looks like the bogus value of 0 for mpam_partid_max
> is because the following patch (which does appear in James' various
> snapshots) isn't present yet in the code submitted to this point:
> 
> 	commit 33c1f50970917ac9f2a8e224d850936374df6173
> 	Author: James Morse <james.morse@arm.com>
> 	Date:   Fri Jul 4 14:22:30 2025 +0100
> 	
> 	    arm64: mpam: Advertise the CPUs MPAM limits to the driver
> 	    
> 	    Requestors need to populate the MPAM fields on the interconnect. For
> 	    the CPUs these fields are taken from the corresponding MPAMy_ELx
> 	    register. Each requestor may have a limit on the largest PARTID or
> 	    PMG value that can be used. The MPAM driver has to determine the
> 	    system-wide minimum supported PARTID and PMG values.
> 	    
> 	    To do this, the driver needs to be told what each requestor's
> 	    limit is.

Yes, the driver does barely anything without a requestor.
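(A minimal sketch of the limit calculation the quoted commit message describes, for anyone following along. It is not the driver's code: the RequestorLimits type, the function names and the fallback to 0 when no requestor is registered are all assumptions made for illustration. It only shows why a maximum of 0 is reported as "1 PARTIDs".)

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RequestorLimits:
    """Hypothetical record of the largest IDs one requestor can generate."""
    name: str
    partid_max: int
    pmg_max: int


def system_wide_max(requestors: Iterable[RequestorLimits]) -> Tuple[int, int]:
    """Return the (partid_max, pmg_max) that every requestor can honour.

    With no requestor registered this sketch falls back to 0 for both,
    an assumption chosen to mirror the boot message quoted above.
    """
    reqs = list(requestors)
    if not reqs:
        return 0, 0
    return (min(r.partid_max for r in reqs), min(r.pmg_max for r in reqs))


def enabled_message(partid_max: int, pmg_max: int) -> str:
    """IDs are numbered from 0, so the usable count is max + 1."""
    return f"MPAM enabled with {partid_max + 1} PARTIDs and {pmg_max + 1} PMGs"


if __name__ == "__main__":
    # No requestor has advertised its limits yet.
    print(enabled_message(*system_wide_max([])))
    # Two made-up requestors; the smaller limit of each kind wins.
    cpus = [RequestorLimits("cpu0", 255, 1), RequestorLimits("cpu1", 511, 3)]
    print(enabled_message(*system_wide_max(cpus)))
```

The example limits (255/1 and 511/3) are invented values, not read from any hardware.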

> 
> So, I guess I'm wondering what more I could do to test this code at this
> point, prior to merging it.
> 
> I'm very interested in seeing this code land upstream as soon as
> possible, and I've got access to an implementation to test whatever I
> can.
> 
> So let me know what else I can do and I'll be glad to contribute my
> Tested-by when I've done it.
> 
> -Carl

Thanks for the quick response and testing.

I've just responded to the cover letter with a branch containing the
rest of the driver. (Just so it's not hidden in this thread) It's
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.18-rc4

With that, you should be able to enable all usable PARTID and PMG, mount
resctrl, add tasks/cpus to resctrl control groups, run benchmarks to
check that the controls are respected and check that the monitors give
expected values.

Thanks,

Ben



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-10 16:15   ` Ben Horgan
@ 2025-11-11  0:45     ` Carl Worth
  0 siblings, 0 replies; 147+ messages in thread
From: Carl Worth @ 2025-11-11  0:45 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Ben Horgan <ben.horgan@arm.com> writes:
> Yes, the driver does barely anything without a requestor.
...
> Thanks for the quick response and testing.

Sure thing!

> I've just responded to the cover letter with a branch containing the
> rest of the driver. (Just so it's not hidden in this thread) It's
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.18-rc4

Yes. I saw that. That's very helpful, so thank you!

> With that, you should be able to enable all usable PARTID and PMG, mount
> resctrl, add tasks/cpus to resctrl control groups, run benchmarks to
> check that the controls are respected and check that the monitors give
> expected values.

Indeed. I now get:

	MPAM enabled with 256 PARTIDs and 2 PMGs

And I have mounted resctrl, added cpus to resctrl control groups, and
then manipulated the L3 cache bitmap control and verified that
benchmarks respond as expected to changes to the control.
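(For anyone wanting to repeat this, a rough sketch of the steps follows. It assumes resctrl is already mounted at /sys/fs/resctrl and uses the standard resctrl files "cpus_list" and "schemata". The group name, CPU list, domain id and mask width are placeholders, not the values used on this system, and the helper names are made up for illustration.)

```python
from pathlib import Path
from typing import Dict

RESCTRL_ROOT = Path("/sys/fs/resctrl")  # conventional mount point (assumed)


def low_bits_mask(n_bits: int) -> int:
    """Contiguous bitmap with the lowest n_bits portions set."""
    if n_bits < 1:
        raise ValueError("a cache bitmap needs at least one bit set")
    return (1 << n_bits) - 1


def schemata_line(resource: str, masks: Dict[int, int]) -> str:
    """Format one schemata line, e.g. 'L3:0=f;1=ff' plus a newline."""
    domains = ";".join(f"{dom}={mask:x}" for dom, mask in sorted(masks.items()))
    return f"{resource}:{domains}\n"


def apply_to_group(group: str, cpus: str, line: str,
                   root: Path = RESCTRL_ROOT) -> None:
    """Create a control group, assign CPUs to it and write its schemata."""
    grp = root / group
    grp.mkdir(exist_ok=True)
    (grp / "cpus_list").write_text(cpus + "\n")
    (grp / "schemata").write_text(line)


if __name__ == "__main__":
    # Placeholder values: restrict CPUs 0-3 to four L3 portions in domain 0.
    line = schemata_line("L3", {0: low_bits_mask(4)})
    print(line, end="")
    # Needs root and a mounted resctrl, so it is left commented out:
    # apply_to_group("test_grp", "0-3", line)
```

Re-running a cache-sensitive benchmark on those CPUs after each change of mask width is then enough to see whether the control takes effect.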

I was going to report that I am encountering an unexpected alignment
fault in one of the KUnit tests, but then I realized that it is in
test_get_mba_granularity, which comes in as part of the snapshot and is
not part of the code submitted here. So I'll continue to
investigate that, but it need not delay the code of interest in the
current series.

Given all of that, for the series:

Tested-by: Carl Worth <carl@os.amperecomputing.com>

-Carl


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (35 preceding siblings ...)
  2025-11-07 23:22 ` Carl Worth
@ 2025-11-10  1:05 ` Gavin Shan
  2025-11-10 13:56 ` Zeng Heng
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 147+ messages in thread
From: Gavin Shan @ 2025-11-10  1:05 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On 11/7/25 10:34 PM, Ben Horgan wrote:
[...]

> 
> This series is based on v6.18-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v4
> 

Ran the boot and KUnit tests on NVIDIA's Grace Hopper machine. The results look good.

Tested-by: Gavin Shan <gshan@redhat.com>

Thanks,
Gavin



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (36 preceding siblings ...)
  2025-11-10  1:05 ` Gavin Shan
@ 2025-11-10 13:56 ` Zeng Heng
  2025-11-10 16:03 ` Ben Horgan
  2025-11-16 17:16 ` Drew Fustini
  39 siblings, 0 replies; 147+ messages in thread
From: Zeng Heng @ 2025-11-10 13:56 UTC (permalink / raw)
  To: Ben Horgan, james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao


On 2025/11/7 20:34, Ben Horgan wrote:

> 
> The expectation is this will go via the arm64 tree.
> 
> This series is based on v6.18-rc4, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/v4
> 

Ran the boot and KUnit tests on Huawei's Kunpeng machine. The results look
fine.

Tested-by: Zeng Heng <zengheng4@huawei.com>

Best Regards,
Zeng Heng


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (37 preceding siblings ...)
  2025-11-10 13:56 ` Zeng Heng
@ 2025-11-10 16:03 ` Ben Horgan
  2025-11-11  7:09   ` Shaopeng Tan (Fujitsu)
  2025-11-16 17:16 ` Drew Fustini
  39 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-10 16:03 UTC (permalink / raw)
  To: james.morse
  Cc: amitsinght, baisheng.gao, baolin.wang, bobo.shaobowang, carl,
	catalin.marinas, dakr, dave.martin, david, dfustini, fenghuay,
	gregkh, gshan, guohanjun, jeremy.linton, jonathan.cameron, kobak,
	lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

Hi all,

On 11/7/25 12:34, Ben Horgan wrote:

> 
> The rest of the driver can be found here: (no updated version - based on v3)
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1
> 

James has kindly hosted a branch with my rebase of the rest of the
driver here.
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.18-rc4

This can be used to help with more testing.

Thanks,

Ben



* RE: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-10 16:03 ` Ben Horgan
@ 2025-11-11  7:09   ` Shaopeng Tan (Fujitsu)
  0 siblings, 0 replies; 147+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-11-11  7:09 UTC (permalink / raw)
  To: 'Ben Horgan', james.morse@arm.com
  Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
	baolin.wang@linux.alibaba.com, bobo.shaobowang@huawei.com,
	carl@os.amperecomputing.com, catalin.marinas@arm.com,
	dakr@kernel.org, dave.martin@arm.com, david@redhat.com,
	dfustini@baylibre.com, fenghuay@nvidia.com,
	gregkh@linuxfoundation.org, gshan@redhat.com,
	guohanjun@huawei.com, jeremy.linton@arm.com,
	jonathan.cameron@huawei.com, kobak@nvidia.com,
	lcherian@marvell.com, lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, lpieralisi@kernel.org,
	peternewman@google.com, quic_jiles@quicinc.com, rafael@kernel.org,
	robh@kernel.org, rohit.mathew@arm.com,
	scott@os.amperecomputing.com, sdonthineni@nvidia.com,
	sudeep.holla@arm.com, will@kernel.org, xhao@linux.alibaba.com

Hello Ben,

> Hi all,
> 
> On 11/7/25 12:34, Ben Horgan wrote:
> 
> >
> > The rest of the driver can be found here: (no updated version - based on v3)
> > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1

I ran the KUnit tests on NVIDIA's Grace machine and saw no problems.

> James has kindly hosted a branch with my rebase of the rest of the driver here.
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.18-rc4

I verified cache allocation and monitoring on NVIDIA's Grace machine. That also seems to be fine.

Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>

Best regards,
Shaopeng TAN



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
                   ` (38 preceding siblings ...)
  2025-11-10 16:03 ` Ben Horgan
@ 2025-11-16 17:16 ` Drew Fustini
  2025-11-18 14:11   ` Ben Horgan
  39 siblings, 1 reply; 147+ messages in thread
From: Drew Fustini @ 2025-11-16 17:16 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	jonathan.cameron, kobak, lcherian, lenb, linux-acpi,
	linux-arm-kernel, linux-kernel, lpieralisi, peternewman,
	quic_jiles, rafael, robh, rohit.mathew, scott, sdonthineni,
	sudeep.holla, tan.shaopeng, will, xhao

On Fri, Nov 07, 2025 at 12:34:17PM +0000, Ben Horgan wrote:
> Hi all,
[snip]
> The rest of the driver can be found here: (no updated version - based on v3)
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1

Does anyone know of a hosting platform that offers ARM machines that
have MPAM?

I see there are Ampere systems on Oracle but I wasn't sure if those have
MPAM.

I'm getting a RISC-V QoS patch series ready (Ssqosid ext and CBQRI spec)
and I'd like to get a better understanding of how resctrl works on MPAM
in practice.

Thanks,
Drew


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-16 17:16 ` Drew Fustini
@ 2025-11-18 14:11   ` Ben Horgan
  2025-11-18 22:55     ` Drew Fustini
  0 siblings, 1 reply; 147+ messages in thread
From: Ben Horgan @ 2025-11-18 14:11 UTC (permalink / raw)
  To: Drew Fustini
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	jonathan.cameron, kobak, lcherian, lenb, linux-acpi,
	linux-arm-kernel, linux-kernel, lpieralisi, peternewman,
	quic_jiles, rafael, robh, rohit.mathew, scott, sdonthineni,
	sudeep.holla, tan.shaopeng, will, xhao

Hi Drew,

On 11/16/25 17:16, Drew Fustini wrote:
> On Fri, Nov 07, 2025 at 12:34:17PM +0000, Ben Horgan wrote:
>> Hi all,
> [snip]
>> The rest of the driver can be found here: (no updated version - based on v3)
>> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1
> 
> Does anyone know of a hosting platform that offers ARM machines that
> have MPAM?

As far as I know there aren't any.

There is some MPAM support in the Orion Radxa board which is likely the
cheapest option. The MPAM acpi table isn't in the firmware though so
you'd need to load a custom table. James has this working.

> 
> I see there are Ampere systems on Oracle but I wasn't sure if those have
> MPAM.
> 
> I'm getting a RISC-V QoS patch series ready (Ssqosid ext and CBQRI spec)
> and I'd like to get a better understanding of how resctrl works on MPAM
> in practice.
> 
> Thanks,
> Drew


Thanks,

Ben



* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-18 14:11   ` Ben Horgan
@ 2025-11-18 22:55     ` Drew Fustini
  2025-11-19 10:00       ` Jonathan Cameron
  0 siblings, 1 reply; 147+ messages in thread
From: Drew Fustini @ 2025-11-18 22:55 UTC (permalink / raw)
  To: Ben Horgan
  Cc: james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	jonathan.cameron, kobak, lcherian, lenb, linux-acpi,
	linux-arm-kernel, linux-kernel, lpieralisi, peternewman,
	quic_jiles, rafael, robh, rohit.mathew, scott, sdonthineni,
	sudeep.holla, tan.shaopeng, will, xhao

On Tue, Nov 18, 2025 at 02:11:31PM +0000, Ben Horgan wrote:
> Hi Drew,
> 
> On 11/16/25 17:16, Drew Fustini wrote:
> > On Fri, Nov 07, 2025 at 12:34:17PM +0000, Ben Horgan wrote:
> >> Hi all,
> > [snip]
> >> The rest of the driver can be found here: (no updated version - based on v3)
> >> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1
> > 
> > Does anyone know of a hosting platform that offers ARM machines that
> > have MPAM?
> 
> As far as I know there aren't any.
> 
> There is some MPAM support in the Orion Radxa board which is likely the
> cheapest option. The MPAM acpi table isn't in the firmware though so
> you'd need to load a custom table. James has this working.

Thank you, I didn't realize that there was a dev board that supports
MPAM. I didn't want the expense or noise of a rackable server :)

Drew


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-18 22:55     ` Drew Fustini
@ 2025-11-19 10:00       ` Jonathan Cameron
  2025-11-19 20:09         ` Drew Fustini
  0 siblings, 1 reply; 147+ messages in thread
From: Jonathan Cameron @ 2025-11-19 10:00 UTC (permalink / raw)
  To: Drew Fustini
  Cc: Ben Horgan, james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Tue, 18 Nov 2025 14:55:07 -0800
Drew Fustini <fustini@kernel.org> wrote:

> On Tue, Nov 18, 2025 at 02:11:31PM +0000, Ben Horgan wrote:
> > Hi Drew,
> > 
> > On 11/16/25 17:16, Drew Fustini wrote:  
> > > On Fri, Nov 07, 2025 at 12:34:17PM +0000, Ben Horgan wrote:  
> > >> Hi all,  
> > > [snip]  
> > >> The rest of the driver can be found here: (no updated version - based on v3)
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1  
> > > 
> > > Does anyone know of a hosting platform that offers ARM machines that
> > > have MPAM?  
> > 
> > As far as I know there aren't any.
> > 
> > There is some MPAM support in the Orion Radxa board which is likely the
> > cheapest option. The MPAM acpi table isn't in the firmware though so
> > you'd need to load a custom table. James has this working.  
> 
> Thank you, I didn't realize that there was a dev board that supports
> MPAM. I didn't want the expense or noise of a rackable server :)
> 
> Drew
> 
Hi Drew,

Obviously not functional as such, but I did spin QEMU emulation with a bunch
of introspection so you could see what was configured.  The aim was to poke
corner cases more easily than with real hardware. It did its job at the time
and shook out some bugs.

I haven't rebased it recently though.
https://lore.kernel.org/qemu-devel/20230808115713.2613-1-Jonathan.Cameron@huawei.com/

https://gitlab.com/jic23/qemu/-/commits/mpam-2023-sept-01
Has what looks to be a slightly more recent rebase.

No monitor support though.

I might bring this back to poke the rest of this series as it moves forwards
(or if anyone else wants to they are welcome to do so)

FWIW we could in theory hook this up to the cache plugins to get some 'plausible'
numbers, but I never bothered as we have hardware (as seen by tested-by's on this
series).

Jonathan


* Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-11-19 10:00       ` Jonathan Cameron
@ 2025-11-19 20:09         ` Drew Fustini
  0 siblings, 0 replies; 147+ messages in thread
From: Drew Fustini @ 2025-11-19 20:09 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ben Horgan, james.morse, amitsinght, baisheng.gao, baolin.wang,
	bobo.shaobowang, carl, catalin.marinas, dakr, dave.martin, david,
	dfustini, fenghuay, gregkh, gshan, guohanjun, jeremy.linton,
	kobak, lcherian, lenb, linux-acpi, linux-arm-kernel, linux-kernel,
	lpieralisi, peternewman, quic_jiles, rafael, robh, rohit.mathew,
	scott, sdonthineni, sudeep.holla, tan.shaopeng, will, xhao

On Wed, Nov 19, 2025 at 10:00:51AM +0000, Jonathan Cameron wrote:
> On Tue, 18 Nov 2025 14:55:07 -0800
> Drew Fustini <fustini@kernel.org> wrote:
> 
> > On Tue, Nov 18, 2025 at 02:11:31PM +0000, Ben Horgan wrote:
> > > Hi Drew,
> > > 
> > > On 11/16/25 17:16, Drew Fustini wrote:  
> > > > On Fri, Nov 07, 2025 at 12:34:17PM +0000, Ben Horgan wrote:  
> > > >> Hi all,  
> > > > [snip]  
> > > >> The rest of the driver can be found here: (no updated version - based on v3)
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.18-rc1  
> > > > 
> > > > Does anyone know of a hosting platform that offers ARM machines that
> > > > have MPAM?  
> > > 
> > > As far as I know there aren't any.
> > > 
> > > There is some MPAM support in the Orion Radxa board which is likely the
> > > cheapest option. The MPAM acpi table isn't in the firmware though so
> > > you'd need to load a custom table. James has this working.  
> > 
> > Thank you, I didn't realize that there was a dev board that supports
> > MPAM. I didn't want the expense or noise of a rackable server :)
> > 
> > Drew
> > 
> Hi Drew,
> 
> Obviously not functional as such, but I did spin QEMU emulation with a bunch
> of introspection so you could see what was configured.  The aim was to poke
> corner cases more easily than with real hardware. It did its job at the time
> and shook out some bugs.
> 
> I haven't rebased it recently though.
> https://lore.kernel.org/qemu-devel/20230808115713.2613-1-Jonathan.Cameron@huawei.com/
> 
> https://gitlab.com/jic23/qemu/-/commits/mpam-2023-sept-01
> Has what looks to be a slightly more recent rebase.
> 
> No monitor support though.
> 
> I might bring this back to poke the rest of this series as it moves forwards
> (or if anyone else wants to they are welcome to do so)
> 
> FWIW we could in theory hook this up to the cache plugins to get some 'plausible'
> numbers, but I never bothered as we have hardware (as seen by tested-by's on this
> series).
> 
> Jonathan

Thanks for pointing out your series and the branch from James. Qemu is
the cheapest way for me to try MPAM :)

Drew


end of thread, newest message 2025-11-19 20:09 UTC

Thread overview: 147+ messages
2025-11-07 12:34 [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
2025-11-07 12:34 ` [PATCH 01/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container Ben Horgan
2025-11-08  4:31   ` Gavin Shan
2025-11-12 10:14     ` Ben Horgan
2025-11-12  5:45   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 02/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels Ben Horgan
2025-11-11  7:34   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 03/33] ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure Ben Horgan
2025-11-08  4:54   ` Gavin Shan
2025-11-10 15:51     ` Ben Horgan
2025-11-10 15:46   ` Jonathan Cameron
2025-11-10 16:28     ` Ben Horgan
2025-11-10 17:00   ` Jeremy Linton
2025-11-11 16:48     ` Ben Horgan
2025-11-12 20:22   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 04/33] ACPI / PPTT: Find cache level by cache-id Ben Horgan
2025-11-08  5:11   ` Gavin Shan
2025-11-10 16:02   ` Jonathan Cameron
2025-11-11 17:02     ` Ben Horgan
2025-11-12 20:23   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 05/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id Ben Horgan
2025-11-08  5:10   ` Gavin Shan
2025-11-10 16:04   ` Jonathan Cameron
2025-11-12 20:26   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 06/33] arm64: kconfig: Add Kconfig entry for MPAM Ben Horgan
2025-11-07 12:34 ` [PATCH 07/33] platform: Define platform_device_put cleanup handler Ben Horgan
2025-11-10  1:03   ` Gavin Shan
2025-11-10 16:07   ` Jonathan Cameron
2025-11-12 20:32   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 08/33] ACPI: Define acpi_put_table cleanup handler and acpi_get_table_ret() helper Ben Horgan
2025-11-10  1:03   ` Gavin Shan
2025-11-10 16:11   ` Jonathan Cameron
2025-11-12  7:02   ` Shaopeng Tan (Fujitsu)
2025-11-12 20:39   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 09/33] ACPI / MPAM: Parse the MPAM table Ben Horgan
2025-11-08  8:54   ` Gavin Shan
2025-11-10 16:27     ` Jonathan Cameron
2025-11-12 14:45     ` Ben Horgan
2025-11-10 16:23   ` Jonathan Cameron
2025-11-12  7:01   ` Shaopeng Tan (Fujitsu)
2025-11-12 14:55     ` Ben Horgan
2025-11-13  2:16   ` Fenghua Yu
2025-11-13 12:09     ` Ben Horgan
2025-11-13  2:33   ` Fenghua Yu
2025-11-13 14:24     ` Ben Horgan
2025-11-07 12:34 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate Ben Horgan
2025-11-08  9:28   ` Gavin Shan
2025-11-10 16:44     ` Jonathan Cameron
2025-11-12 15:32     ` Ben Horgan
2025-11-10 16:58   ` Jonathan Cameron
2025-11-12 16:05     ` Ben Horgan
2025-11-12  7:22   ` Shaopeng Tan (Fujitsu)
2025-11-12 15:37     ` Ben Horgan
2025-11-13  2:46   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 11/33] arm_mpam: Add the class and component structures for firmware described ris Ben Horgan
2025-11-09  0:07   ` Gavin Shan
2025-11-12 16:39     ` Ben Horgan
2025-11-12 16:48       ` Ben Horgan
2025-11-10 17:10   ` Jonathan Cameron
2025-11-12 17:21     ` Ben Horgan
2025-11-12  7:29   ` Shaopeng Tan (Fujitsu)
2025-11-13  3:23   ` Fenghua Yu
2025-11-13 16:39     ` Ben Horgan
2025-11-07 12:34 ` [PATCH 12/33] arm_mpam: Add MPAM MSC register layout definitions Ben Horgan
2025-11-09  0:25   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 13/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware Ben Horgan
2025-11-07 12:34 ` [PATCH 14/33] arm_mpam: Probe hardware to find the supported partid/pmg values Ben Horgan
2025-11-09  0:43   ` Gavin Shan
2025-11-10 23:26     ` Gavin Shan
2025-11-11  9:30       ` Ben Horgan
2025-11-12  7:57   ` Shaopeng Tan (Fujitsu)
2025-11-13  3:50   ` Fenghua Yu
2025-11-13 16:43     ` Ben Horgan
2025-11-07 12:34 ` [PATCH 15/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers Ben Horgan
2025-11-09  0:49   ` Gavin Shan
2025-11-12  8:03   ` Shaopeng Tan (Fujitsu)
2025-11-13  3:52   ` Fenghua Yu
2025-11-07 12:34 ` [PATCH 16/33] arm_mpam: Probe the hardware features resctrl supports Ben Horgan
2025-11-09 21:55   ` Gavin Shan
2025-11-12  8:17   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 17/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class Ben Horgan
2025-11-09 21:59   ` Gavin Shan
2025-11-12  8:24   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 18/33] arm_mpam: Reset MSC controls from cpuhp callbacks Ben Horgan
2025-11-09 22:11   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 19/33] arm_mpam: Add a helper to touch an MSC from any CPU Ben Horgan
2025-11-09 22:13   ` Gavin Shan
2025-11-14  2:47   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 20/33] arm_mpam: Extend reset logic to allow devices to be reset any time Ben Horgan
2025-11-09 22:16   ` Gavin Shan
2025-11-13 20:24   ` Fenghua Yu
2025-11-14  2:52   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 21/33] arm_mpam: Register and enable IRQs Ben Horgan
2025-11-09 22:18   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 22/33] arm_mpam: Use a static key to indicate when mpam is enabled Ben Horgan
2025-11-09 22:20   ` Gavin Shan
2025-11-14  4:37   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 23/33] arm_mpam: Allow configuration to be applied and restored during cpu online Ben Horgan
2025-11-09 22:59   ` Gavin Shan
2025-11-13 17:14     ` Ben Horgan
2025-11-10 17:27   ` Jonathan Cameron
2025-11-11 17:45     ` Ben Horgan
2025-11-14 10:33     ` Ben Horgan
2025-11-14 14:34       ` Ben Horgan
2025-11-07 12:34 ` [PATCH 24/33] arm_mpam: Probe and reset the rest of the features Ben Horgan
2025-11-09 23:01   ` Gavin Shan
2025-11-14  7:04   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 25/33] arm_mpam: Add helpers to allocate monitors Ben Horgan
2025-11-09 23:02   ` Gavin Shan
2025-11-14  7:14   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 26/33] arm_mpam: Add mpam_msmon_read() to read monitor value Ben Horgan
2025-11-09 23:13   ` Gavin Shan
2025-11-14 10:07     ` Ben Horgan
2025-11-12  5:33   ` Shaopeng Tan (Fujitsu)
2025-11-07 12:34 ` [PATCH 27/33] arm_mpam: Track bandwidth counter state for power management Ben Horgan
2025-11-09 23:15   ` Gavin Shan
2025-11-10 13:49   ` Zeng Heng
2025-11-10 17:31   ` Jonathan Cameron
2025-11-07 12:34 ` [PATCH 28/33] arm_mpam: Consider overflow in bandwidth counter state Ben Horgan
2025-11-09 23:16   ` Gavin Shan
2025-11-10 13:50   ` Zeng Heng
2025-11-10 17:32   ` Jonathan Cameron
2025-11-07 12:34 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters Ben Horgan
2025-11-09 23:16   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported Ben Horgan
2025-11-09 23:17   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state Ben Horgan
2025-11-09 23:18   ` Gavin Shan
2025-11-10 17:34   ` Jonathan Cameron
2025-11-07 12:34 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset Ben Horgan
2025-11-09 23:19   ` Gavin Shan
2025-11-07 12:34 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() Ben Horgan
2025-11-09 23:19   ` Gavin Shan
2025-11-07 12:47 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Ben Horgan
2025-11-07 21:22 ` Fenghua Yu
2025-11-07 23:22 ` Carl Worth
2025-11-10 16:15   ` Ben Horgan
2025-11-11  0:45     ` Carl Worth
2025-11-10  1:05 ` Gavin Shan
2025-11-10 13:56 ` Zeng Heng
2025-11-10 16:03 ` Ben Horgan
2025-11-11  7:09   ` Shaopeng Tan (Fujitsu)
2025-11-16 17:16 ` Drew Fustini
2025-11-18 14:11   ` Ben Horgan
2025-11-18 22:55     ` Drew Fustini
2025-11-19 10:00       ` Jonathan Cameron
2025-11-19 20:09         ` Drew Fustini
